Multi-Modal Learning

1213 directly classified papers

Papers

UnCo: Uncertainty-Driven Collaborative Framework of Large and Small Models for Grounded Multimodal NER EMNLP 2025

IRT-Router: Effective and Interpretable Multi-LLM Routing via Item Response Theory ACL 2025

Program Synthesis Benchmark for Visual Programming in XLogoOnline Environment ACL 2025

Proxy-Driven Robust Multimodal Sentiment Analysis with Incomplete Data ACL 2025

FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop Reasoning ACL 2025

Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models ACL 2025

HintsOfTruth: A Multimodal Checkworthiness Detection Dataset with Real and Synthetic Claims ACL 2025

QUPID: Quantified Understanding for Enhanced Performance, Insights, and Decisions in Korean Search Engines ACL 2025

MICE: Mixture of Image Captioning Experts Augmented e-Commerce Product Attribute Value Extraction ACL 2025

Visual Cues Enhance Predictive Turn-Taking for Two-Party Human Interaction ACL 2025

Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization ACL 2025

Towards Reliable Large Audio Language Model ACL 2025

Social Hatred: Efficient Multimodal Detection of Hatemongers ACL 2025

Enhancing Speech Large Language Models with Prompt-Aware Mixture of Audio Encoders EMNLP 2025

Language-Guided Audio-Visual Learning for Long-Term Sports Assessment CVPR 2025

Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation ACL 2025

Vision-Language Models Struggle to Align Entities across Modalities ACL 2025

SIDE: Socially Informed Drought Estimation Toward Understanding Societal Impact Dynamics of Environmental Crisis AAAI 2025

LILaC: Late Interacting in Layered Component Graph for Open-domain Multimodal Multihop Retrieval EMNLP 2025

Flexible Frame Selection for Efficient Video Reasoning CVPR 2025

Sign2Vis: Automated Data Visualization from Sign Language ACL 2025

Predicting Depression in Screening Interviews from Interactive Multi-Theme Collaboration ACL 2025

M2-TabFact: Multi-Document Multi-Modal Fact Verification with Visual and Textual Representations of Tabular Data ACL 2025

Overview of MM-ArgFallacy2025 on Multimodal Argumentative Fallacy Detection and Classification in Political Debates ACL 2025

Multimodal Fusion and Coherence Modeling for Video Topic Segmentation ACL 2025