Research Explorer
Multi-Modal Learning
1213 directly classified papers
Papers per year
2007: 2
2008: 1
2009: 1
2011: 2
2012: 5
2013: 5
2014: 1
2015: 5
2016: 8
2017: 21
2018: 42
2019: 42
2020: 69
2021: 72
2022: 149
2023: 143
2024: 258
2025: 370
2026: 17
Papers
UnCo: Uncertainty-Driven Collaborative Framework of Large and Small Models for Grounded Multimodal NER
EMNLP 2025
IRT-Router: Effective and Interpretable Multi-LLM Routing via Item Response Theory
ACL 2025
Program Synthesis Benchmark for Visual Programming in XLogoOnline Environment
ACL 2025
Proxy-Driven Robust Multimodal Sentiment Analysis with Incomplete Data
ACL 2025
FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop Reasoning
ACL 2025
Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models
ACL 2025
HintsOfTruth: A Multimodal Checkworthiness Detection Dataset with Real and Synthetic Claims
ACL 2025
QUPID: Quantified Understanding for Enhanced Performance, Insights, and Decisions in Korean Search Engines
ACL 2025
MICE: Mixture of Image Captioning Experts Augmented e-Commerce Product Attribute Value Extraction
ACL 2025
Visual Cues Enhance Predictive Turn-Taking for Two-Party Human Interaction
ACL 2025
Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization
ACL 2025
Towards Reliable Large Audio Language Model
ACL 2025
Social Hatred: Efficient Multimodal Detection of Hatemongers
ACL 2025
Enhancing Speech Large Language Models with Prompt-Aware Mixture of Audio Encoders
EMNLP 2025
Language-Guided Audio-Visual Learning for Long-Term Sports Assessment
CVPR 2025
Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation
ACL 2025
Vision-Language Models Struggle to Align Entities across Modalities
ACL 2025
SIDE: Socially Informed Drought Estimation Toward Understanding Societal Impact Dynamics of Environmental Crisis
AAAI 2025
LILaC: Late Interacting in Layered Component Graph for Open-domain Multimodal Multihop Retrieval
EMNLP 2025
Flexible Frame Selection for Efficient Video Reasoning
CVPR 2025
Sign2Vis: Automated Data Visualization from Sign Language
ACL 2025
Predicting Depression in Screening Interviews from Interactive Multi-Theme Collaboration
ACL 2025
M2-TabFact: Multi-Document Multi-Modal Fact Verification with Visual and Textual Representations of Tabular Data
ACL 2025
Overview of MM-ArgFallacy2025 on Multimodal Argumentative Fallacy Detection and Classification in Political Debates
ACL 2025
Multimodal Fusion and Coherence Modeling for Video Topic Segmentation
ACL 2025