Research Explorer
Multimodal Learning
1257 directly classified papers
Papers per year
2008: 1
2009: 2
2010: 2
2011: 1
2012: 3
2013: 3
2014: 2
2015: 5
2017: 11
2018: 25
2019: 33
2020: 66
2021: 47
2022: 113
2023: 199
2024: 325
2025: 411
2026: 8
Papers
LADDER: Language-Driven Slice Discovery and Error Rectification in Vision Classifiers
ACL 2025
Chat-Driven Text Generation and Interaction for Person Retrieval
EMNLP 2025
Do Slides Help? Multi-modal Context for Automatic Transcription of Conference Talks
EMNLP 2025
M2-TabFact: Multi-Document Multi-Modal Fact Verification with Visual and Textual Representations of Tabular Data
ACL 2025
FREE: Fast and Robust Vision Language Models with Early Exits
ACL 2025
TRUST-VL: An Explainable News Assistant for General Multimodal Misinformation Detection
EMNLP 2025
Testing Spatial Intuitions of Humans and Large Language and Multimodal Models in Analogies
ACL 2025
Stress-Testing Multimodal Foundation Models for Crystallographic Reasoning
ACL 2025
Quantifying Memorization and Parametric Response Rates in Retrieval-Augmented Vision-Language Models
ACL 2025
Making LVLMs Look Twice: Contrastive Decoding with Contrast Images
ACL 2025
UoR-NCL at SemEval-2025 Task 1: Using Generative LLMs and CLIP Models for Multilingual Multimodal Idiomaticity Representation
ACL 2025
Benchmarking Multimodal Models for Ukrainian Language Understanding Across Academic and Cultural Domains
ACL 2025
Detecting Referring Expressions in Visually Grounded Dialogue with Autoregressive Language Models
ACL 2025
PRIM: Towards Practical In-Image Multilingual Machine Translation
EMNLP 2025
SHARP: Steering Hallucination in LVLMs via Representation Engineering
EMNLP 2025
WISE: Weak-Supervision-Guided Step-by-Step Explanations for Multimodal LLMs in Image Classification
EMNLP 2025
Vision-and-Language Navigation with Analogical Textual Descriptions in LLMs
EMNLP 2025
MNLP@DravidianLangTech 2025: A Deep Multimodal Neural Network for Hate Speech Detection in Dravidian Languages
NAACL 2025
DiscoSG: Towards Discourse-Level Text Scene Graph Parsing through Iterative Graph Refinement
EMNLP 2025
LVLMs are Bad at Overhearing Human Referential Communication
EMNLP 2025
WildScore: Benchmarking MLLMs in-the-Wild Symbolic Music Reasoning
EMNLP 2025
What are Foundation Models Cooking in the Post-Soviet World?
EMNLP 2025
LLM-Driven Completeness and Consistency Evaluation for Cultural Heritage Data Augmentation in Cross-Modal Retrieval
EMNLP 2025
Leveraging Large Models to Evaluate Novel Content: A Case Study on Advertisement Creativity
EMNLP 2025
Enhance Multimodal Consistency and Coherence for Text-Image Plan Generation
ACL 2025