Research Explorer
Multimodal Learning
1257 directly classified papers
Papers per year
2008: 1
2009: 2
2010: 2
2011: 1
2012: 3
2013: 3
2014: 2
2015: 5
2017: 11
2018: 25
2019: 33
2020: 66
2021: 47
2022: 113
2023: 199
2024: 325
2025: 411
2026: 8
Papers
WildScore: Benchmarking MLLMs in-the-Wild Symbolic Music Reasoning
EMNLP 2025
V-SEAM: Visual Semantic Editing and Attention Modulating for Causal Interpretability of Vision-Language Models
EMNLP 2025
PunMemeCN: A Benchmark to Explore Vision-Language Models’ Understanding of Chinese Pun Memes
EMNLP 2025
LLM-Driven Completeness and Consistency Evaluation for Cultural Heritage Data Augmentation in Cross-Modal Retrieval
EMNLP 2025
What are Foundation Models Cooking in the Post-Soviet World?
EMNLP 2025
Leveraging Large Models to Evaluate Novel Content: A Case Study on Advertisement Creativity
EMNLP 2025
Puzzled by Puzzles: When Vision-Language Models Can’t Take a Hint
EMNLP 2025
RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction
EMNLP 2025
Multimodal Neural Machine Translation: A Survey of the State of the Art
EMNLP 2025
Follow the Flow: Fine-grained Flowchart Attribution with Neurosymbolic Agents
EMNLP 2025
Fooling the LVLM Judges: Visual Biases in LVLM-Based Evaluation
EMNLP 2025
Walk and Read Less: Improving the Efficiency of Vision-and-Language Navigation via Tuning-Free Multimodal Token Pruning
EMNLP 2025
RAcQUEt: Unveiling the Dangers of Overlooked Referential Ambiguity in Visual LLMs
EMNLP 2025
Memory-QA: Answering Recall Questions Based on Multimodal Memories
EMNLP 2025
Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation
EMNLP 2025
Pixels Versus Priors: Controlling Knowledge Priors in Vision-Language Models through Visual Counterfacts
EMNLP 2025
WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation
EMNLP 2025
Contra4: Evaluating Contrastive Cross-Modal Reasoning in Audio, Video, Image, and 3D
EMNLP 2025
DiMo-GUI: Advancing Test-time Scaling in GUI Grounding via Modality-Aware Visual Reasoning
EMNLP 2025
VLA-Mark: A cross modal watermark for large vision-language alignment models
EMNLP 2025
iVISPAR — An Interactive Visual-Spatial Reasoning Benchmark for VLMs
EMNLP 2025
SimpleDoc: Multi-Modal Document Understanding with Dual-Cue Page Retrieval and Iterative Refinement
EMNLP 2025
MAviS: A Multimodal Conversational Assistant For Avian Species
EMNLP 2025
Multilingual Pretraining for Pixel Language Models
EMNLP 2025
Enhance Multimodal Consistency and Coherence for Text-Image Plan Generation
ACL 2025