Research Explorer
Multimodal Learning
1257 directly classified papers
Papers per year
2008: 1
2009: 2
2010: 2
2011: 1
2012: 3
2013: 3
2014: 2
2015: 5
2017: 11
2018: 25
2019: 33
2020: 66
2021: 47
2022: 113
2023: 199
2024: 325
2025: 411
2026: 8
Papers
Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates
ACL 2025
FOCUS: Evaluating Pre-trained Vision-Language Models on Underspecification Reasoning
ACL 2025
Weaving Context Across Images: Improving Vision-Language Models through Focus-Centric Visual Chains
ACL 2025
Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models
ACL 2025
VISA: Retrieval Augmented Generation with Visual Source Attribution
ACL 2025
Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images
ACL 2025
IMOL: Incomplete-Modality-Tolerant Learning for Multi-Domain Fake News Video Detection
ACL 2025
A Character-Centric Creative Story Generation via Imagination
ACL 2025
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning
ACL 2025
MIRe: Enhancing Multimodal Queries Representation via Fusion-Free Modality Interaction for Multimodal Retrieval
ACL 2025
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering
ACL 2025
R-VLM: Region-Aware Vision Language Model for Precise GUI Grounding
ACL 2025
A Couch Potato is not a Potato on a Couch: Prompting Strategies, Image Generation, and Compositionality Prediction for Noun Compounds
ACL 2025
VAQUUM: Are Vague Quantifiers Grounded in Visual Data?
ACL 2025
MDIT-Bench: Evaluating the Dual-Implicit Toxicity in Large Multimodal Models
ACL 2025
Express What You See: Can Multimodal LLMs Decode Visual Ciphers with Intuitive Semiosis Comprehension?
ACL 2025
MMInA: Benchmarking Multihop Multimodal Internet Agents
ACL 2025
VADE: Visual Attention Guided Hallucination Detection and Elimination
ACL 2025
MVL-SIB: A Massively Multilingual Vision-Language Benchmark for Cross-Modal Topical Matching
ACL 2025
See the World, Discover Knowledge: A Chinese Factuality Evaluation for Large Vision Language Models
ACL 2025
MAGIC-VQA: Multimodal And Grounded Inference with Commonsense Knowledge for Visual Question Answering
ACL 2025
Sign2Vis: Automated Data Visualization from Sign Language
ACL 2025
Can Multimodal Foundation Models Understand Schematic Diagrams? An Empirical Study on Information-Seeking QA over Scientific Papers
ACL 2025
MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens
ACL 2025
Dual-Path Dynamic Fusion with Learnable Query for Multimodal Sentiment Analysis
EMNLP 2025