Research Explorer
Multimodal Learning
1257 directly classified papers
Papers per year
2008: 1
2009: 2
2010: 2
2011: 1
2012: 3
2013: 3
2014: 2
2015: 5
2017: 11
2018: 25
2019: 33
2020: 66
2021: 47
2022: 113
2023: 199
2024: 325
2025: 411
2026: 8
Papers
SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning
EMNLP 2025
DiscoSG: Towards Discourse-Level Text Scene Graph Parsing through Iterative Graph Refinement
EMNLP 2025
X-FLoRA: Cross-modal Federated Learning with Modality-expert LoRA for Medical VQA
EMNLP 2025
Does Acceleration Cause Hidden Instability in Vision Language Models? Uncovering Instance-Level Divergence Through a Large-Scale Empirical Study
EMNLP 2025
VELA: An LLM-Hybrid-as-a-Judge Approach for Evaluating Long Image Captions
EMNLP 2025
ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents
EMNLP 2025
Probing Logical Reasoning of MLLMs in Scientific Diagrams
EMNLP 2025
Can Vision-Language Models Solve Visual Math Equations?
EMNLP 2025
LATTE: Learning to Think with Vision Specialists
EMNLP 2025
Dual-Path Dynamic Fusion with Learnable Query for Multimodal Sentiment Analysis
EMNLP 2025
D-CoDe: Scaling Image-Pretrained VLMs to Video via Dynamic Compression and Question Decomposition
EMNLP 2025
PRIM: Towards Practical In-Image Multilingual Machine Translation
EMNLP 2025
Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering
EMNLP 2025
TVQACML: Benchmarking Text-Centric Visual Question Answering in Multilingual Chinese Minority Languages
EMNLP 2025
Transparent and Coherent Procedural Mistake Detection
EMNLP 2025
Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models
EMNLP 2025
SHARP: Steering Hallucination in LVLMs via Representation Engineering
EMNLP 2025
VRoPE: Rotary Position Embedding for Video Large Language Models
EMNLP 2025
WISE: Weak-Supervision-Guided Step-by-Step Explanations for Multimodal LLMs in Image Classification
EMNLP 2025
Vision-and-Language Navigation with Analogical Textual Descriptions in LLMs
EMNLP 2025
MUCAR: Benchmarking Multilingual Cross-Modal Ambiguity Resolution for Multimodal Large Language Models
EMNLP 2025
Task-Aware Resolution Optimization for Visual Large Language Models
EMNLP 2025
Boosting Multi-modal Keyphrase Prediction with Dynamic Chain-of-Thought in Vision-Language Models
EMNLP 2025
Do Slides Help? Multi-modal Context for Automatic Transcription of Conference Talks
EMNLP 2025
Enhance Multimodal Consistency and Coherence for Text-Image Plan Generation
ACL 2025