Multimodal Learning

1257 directly classified papers

Papers

LADDER: Language-Driven Slice Discovery and Error Rectification in Vision Classifiers ACL 2025

Chat-Driven Text Generation and Interaction for Person Retrieval EMNLP 2025

Do Slides Help? Multi-modal Context for Automatic Transcription of Conference Talks EMNLP 2025

M2-TabFact: Multi-Document Multi-Modal Fact Verification with Visual and Textual Representations of Tabular Data ACL 2025

FREE: Fast and Robust Vision Language Models with Early Exits ACL 2025

TRUST-VL: An Explainable News Assistant for General Multimodal Misinformation Detection EMNLP 2025

Testing Spatial Intuitions of Humans and Large Language and Multimodal Models in Analogies ACL 2025

Stress-Testing Multimodal Foundation Models for Crystallographic Reasoning ACL 2025

Quantifying Memorization and Parametric Response Rates in Retrieval-Augmented Vision-Language Models ACL 2025

Making LVLMs Look Twice: Contrastive Decoding with Contrast Images ACL 2025

UoR-NCL at SemEval-2025 Task 1: Using Generative LLMs and CLIP Models for Multilingual Multimodal Idiomaticity Representation ACL 2025

Benchmarking Multimodal Models for Ukrainian Language Understanding Across Academic and Cultural Domains ACL 2025

Detecting Referring Expressions in Visually Grounded Dialogue with Autoregressive Language Models ACL 2025

PRIM: Towards Practical In-Image Multilingual Machine Translation EMNLP 2025

SHARP: Steering Hallucination in LVLMs via Representation Engineering EMNLP 2025

WISE: Weak-Supervision-Guided Step-by-Step Explanations for Multimodal LLMs in Image Classification EMNLP 2025

Vision-and-Language Navigation with Analogical Textual Descriptions in LLMs EMNLP 2025

MNLP@DravidianLangTech 2025: A Deep Multimodal Neural Network for Hate Speech Detection in Dravidian Languages NAACL 2025

DiscoSG: Towards Discourse-Level Text Scene Graph Parsing through Iterative Graph Refinement EMNLP 2025

LVLMs are Bad at Overhearing Human Referential Communication EMNLP 2025

WildScore: Benchmarking MLLMs in-the-Wild Symbolic Music Reasoning EMNLP 2025

What are Foundation Models Cooking in the Post-Soviet World? EMNLP 2025

LLM-Driven Completeness and Consistency Evaluation for Cultural Heritage Data Augmentation in Cross-Modal Retrieval EMNLP 2025

Leveraging Large Models to Evaluate Novel Content: A Case Study on Advertisement Creativity EMNLP 2025

Enhance Multimodal Consistency and Coherence for Text-Image Plan Generation ACL 2025