Multimodal Learning

13057 directly classified papers

Papers

Semantic Map Guided Bird's-Eye View Learning for Online HD Map Construction WACV 2026

DETONATE – A Benchmark for Text-to-Image Alignment and Kernelized Direct Preference Optimization AAAI 2026

TreeBridge: Aligning LLM Embeddings in Industrial Recommender Systems AAAI 2026

See More, Store Less: Memory-Efficient Resolution for Video Moment Retrieval EACL 2026

Multimodal Graph Representation Learning over Arbitrary Sets of Modalities WACV 2026

Journey Before Destination: On the importance of Visual Faithfulness in Slow Thinking EACL 2026

TRACE: A Framework for Analyzing and Enhancing Stepwise Reasoning in Vision-Language Models EACL 2026

Do Images Speak Louder than Words? Investigating the Effect of Textual Misinformation in VLMs EACL 2026

Can MLLMs Find Their Way in a City? Exploring Emergent Navigation from Web-Scale Knowledge EACL 2026

VisDoT: Enhancing Visual Reasoning through Human-Like Interpretation Grounding and Decomposition of Thought EACL 2026

A Unified View on Emotion Representation in Large Language Models EACL 2026

Chat-Ghosting: Methods for Auto-Completion in Dialog Systems EACL 2026

Do Audio LLMs Really LISTEN, or Just Transcribe? Measuring Lexical vs. Acoustic Emotion Cues Reliance EACL 2026

ExStrucTiny: A Benchmark for Schema-Variable Structured Information Extraction from Document Images EACL 2026

DeepInsert: Early Layer Bypass for Efficient and Performant Multimodal Understanding EACL 2026

Surprisal from Larger Transformer-based Language Models Predicts fMRI Data More Poorly EACL 2026

On the Additive Compositionality of Task Vectors in Vision–Language Models EACL 2026

FiMMIA: scaling semantic perturbation-based membership inference across modalities EACL 2026

Mask What Matters: Mitigating Object Hallucinations in Multimodal Large Language Models with Object-Aligned Visual Contrastive Decoding EACL 2026

Bring the Apple, Not the Sofa: Impact of Irrelevant Context in Embodied AI Commands on VLA Models EACL 2026

Compact Multimodal Language Models as Robust OCR Alternatives for Noisy Textual Clinical Reports EACL 2026

Adapting Vision-Language Models for E-commerce Understanding at Scale EACL 2026

TechING: Towards Real World Technical Image Understanding via VLMs EACL 2026

Unlocking Large Audio-Language Models for Interactive Language Learning EACL 2026

Benchmarking Direct Preference Optimization for Medical Large Vision–Language Models EACL 2026