Multimodal Learning

13057 directly classified papers

Papers

MemeWeaver: Inter-Meme Graph Reasoning for Sexism and Misogyny Detection EACL 2026

Ask Me Again Differently: GRAS for Measuring Bias in Vision Language Models on Gender, Race, Age, and Skin Tone EACL 2026

Do GUI Grounders Truly Understand UI Elements? EACL 2026

Hearing Between the Lines: Unlocking the Reasoning Power of LLMs for Speech Evaluation EACL 2026

Improving Chain-of-Thought for Logical Reasoning via Attention-Aware Intervention EACL 2026

HapticLLaMA: A Multimodal Sensory Language Model for Haptic Captioning EACL 2026

DRIVINGVQA: A Dataset for Interleaved Visual Chain-of-Thought in Real-World Driving Scenarios EACL 2026

DashboardQA: Benchmarking Multimodal Agents for Question Answering on Interactive Dashboards EACL 2026

MangaVQA and MangaLMM: A Benchmark and Specialized Model for Multimodal Manga Understanding EACL 2026

Reasoning Beyond Literal: Cross-style Multimodal Reasoning for Figurative Language Understanding EACL 2026

AGIC: Attention-Guided Image Captioning to Improve Caption Relevance EACL 2026

LLM-to-Speech: A Synthetic Data Pipeline for Training Dialectal Text-to-Speech Models EACL 2026

KazakhOCR: A Synthetic Benchmark for Evaluating Multimodal Models in Low-Resource Kazakh Script OCR EACL 2026

Discourses of Prevention: A Multimodal Study of HPV Vaccination Campaigns in Italy EACL 2026

MiSCHiEF: A Benchmark in Minimal-Pairs of Safety and Culture for Holistic Evaluation of Fine-Grained Image-Caption Alignment EACL 2026

Simplifying Outcomes of Language Model Component Analyses with ELIA EACL 2026

GraphRAG-Rad: Concept-Aware Radiology Report Generation via Latent Visual-Semantic Retrieval EACL 2026

Scale Is All You Need: Analyzing Modality Interaction and Speaker Intent Without Fine-Tuning EACL 2026

Hospitality-VQA: Decision-Oriented Informativeness Evaluation for Vision–Language Models EACL 2026

Beyond Unified Models: A Service-Oriented Approach to Low Latency, Context Aware Phonemization for Real Time TTS EACL 2026

MobileCity: An Efficient Framework for Large-Scale Urban Behavior Simulation EACL 2026

OCR or Not? Rethinking Document Information Extraction in the MLLMs Era with Real-World Large-Scale Datasets EACL 2026

PatentVision: A multimodal method for drafting patent applications EACL 2026

Encoding and Decoding Language in the Brain with Language Models EACL 2026

JEEM: Vision-Language Understanding in Four Arabic Dialects EACL 2026