Multimodal Learning

13,057 directly classified papers

Papers

Let’s Simplify Step by Step: Guiding LLM Towards Multilingual Unsupervised Proficiency-Controlled Sentence Simplification EACL 2026

DirectDrag: High-Fidelity, Mask-Free, Prompt-Free Drag-based Image Editing via Readout-Guided Feature Alignment WACV 2026

Dialect Matters: Cross-Lingual ASR Transfer for Low-Resource Indic Language Varieties EACL 2026

Evaluating Text-to-Image and Text-to-Video Synthesis with a Conditional Frechet Distance WACV 2026

Advancing Multimodal LLMs by Large-Scale 3D Visual Instruction Dataset Generation WACV 2026

Saliency-Guided DETR for Moment Retrieval and Highlight Detection WACV 2026

ORCA: Object Recognition and Comprehension for Archiving Marine Species WACV 2026

Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning WACV 2026

Divide and Refine: Enhancing Multimodal Representation and Explainability for Emotion Recognition in Conversation WACV 2026

See, Record, Do: Automated Generation of UI Workflows from Tutorial Videos WACV 2026

Competence Collapse in Code-Mixed Generation: Spectral Evidence and Mechanistic Recovery via Cross-Lingual Activation Steering EACL 2026

A Fine-Grained Linguistic Evaluation of Low-Resource Luxembourgish–English MT EACL 2026

PolyFrame at MWE-2026 AdMIRe 2: When Words Are Not Enough: Multimodal Idiom Disambiguation EACL 2026

alexandru412 at MWE-2026 AdMIRe 2.0: Advancing Multimodal Idiomaticity Representation EACL 2026

LST at MWE-2026 AdMIRe 2: Advancing Multimodal Idiomaticity Representation EACL 2026

ITUNLP at MWE-2026 AdMIRe 2: A Zero-Shot LLM Pipeline for Multimodal Idiom Understanding and Ranking EACL 2026

ITUNLP2 at MWE-2026 AdMIRe 2: Modular Zero-Shot Pipelines for Multimodal Idiom Grounding and Ranking EACL 2026

Stochastic Parrots or True Virtuosos? Digging Deeper Into the Audio-Video Understanding of AVQA Models EACL 2026

Multimodal Claim Extraction for Fact-Checking EACL 2026

Test-Time Consistency in Vision Language Models WACV 2026

Harnessing Object Grounding for Time-Sensitive Video Understanding WACV 2026

Enhancing Visual Planning with Auxiliary Tasks and Multi-token Prediction WACV 2026

Emotion Recognition in Multi-Speaker Conversations through Speaker Identification, Knowledge Distillation, and Hierarchical Fusion EACL 2026

The Devil is in the Distributions: Explicit Modeling of Scene Content is Key in Zero-Shot Video Captioning EACL 2026

Revealing the Truth with ConLLM for Detecting Multi-Modal Deepfakes EACL 2026