Research Explorer
Multimodal Learning
13,057 directly classified papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
Let’s Simplify Step by Step: Guiding LLM Towards Multilingual Unsupervised Proficiency-Controlled Sentence Simplification
EACL 2026
DirectDrag: High-Fidelity, Mask-Free, Prompt-Free Drag-based Image Editing via Readout-Guided Feature Alignment
WACV 2026
Dialect Matters: Cross-Lingual ASR Transfer for Low-Resource Indic Language Varieties
EACL 2026
Evaluating Text-to-Image and Text-to-Video Synthesis with a Conditional Frechet Distance
WACV 2026
Advancing Multimodal LLMs by Large-Scale 3D Visual Instruction Dataset Generation
WACV 2026
Saliency-Guided DETR for Moment Retrieval and Highlight Detection
WACV 2026
ORCA: Object Recognition and Comprehension for Archiving Marine Species
WACV 2026
Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning
WACV 2026
Divide and Refine: Enhancing Multimodal Representation and Explainability for Emotion Recognition in Conversation
WACV 2026
See, Record, Do: Automated Generation of UI Workflows from Tutorial Videos
WACV 2026
Competence Collapse in Code-Mixed Generation: Spectral Evidence and Mechanistic Recovery via Cross-Lingual Activation Steering
EACL 2026
A Fine-Grained Linguistic Evaluation of Low-Resource Luxembourgish–English MT
EACL 2026
PolyFrame at MWE-2026 AdMIRe 2: When Words Are Not Enough: Multimodal Idiom Disambiguation
EACL 2026
alexandru412 at MWE-2026 AdMIRe 2.0: Advancing Multimodal Idiomaticity Representation
EACL 2026
LST at MWE-2026 AdMIRe 2: Advancing Multimodal Idiomaticity Representation
EACL 2026
ITUNLP at MWE-2026 AdMIRe 2: A Zero-Shot LLM Pipeline for Multimodal Idiom Understanding and Ranking
EACL 2026
ITUNLP2 at MWE-2026 AdMIRe 2: Modular Zero-Shot Pipelines for Multimodal Idiom Grounding and Ranking
EACL 2026
Stochastic Parrots or True Virtuosos? Digging Deeper Into the Audio-Video Understanding of AVQA Models
EACL 2026
Multimodal Claim Extraction for Fact-Checking
EACL 2026
Test-Time Consistency in Vision Language Models
WACV 2026
Harnessing Object Grounding for Time-Sensitive Video Understanding
WACV 2026
Enhancing Visual Planning with Auxiliary Tasks and Multi-token Prediction
WACV 2026
Emotion Recognition in Multi-Speaker Conversations through Speaker Identification, Knowledge Distillation, and Hierarchical Fusion
EACL 2026
The Devil is in the Distributions: Explicit Modeling of Scene Content is Key in Zero-Shot Video Captioning
EACL 2026
Revealing the Truth with ConLLM for Detecting Multi-Modal Deepfakes
EACL 2026