Research Explorer
Multimodal Learning (Artificial Intelligence › Core AI)
13,057 directly classified papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
Chat-TS: Enhancing Multi-Modal Reasoning Over Time-Series and Natural Language Data
EACL 2026
Enhancing Vision Language Corruption Robustness using Cross-Distribution & Prompted Denoisers
WACV 2026
FALCONEye: Finding Answers and Localizing Content in ONE-hour-long videos with multi-modal LLMs
WACV 2026
Root Completion from Intraoral Scans of Tooth Crowns using Diffusion with Patch Perturbation
WACV 2026
Crafting Descriptive Information for a Zero-shot Method to Improve Knowledge-Based Visual Question Answering Performance
WACV 2026
Boosting Medical Vision-Language Pretraining via Momentum Self-Distillation under Limited Computing Resources
WACV 2026
From Prompt to Production: Automating Brand-Safe Marketing Imagery with Text-to-Image Models
WACV 2026
Descrip3D: Enhancing Large Language Model-based 3D Scene Understanding with Object-Level Text Descriptions
WACV 2026
Knowledge to Sight: Reasoning over Visual Attributes via Knowledge Decomposition for Abnormality Grounding
WACV 2026
SceneEval: Evaluating Semantic Coherence in Text-Conditioned 3D Indoor Scene Synthesis
WACV 2026
What’s Missing in Vision-Language Models? Probing Their Struggles with Causal Order Reasoning
EACL 2026
DCText: Scheduled Attention Masking for Visual Text Generation via Divide-and-Conquer Strategy
WACV 2026
MedROV: Towards Real-Time Open-Vocabulary Detection Across Diverse Medical Imaging Modalities
WACV 2026
Towards Unconstrained Cross-View Pose Estimation
WACV 2026
Anatomy-VLM: A Fine-grained Vision-Language Model for Medical Interpretation
WACV 2026
SVD-Det: A Lightweight Framework for Video Forgery Detection Using Semantic and Visual Defect Cues
WACV 2026
milliMamba: Specular-Aware Human Pose Estimation via Dual mmWave Radar with Multi-Frame Mamba Fusion
WACV 2026
Beyond Real Weights: Hypercomplex Representations for Stable Quantization
WACV 2026
A Framework for Real-Time Surgical Phase Recognition with Application to Robot-Assisted Partial Nephrectomy
WACV 2026
Vietnamese Automatic Speech Recognition: A Revisit
EACL 2026
Kunnafonidilaw ka Cadeau: an ASR dataset of present-day Bambara
EACL 2026
Evaluating Yoruba Text-to-Speech Systems for Accessible Computer-Based Testing in Visually Impaired Learners
EACL 2026
VILLAIN at AVerImaTeC: Verifying Image–Text Claims via Multi-Agent Collaboration
EACL 2026
RegionAligner: Bridging Ego-Exo Views for Object Correspondence via Unified Text-Visual Learning
WACV 2026
Exploring Cross-Lingual Voice Conversion Methods for Anonymizing Low-Resource Text-to-Speech
EACL 2026