Research Explorer
Multimodal Learning
13057 directly classified papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
RAVU: Retrieval Augmented Video Understanding with Compositional Reasoning over Graph
WACV 2026
Game Ground Bench: Probing the Limits of LVLMs in Complex Semantic Grounding Across Game Universes
AAAI 2026
SCoPE VLM: Selective Context Processing for Efficient Document Navigation in Vision-Language Models
EACL 2026
Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA
EACL 2026
Diagnosing Vision Language Models’ Perception by Leveraging Human Methods for Color Vision Deficiencies
EACL 2026
Mind Your Special Tokens! On the Importance of Dedicated Sequence-End Tokens in Vision-Language Embedding Models
EACL 2026
A Browser-based Open Source Assistant for Multimodal Content Verification
EACL 2026
InkSight: Towards AI-Aided Historical Manuscript Analysis
EACL 2026
Annotation-Efficient Vision-Language Model Adaptation to the Polish Language Using the LLaVA Framework
EACL 2026
Automatic Funny Scene Extraction from Long-form Cinematic Videos
AAAI 2026
Automated Unified Reasoning with Vision-Language Models for Multi-modal Burn Assessment
AAAI 2026
GAICo: A Deployed and Extensible Framework for Evaluating Diverse and Multimodal Generative AI Outputs
AAAI 2026
Speaker Anonymization for Children's Oral Reading Assessment
AAAI 2026
Explain-from-Stroke: Capturing Invisible Learning Processes Through Handwriting Dynamics Analysis
AAAI 2026
Multimodal Tabular Data Learning
AAAI 2026
Wearable Intelligence for Healthcare Robotics: From Brain Activity to Body Movements
AAAI 2026
Zero-Shot Vision Language Reasoning via Dual-layer Scene Graph Chain of Thoughts (Student Abstract)
AAAI 2026
How Reasoning Influences Intersectional Biases in Vision–Language Models (Student Abstract)
AAAI 2026
Can Large Language Models Grasp 3D Medical Anatomy Shapes? (Student Abstract)
AAAI 2026
VLHSA: Vision-Language Hierarchical Semantic Alignment for Jigsaw Puzzle Solving with Eroded Gaps (Student Abstract)
AAAI 2026
An Approach Towards Developing Relationally Intelligent Multimodal Framework for Stock Movement Prediction (Student Abstract)
AAAI 2026
Federated Cross-Modal Style-Aware Prompt Generation (Student Abstract)
AAAI 2026
Guarding Digital Identity: Attention-Guided Fusion for Detecting Forged ID Documents (Student Abstract)
AAAI 2026
BSAN: Behavioral State Attention Network for Modeling Mosquito Host-Seeking Behavior
AAAI 2026
EgoEMS: A High-Fidelity Multimodal Egocentric Dataset for Cognitive Assistance in Emergency Medical Services
AAAI 2026