Multi-Modal Learning
1213 directly classified papers
Papers per year
2007: 2
2008: 1
2009: 1
2011: 2
2012: 5
2013: 5
2014: 1
2015: 5
2016: 8
2017: 21
2018: 42
2019: 42
2020: 69
2021: 72
2022: 149
2023: 143
2024: 258
2025: 370
2026: 17
Papers
RGB-D Video Mirror Detection
WACV 2025
ELBA: Learning by Asking for Embodied Visual Navigation and Task Completion
WACV 2025
Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy
CVPR 2025
Active Data Curation Effectively Distills Large-Scale Multimodal Models
CVPR 2025
Faithful Inference Chains Extraction for Fact Verification over Multi-view Heterogeneous Graph with Causal Intervention
COLING 2025
Synthetic Data is an Elegant GIFT for Continual Vision-Language Models
CVPR 2025
Robust Multimodal Survival Prediction with Conditional Latent Differentiation Variational AutoEncoder
CVPR 2025
VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models
CVPR 2025
Using Multimodal Models for Informative Classification of Ambiguous Tweets in Crisis Response
NAACL 2025
Cross-Modal Distillation for 2D/3D Multi-Object Discovery from 2D Motion
CVPR 2025
DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving
CVPR 2025
FSboard: Over 3 Million Characters of ASL Fingerspelling Collected via Smartphones
CVPR 2025
DAMM-Diffusion: Learning Divergence-Aware Multi-Modal Diffusion Model for Nanoparticles Distribution Prediction
CVPR 2025
DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding
CVPR 2025
Social Hatred: Efficient Multimodal Detection of Hatemongers
ACL 2025
A Picture is Worth a Thousand (Correct) Captions: A Vision-Guided Judge-Corrector System for Multimodal Machine Translation
IJCNLP 2025
ExMute: A Context-Enriched Multimodal Dataset for Hateful Memes
COLING 2025
Beyond Data Quantity: Key Factors Driving Performance in Multilingual Language Models
COLING 2025
Cross-Lingual Document Recommendations with Transformer-Based Representations: Evaluating Multilingual Models and Mapping Techniques
COLING 2025
Enhancing Dialectal Arabic Intent Detection through Cross-Dialect Multilingual Input Augmentation
COLING 2025
Can LLMs Convert Graphs to Text-Attributed Graphs?
NAACL 2025
An Interpretable and Crosslingual Method for Evaluating Second-Language Dialogues
NAACL 2025
When and How to Augment Your Input: Question Routing Helps Balance the Accuracy and Efficiency of Large Language Models
NAACL 2025
VLind-Bench: Measuring Language Priors in Large Vision-Language Models
NAACL 2025
RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation
ICCV 2025