Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
CTRL-O: Language-Controllable Object-Centric Visual Representation Learning
CVPR 2025
Spatial Alignment and Temporal Matching Adapter for Video-Radar Remote Physiological Measurement
ICCV 2025
GRAPHGPT-O: Synergistic Multimodal Comprehension and Generation on Graphs
CVPR 2025
Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
CVPR 2025
Triad: Empowering LMM-based Anomaly Detection with Expert-guided Region-of-Interest Tokenizer and Manufacturing Process
ICCV 2025
Synthetic Data is an Elegant GIFT for Continual Vision-Language Models
CVPR 2025
Towards Accurate and Efficient 3D Object Detection for Autonomous Driving: A Mixture of Experts Computing System on Edge
ICCV 2025
PerLA: Perceptive 3D Language Assistant
CVPR 2025
AG-VPReID: A Challenging Large-Scale Benchmark for Aerial-Ground Video-based Person Re-Identification
CVPR 2025
SAIST: Segment Any Infrared Small Target Model Guided by Contrastive Language-Image Pretraining
CVPR 2025
Seek Common Ground While Reserving Differences: Semi-Supervised Image-Text Sentiment Recognition
CVPR 2025
Learning Textual Prompts for Open-World Semi-Supervised Learning
CVPR 2025
CoMMIT: Coordinated Multimodal Instruction Tuning
EMNLP 2025
FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression
CVPR 2025
Scaling Language-Free Visual Representation Learning
ICCV 2025
Generating Multimodal Driving Scenes via Next-Scene Prediction
CVPR 2025
MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation
CVPR 2025
MoEdit: On Learning Quantity Perception for Multi-object Image Editing
CVPR 2025
Modeling Thousands of Human Annotators for Generalizable Text-to-Image Person Re-identification
CVPR 2025
Localizing Events in Videos with Multimodal Queries
CVPR 2025
MAD: Memory-Augmented Detection of 3D Objects
CVPR 2025
GENIUS: A Generative Framework for Universal Multimodal Search
CVPR 2025
Customized Condition Controllable Generation for Video Soundtrack
CVPR 2025
ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation
CVPR 2025
SMSTracker: Tri-path Score Mask Sigma Fusion for Multi-Modal Tracking
ICCV 2025