Multi-Modal Learning

3194 directly classified papers

Papers

CTRL-O: Language-Controllable Object-Centric Visual Representation Learning CVPR 2025

Spatial Alignment and Temporal Matching Adapter for Video-Radar Remote Physiological Measurement ICCV 2025

GRAPHGPT-O: Synergistic Multimodal Comprehension and Generation on Graphs CVPR 2025

Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering CVPR 2025

Triad: Empowering LMM-based Anomaly Detection with Expert-guided Region-of-Interest Tokenizer and Manufacturing Process ICCV 2025

Synthetic Data is an Elegant GIFT for Continual Vision-Language Models CVPR 2025

Towards Accurate and Efficient 3D Object Detection for Autonomous Driving: A Mixture of Experts Computing System on Edge ICCV 2025

PerLA: Perceptive 3D Language Assistant CVPR 2025

AG-VPReID: A Challenging Large-Scale Benchmark for Aerial-Ground Video-based Person Re-Identification CVPR 2025

SAIST: Segment Any Infrared Small Target Model Guided by Contrastive Language-Image Pretraining CVPR 2025

Seek Common Ground While Reserving Differences: Semi-Supervised Image-Text Sentiment Recognition CVPR 2025

Learning Textual Prompts for Open-World Semi-Supervised Learning CVPR 2025

CoMMIT: Coordinated Multimodal Instruction Tuning EMNLP 2025

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression CVPR 2025

Scaling Language-Free Visual Representation Learning ICCV 2025

Generating Multimodal Driving Scenes via Next-Scene Prediction CVPR 2025

MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation CVPR 2025

MoEdit: On Learning Quantity Perception for Multi-object Image Editing CVPR 2025

Modeling Thousands of Human Annotators for Generalizable Text-to-Image Person Re-identification CVPR 2025

Localizing Events in Videos with Multimodal Queries CVPR 2025

MAD: Memory-Augmented Detection of 3D Objects CVPR 2025

GENIUS: A Generative Framework for Universal Multimodal Search CVPR 2025

Customized Condition Controllable Generation for Video Soundtrack CVPR 2025

ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation CVPR 2025

SMSTracker: Tri-path Score Mask Sigma Fusion for Multi-Modal Tracking ICCV 2025