Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
Language-Guided Audio-Visual Learning for Long-Term Sports Assessment
CVPR 2025
Cross-Modal 3D Representation with Multi-View Images and Point Clouds
CVPR 2025
Pose-Star: Anatomy-Aware Editing for Open-World Fashion Images
ICCV 2025
Flexible Frame Selection for Efficient Video Reasoning
CVPR 2025
EEE-Bench: A Comprehensive Multimodal Electrical And Electronics Engineering Benchmark
CVPR 2025
MINIMA: Modality Invariant Image Matching
CVPR 2025
HAMoBE: Hierarchical and Adaptive Mixture of Biometric Experts for Video-based Person ReID
ICCV 2025
Audio-Visual Instance Segmentation
CVPR 2025
Animate and Sound an Image
CVPR 2025
GuardSplat: Efficient and Robust Watermarking for 3D Gaussian Splatting
CVPR 2025
VideoAuteur: Towards Long Narrative Video Generation
ICCV 2025
Semantic and Sequential Alignment for Referring Video Object Segmentation
CVPR 2025
ChatGen: Automatic Text-to-Image Generation From FreeStyle Chatting
CVPR 2025
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level
CVPR 2025
Unified Open-World Segmentation with Multi-Modal Prompts
ICCV 2025
Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization
CVPR 2025
Multi-Modal Aerial-Ground Cross-View Place Recognition with Neural ODEs
CVPR 2025
SeCap: Self-Calibrating and Adaptive Prompts for Cross-view Person Re-Identification in Aerial-Ground Networks
CVPR 2025
Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding
CVPR 2025
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
CVPR 2025
Scaling Omni-modal Pretraining with Multimodal Context: Advancing Universal Representation Learning Across Modalities
ICCV 2025
Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features
ICCV 2025
ClimbingCap: Multi-Modal Dataset and Method for Rock Climbing in World Coordinate
CVPR 2025
What's Making That Sound Right Now? Video-centric Audio-Visual Localization
ICCV 2025
BASIC: Boosting Visual Alignment with Intrinsic Refined Embeddings in Multimodal Large Language Models
ICCV 2025