Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
Sonic: Shifting Focus to Global Audio Perception in Portrait Animation
CVPR 2025
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models
CVPR 2025
Adaptive Parameter Selection for Tuning Vision-Language Models
CVPR 2025
SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment
CVPR 2025
One-shot 3D Object Canonicalization based on Geometric and Semantic Consistency
CVPR 2025
ReWind: Understanding Long Videos with Instructed Learnable Memory
CVPR 2025
LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models
CVPR 2025
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
CVPR 2025
Can Large Language Models Personalize Dialogues to Generational Styles?
EMNLP 2025
DH-Set: Improving Vision-Language Alignment with Diverse and Hybrid Set-Embeddings Learning
CVPR 2025
CASP: Compression of Large Multimodal Models Based on Attention Sparsity
CVPR 2025
Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language
CVPR 2025
Non-Natural Image Understanding with Advancing Frequency-based Vision Encoders
CVPR 2025
Unleashing the Potential of Consistency Learning for Detecting and Grounding Multi-Modal Media Manipulation
CVPR 2025
CASP: Consistency-aware Audio-induced Saliency Prediction Model for Omnidirectional Video
CVPR 2025
LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant
CVPR 2025
Online Video Understanding: OVBench and VideoChat-Online
CVPR 2025
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception
CVPR 2025
LILaC: Late Interacting in Layered Component Graph for Open-domain Multimodal Multihop Retrieval
EMNLP 2025
INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations
CVPR 2025
ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding
CVPR 2025
Free Lunch Enhancements for Multi-modal Crowd Counting
CVPR 2025
EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering
CVPR 2025
Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Large Model Enhancement
CVPR 2025
ReAL-AD: Towards Human-Like Reasoning in End-to-End Autonomous Driving
ICCV 2025