Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding
CVPR 2025
UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting
CVPR 2025
Question-Aware Gaussian Experts for Audio-Visual Question Answering
CVPR 2025
Learning to Highlight Audio by Watching Movies
CVPR 2025
EasyCraft: A Robust and Efficient Framework for Automatic Avatar Crafting
CVPR 2025
FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity
CVPR 2025
Complementary Advantages: Exploiting Cross-Field Frequency Correlation for NIR-Assisted Image Denoising
CVPR 2025
VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents
CVPR 2025
Object-aware Sound Source Localization via Audio-Visual Scene Understanding
CVPR 2025
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
CVPR 2025
COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts
CVPR 2025
Beyond Sight: Towards Cognitive Alignment in LVLM via Enriched Visual Knowledge
CVPR 2025
One_by_zero@DravidianLangTech 2025: A Multimodal Approach for Misogyny Meme Detection in Malayalam Leveraging Visual and Textual Features
NAACL 2025
Dynamic Derivation and Elimination: Audio Visual Segmentation with Enhanced Audio Semantics
CVPR 2025
Video-Guided Foley Sound Generation with Multimodal Controls
CVPR 2025
Text Augmented Correlation Transformer For Few-shot Classification & Segmentation
CVPR 2025
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
CVPR 2025
Video Language Model Pretraining with Spatio-temporal Masking
CVPR 2025
Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly
CVPR 2025
MammAlps: A Multi-view Video Behavior Monitoring Dataset of Wild Mammals in the Swiss Alps
CVPR 2025
VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models
CVPR 2025
M-LLM Based Video Frame Selection for Efficient Video Understanding
CVPR 2025
EgoLM: Multi-Modal Language Model of Egocentric Motions
CVPR 2025
Distilled Prompt Learning for Incomplete Multimodal Survival Prediction
CVPR 2025
Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model
CVPR 2025