Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matching
AAAI 2025
VQTalker: Towards Multilingual Talking Avatars Through Facial Motion Tokenization
AAAI 2025
POS-Aware Neural Approaches for Word Alignment in Dravidian Languages
COLING 2025
Language Driven Occupancy Prediction
ICCV 2025
Chimera: Improving Generalist Model with Domain-Specific Experts
ICCV 2025
EgoM2P: Egocentric Multimodal Multitask Pretraining
ICCV 2025
Target Bias Is All You Need: Zero-Shot Debiasing of Vision-Language Models with Bias Corpus
ICCV 2025
Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness
ICCV 2025
XTrack: Multimodal Training Boosts RGB-X Video Object Trackers
ICCV 2025
ChartPoint: Guiding MLLMs with Grounding Reflection for Chart Reasoning
ICCV 2025
BlinkTrack: Feature Tracking over 80 FPS via Events and Images
ICCV 2025
MDD: A Dataset for Text-and-Music Conditioned Duet Dance Generation
ICCV 2025
StyleMotif: Multi-Modal Motion Stylization using Style-Content Cross Fusion
ICCV 2025
ADIEE: Automatic Dataset Creation and Scorer for Instruction-Guided Image Editing Evaluation
ICCV 2025
Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences
ICCV 2025
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning
ICCV 2025
PanSt3R: Multi-view Consistent Panoptic Segmentation
ICCV 2025
G2SF: Geometry-Guided Score Fusion for Multimodal Industrial Anomaly Detection
ICCV 2025
SA-Occ: Satellite-Assisted 3D Occupancy Prediction in Real World
ICCV 2025
CMT: A Cascade MAR with Topology Predictor for Multimodal Conditional CAD Generation
ICCV 2025
Derm1M: A Million-scale Vision-Language Dataset Aligned with Clinical Ontology Knowledge for Dermatology
ICCV 2025
DADM: Dual Alignment of Domain and Modality for Face Anti-spoofing
ICCV 2025
ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance
ICCV 2025
ReMP-AD: Retrieval-enhanced Multi-modal Prompt Fusion for Few-Shot Industrial Visual Anomaly Detection
ICCV 2025
Audio-centric Video Understanding Benchmark without Text Shortcut
EMNLP 2025