Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
SegMango: Early Deep Mango Yield Prediction based on Flower Segmentation and Weather Data
WACV 2026
ScoliGaitX: A Deep Multi-Modal Fusion Network for Scoliosis Assessment via Gait Video Analysis
WACV 2026
MixER: From Cross-Modal to Mixed-Modal Visible-Infrared Re-Identification
WACV 2026
Large Sign Language Models: Toward 3D American Sign Language Translation
WACV 2026
VividAnimator: An End-to-End Audio and Pose-driven Half-Body Human Animation Framework
WACV 2026
Multi-Modal Soccer Scene Analysis with Masked Pre-Training
WACV 2026
PaRaChute: Pathology-Radiology Cross-Modal Fusion for Missing-Modality-Robust Survival Prediction
WACV 2026
Boosting Medical Vision-Language Pretraining via Momentum Self-Distillation under Limited Computing Resources
WACV 2026
Uncertainty-Aware Vision-Language Segmentation for Medical Imaging
WACV 2026
Conditional Text-to-Image Generation with Reference Guidance
WACV 2026
MedROV: Towards Real-Time Open-Vocabulary Detection Across Diverse Medical Imaging Modalities
WACV 2026
Dual-Domain Multimodal Hyperbolic Fusion for Cardiopulmonary Disease Diagnosis in Emergency Care
WACV 2026
RoadBench: A Vision-Language Foundation Model and Benchmark for Road Damage Understanding
WACV 2026
CLIP-IT: CLIP-based Pairing of Histology Images with Privileged Textual Information
WACV 2026
Being Positive about Negative Queries: Exclusion Aware Multimodal Retrieval using Disentangled Representations
WACV 2026
MuseDance: A Diffusion-based Music-Driven Image Animation System
WACV 2026
AuViRe: Audio-visual Speech Representation Reconstruction for Deepfake Temporal Localization
WACV 2026
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale
CVPR 2025
DoGA: Enhancing Grounded Object Detection via Grouped Pre-Training with Attributes
AAAI 2025
Persian in a Court: Benchmarking VLMs In Persian Multi-Modal Tasks
COLING 2025
StyleMotif: Multi-Modal Motion Stylization using Style-Content Cross Fusion
ICCV 2025
Clink! Chop! Thud! - Learning Object Sounds from Real-World Interactions
ICCV 2025
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models
ICCV 2025
Harnessing Input-Adaptive Inference for Efficient VLN
ICCV 2025
ProbMED: A Probabilistic Framework for Medical Multimodal Binding
ICCV 2025