Multi-Modal Learning

3194 directly classified papers

Papers

SegMango: Early Deep Mango Yield Prediction based on Flower Segmentation and Weather Data WACV 2026

ScoliGaitX: A Deep Multi-Modal Fusion Network for Scoliosis Assessment via Gait Video Analysis WACV 2026

MixER: From Cross-Modal to Mixed-Modal Visible-Infrared Re-Identification WACV 2026

Large Sign Language Models: Toward 3D American Sign Language Translation WACV 2026

VividAnimator: An End-to-End Audio and Pose-driven Half-Body Human Animation Framework WACV 2026

Multi-Modal Soccer Scene Analysis with Masked Pre-Training WACV 2026

PaRaChute: Pathology-Radiology Cross-Modal Fusion for Missing-Modality-Robust Survival Prediction WACV 2026

Boosting Medical Vision-Language Pretraining via Momentum Self-Distillation under Limited Computing Resources WACV 2026

Uncertainty-Aware Vision-Language Segmentation for Medical Imaging WACV 2026

Conditional Text-to-Image Generation with Reference Guidance WACV 2026

MedROV: Towards Real-Time Open-Vocabulary Detection Across Diverse Medical Imaging Modalities WACV 2026

Dual-Domain Multimodal Hyperbolic Fusion for Cardiopulmonary Disease Diagnosis in Emergency Care WACV 2026

RoadBench: A Vision-Language Foundation Model and Benchmark for Road Damage Understanding WACV 2026

CLIP-IT: CLIP-based Pairing of Histology Images with Privileged Textual Information WACV 2026

Being Positive about Negative Queries: Exclusion Aware Multimodal Retrieval using Disentangled Representations WACV 2026

MuseDance: A Diffusion-based Music-Driven Image Animation System WACV 2026

AuViRe: Audio-visual Speech Representation Reconstruction for Deepfake Temporal Localization WACV 2026

LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale CVPR 2025

DoGA: Enhancing Grounded Object Detection via Grouped Pre-Training with Attributes AAAI 2025

Persian in a Court: Benchmarking VLMs In Persian Multi-Modal Tasks COLING 2025

StyleMotif: Multi-Modal Motion Stylization using Style-Content Cross Fusion ICCV 2025

Clink! Chop! Thud! - Learning Object Sounds from Real-World Interactions ICCV 2025

Large Multi-modal Models Can Interpret Features in Large Multi-modal Models ICCV 2025

Harnessing Input-Adaptive Inference for Efficient VLN ICCV 2025

ProbMED: A Probabilistic Framework for Medical Multimodal Binding ICCV 2025