Multi-Modal Learning

1213 directly classified papers

Papers

MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation ICCV 2025

Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation ICCV 2025

Beyond Walking: A Large-Scale Image-Text Benchmark for Text-based Person Anomaly Search ICCV 2025

DALIP: Distribution Alignment-based Language-Image Pre-Training for Domain-Specific Data ICCV 2025

AffordDexGrasp: Open-set Language-guided Dexterous Grasp with Generalizable-Instructive Affordance ICCV 2025

Beyond RGB: Adaptive Parallel Processing for RAW Object Detection ICCV 2025

RGB-D Video Mirror Detection WACV 2025

TrenTeam at Multilingual Counterspeech Generation: Multilingual Passage Re-Ranking Approaches for Knowledge-Driven Counterspeech Generation Against Hate COLING 2025

FiVE-Bench: A Fine-grained Video Editing Benchmark for Evaluating Emerging Diffusion and Rectified Flow Models ICCV 2025

DAMMFND: Domain-Aware Multimodal Multi-view Fake News Detection AAAI 2025

IRT-Router: Effective and Interpretable Multi-LLM Routing via Item Response Theory ACL 2025

ReFu: Recursive Fusion for Exemplar-Free 3D Class-Incremental Learning WACV 2025

Cross-Aligned Fusion for Multimodal Understanding WACV 2025

Unleashing Potentials of Vision-Language Models for Zero-Shot HOI Detection WACV 2025

UnCo: Uncertainty-Driven Collaborative Framework of Large and Small Models for Grounded Multimodal NER EMNLP 2025

Program Synthesis Benchmark for Visual Programming in XLogoOnline Environment ACL 2025

Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models EMNLP 2025

OVQA: A Dataset for Visual Question Answering and Multimodal Research in Odia Language COLING 2025

Who is in the Spotlight: The Hidden Bias Undermining Multimodal Retrieval-Augmented Generation EMNLP 2025

Proxy-Driven Robust Multimodal Sentiment Analysis with Incomplete Data ACL 2025

NCL-UoR at SemEval-2025 Task 3: Detecting Multilingual Hallucination and Related Observable Overgeneration Text Spans with Modified RefChecker and Modified SeflCheckGPT SEMEVAL 2025

UoR-NCL at SemEval-2025 Task 1: Using Generative LLMs and CLIP Models for Multilingual Multimodal Idiomaticity Representation SEMEVAL 2025

FiRC-NLP at SemEval-2025 Task 11: To Prompt or to Fine-Tune? Approaches for Multilingual Emotion Classification SEMEVAL 2025

DynamicNER: A Dynamic, Multilingual, and Fine-Grained Dataset for LLM-based Named Entity Recognition EMNLP 2025

ScaleMatch: Multi-scale Consistency Enhancement for Semi-supervised Semantic Segmentation AAAI 2025