Multi-Modal Learning

1213 directly classified papers

Papers

Towards Reliable Large Audio Language Model ACL 2025

Cross-Aligned Fusion for Multimodal Understanding WACV 2025

Do Mentioned Items Truly Matter? Enhancing Conversational Recommender Systems with Causal Intervention and Large Language Models IJCAI 2025

Going Beyond Consistency: Target-oriented Multi-view Graph Neural Network IJCAI 2025

Findings of the Shared Task on Misogyny Meme Detection: DravidianLangTech@NAACL 2025 NAACL 2025

CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages ACL 2025

Dll5143@DravidianLangTech 2025: Majority Voting-Based Framework for Misogyny Meme Detection in Tamil and Malayalam NAACL 2025

From Text to Multi-Modal: Advancing Low-Resource-Language Translation through Synthetic Data Generation and Cross-Modal Alignments NAACL 2025

Exploring Multimodal Foundation AI and Expert-in-the-Loop for Sustainable Management of Wild Salmon Fisheries in Indigenous Rivers IJCAI 2025

To Ask or Not to Ask? Detecting Absence of Information in Vision and Language Navigation WACV 2025

DMPT: Decoupled Modality-Aware Prompt Tuning for Multi-Modal Object Re-Identification WACV 2025

RGB-D Video Mirror Detection WACV 2025

Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness ICCV 2025

LONG3R: Long Sequence Streaming 3D Reconstruction ICCV 2025

DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization ICCV 2025

Code_Conquerors@DravidianLangTech 2025: Multimodal Misogyny Detection in Dravidian Languages Using Vision Transformer and BERT NAACL 2025

AGRec: Adapting Autoregressive Decoders with Graph Reasoning for LLM-based Sequential Recommendation ACL 2025

Fired_from_NLP@DravidianLangTech 2025: A Multimodal Approach for Detecting Misogynistic Content in Tamil and Malayalam Memes NAACL 2025

CUET_NetworkSociety@DravidianLangTech 2025: A Multimodal Framework to Detect Misogyny Meme in Dravidian Languages NAACL 2025

Towards Cross-Modality Modeling for Time Series Analytics: A Survey in the LLM Era IJCAI 2025

STARS: A Unified Framework for Singing Transcription, Alignment, and Refined Style Annotation ACL 2025

Incomplete Modality Disentangled Representation for Ophthalmic Disease Grading and Diagnosis AAAI 2025

Beyond Label Semantics: Language-Guided Action Anatomy for Few-shot Action Recognition ICCV 2025

Oasis: One Image is All You Need for Multimodal Instruction Data Synthesis ICCV 2025

Lost in Variation? Evaluating NLI Performance in Basque and Spanish Geographical Variants ACL 2025