Multimodal Learning

323 directly classified papers

Papers

Model-free Domain Adaptation for Concealed Multimodal Large-Language Models WACV 2026

LangPose: Language-Aligned Motion for Robust 3D Human Pose Estimation WACV 2026

DomainCQA: Crafting Knowledge-Intensive QA from Domain-Specific Charts AAAI 2026

Fine-Grained Image-Text Correspondence with Cost Aggregation for Open-Vocabulary Part Segmentation CVPR 2025

Exploiting Vision Language Model for Training-Free 3D Point Cloud OOD Detection via Graph Score Propagation ICCV 2025

Efficient Visual Place Recognition Through Multimodal Semantic Knowledge Integration ICCV 2025

LegoSLM: Connecting LLM with Speech Encoder using CTC Posteriors EMNLP 2025

ENCODER: Entity Mining and Modification Relation Binding for Composed Image Retrieval AAAI 2025

UniEDU: Toward Unified and Efficient Large Multimodal Models for Educational Tasks EMNLP 2025

EssayDetect at GenAI Detection Task 2: Guardians of Academic Integrity: Multilingual Detection of AI-Generated Essays COLING 2025

Adversarial Alignment with Anchor Dragging Drift (A3D2): Multimodal Domain Adaptation with Partially Shifted Modalities ACL 2025

Aligning Text/Speech Representations from Multimodal Models with MEG Brain Activity During Listening EMNLP 2025

PresentAgent: Multimodal Agent for Presentation Video Generation EMNLP 2025

External Memory Matters: Generalizable Object-Action Memory for Retrieval-Augmented Long-Term Video Understanding IJCAI 2025

Let Modalities Teach Each Other: Modal-Collaborative Knowledge Extraction and Fusion for Multimodal Knowledge Graph Completion NAACL 2025

From Introspection to Best Practices: Principled Analysis of Demonstrations in Multimodal In-Context Learning NAACL 2025

TEAM_STRIKERS@DravidianLangTech2025: Misogyny Meme Detection in Tamil Using Multimodal Deep Learning NAACL 2025

The_Deathly_Hallows@DravidianLangTech 2025: Multimodal Hate Speech Detection in Dravidian Languages NAACL 2025

Hybrid-Tower: Fine-grained Pseudo-query Interaction and Generation for Text-to-Video Retrieval ICCV 2025

YNU-HPCC at SemEval-2025 Task 1: Enhancing Multimodal Idiomaticity Representation via LoRA and Hybrid Loss Optimization SEMEVAL 2025

Cross-Aligned Fusion for Multimodal Understanding WACV 2025

Unified Multimodal Understanding via Byte-Pair Visual Encoding ICCV 2025

Can VLMs Actually See and Read? A Survey on Modality Collapse in Vision-Language Models ACL 2025

Multi-Modal Synergistic Implicit Image Enhancement for Efficient Optical Flow Estimation CVPR 2025

ClimbingCap: Multi-Modal Dataset and Method for Rock Climbing in World Coordinate CVPR 2025