Multimodal Learning

323 directly classified papers

Papers

Aligning Text/Speech Representations from Multimodal Models with MEG Brain Activity During Listening EMNLP 2025

Can Graph Neural Networks Learn Language with Extremely Weak Text Supervision? ACL 2025

External Memory Matters: Generalizable Object-Action Memory for Retrieval-Augmented Long-Term Video Understanding IJCAI 2025

UniEDU: Toward Unified and Efficient Large Multimodal Models for Educational Tasks EMNLP 2025

LegoSLM: Connecting LLM with Speech Encoder using CTC Posteriors EMNLP 2025

ClimbingCap: Multi-Modal Dataset and Method for Rock Climbing in World Coordinate CVPR 2025

AudioGenX: Explainability on Text-to-Audio Generative Models AAAI 2025

ImpliHateVid: A Benchmark Dataset and Two-stage Contrastive Learning Framework for Implicit Hate Speech Detection in Videos ACL 2025

Exploiting Vision Language Model for Training-Free 3D Point Cloud OOD Detection via Graph Score Propagation ICCV 2025

VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search ACL 2025

Fine-Grained Image-Text Correspondence with Cost Aggregation for Open-Vocabulary Part Segmentation CVPR 2025

Explaining Domain Shifts in Language: Concept Erasing for Interpretable Image Classification CVPR 2025

Big Escape Benchmark: Evaluating Human-Like Reasoning in Language Models via Real-World Escape Room Challenges ACL 2025

Ego4o: Egocentric Human Motion Capture and Understanding from Multi-Modal Input CVPR 2025

GroundingFace: Fine-grained Face Understanding via Pixel Grounding Multimodal Large Language Model CVPR 2025

DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding CVPR 2025

Let Modalities Teach Each Other: Modal-Collaborative Knowledge Extraction and Fusion for Multimodal Knowledge Graph Completion NAACL 2025

InstructOCR: Instruction Boosting Scene Text Spotting AAAI 2025

AXIS: Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents ACL 2025

Predicting Implicit Arguments in Procedural Video Instructions ACL 2025

TEAM_STRIKERS@DravidianLangTech2025: Misogyny Meme Detection in Tamil Using Multimodal Deep Learning NAACL 2025

Zero-Shot Image Captioning with Multi-type Entity Representations AAAI 2025

Multi-Granular Multimodal Clue Fusion for Meme Understanding AAAI 2025

BiMAC: Bidirectional Multimodal Alignment in Contrastive Learning AAAI 2025

SGC-Net: Stratified Granular Comparison Network for Open-Vocabulary HOI Detection CVPR 2025