Research Explorer
Multimodal Learning
323 directly classified papers
Papers per year
2014: 1
2015: 1
2017: 8
2018: 11
2019: 11
2020: 27
2021: 23
2022: 46
2023: 35
2024: 53
2025: 104
2026: 3
Papers
External Memory Matters: Generalizable Object-Action Memory for Retrieval-Augmented Long-Term Video Understanding
IJCAI 2025
PresentAgent: Multimodal Agent for Presentation Video Generation
EMNLP 2025
Can Graph Neural Networks Learn Language with Extremely Weak Text Supervision?
ACL 2025
Exploiting Vision Language Model for Training-Free 3D Point Cloud OOD Detection via Graph Score Propagation
ICCV 2025
Hybrid-Tower: Fine-grained Pseudo-query Interaction and Generation for Text-to-Video Retrieval
ICCV 2025
Unified Multimodal Understanding via Byte-Pair Visual Encoding
ICCV 2025
From Introspection to Best Practices: Principled Analysis of Demonstrations in Multimodal In-Context Learning
NAACL 2025
Let Modalities Teach Each Other: Modal-Collaborative Knowledge Extraction and Fusion for Multimodal Knowledge Graph Completion
NAACL 2025
UniEDU: Toward Unified and Efficient Large Multimodal Models for Educational Tasks
EMNLP 2025
TEAM_STRIKERS@DravidianLangTech2025: Misogyny Meme Detection in Tamil Using Multimodal Deep Learning
NAACL 2025
LegoSLM: Connecting LLM with Speech Encoder using CTC Posteriors
EMNLP 2025
The_Deathly_Hallows@DravidianLangTech 2025: Multimodal Hate Speech Detection in Dravidian Languages
NAACL 2025
ClimbingCap: Multi-Modal Dataset and Method for Rock Climbing in World Coordinate
CVPR 2025
PS-Diffusion: Photorealistic Subject-Driven Image Editing with Disentangled Control and Attention
CVPR 2025
DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding
CVPR 2025
YNU-HPCC at SemEval-2025 Task 1: Enhancing Multimodal Idiomaticity Representation via LoRA and Hybrid Loss Optimization
SEMEVAL 2025
Zero-Shot Image Captioning with Multi-type Entity Representations
AAAI 2025
ENCODER: Entity Mining and Modification Relation Binding for Composed Image Retrieval
AAAI 2025
Cross-Aligned Fusion for Multimodal Understanding
WACV 2025
BIG-FUSION: Brain-Inspired Global-Local Context Fusion Framework for Multimodal Emotion Recognition in Conversations
AAAI 2025
GroundingFace: Fine-grained Face Understanding via Pixel Grounding Multimodal Large Language Model
CVPR 2025
Explanation Bottleneck Models
AAAI 2025
Can VLMs Actually See and Read? A Survey on Modality Collapse in Vision-Language Models
ACL 2025
Multi-Modal Synergistic Implicit Image Enhancement for Efficient Optical Flow Estimation
CVPR 2025
AXIS: Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents
ACL 2025