Research Explorer
Multimodal Learning
323 directly classified papers
Papers per year
2014: 1
2015: 1
2017: 8
2018: 11
2019: 11
2020: 27
2021: 23
2022: 46
2023: 35
2024: 53
2025: 104
2026: 3
Papers
Aligning Text/Speech Representations from Multimodal Models with MEG Brain Activity During Listening
EMNLP 2025
Can Graph Neural Networks Learn Language with Extremely Weak Text Supervision?
ACL 2025
External Memory Matters: Generalizable Object-Action Memory for Retrieval-Augmented Long-Term Video Understanding
IJCAI 2025
UniEDU: Toward Unified and Efficient Large Multimodal Models for Educational Tasks
EMNLP 2025
LegoSLM: Connecting LLM with Speech Encoder using CTC Posteriors
EMNLP 2025
ClimbingCap: Multi-Modal Dataset and Method for Rock Climbing in World Coordinate
CVPR 2025
AudioGenX: Explainability on Text-to-Audio Generative Models
AAAI 2025
ImpliHateVid: A Benchmark Dataset and Two-stage Contrastive Learning Framework for Implicit Hate Speech Detection in Videos
ACL 2025
Exploiting Vision Language Model for Training-Free 3D Point Cloud OOD Detection via Graph Score Propagation
ICCV 2025
VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search
ACL 2025
Fine-Grained Image-Text Correspondence with Cost Aggregation for Open-Vocabulary Part Segmentation
CVPR 2025
Explaining Domain Shifts in Language: Concept Erasing for Interpretable Image Classification
CVPR 2025
Big Escape Benchmark: Evaluating Human-Like Reasoning in Language Models via Real-World Escape Room Challenges
ACL 2025
Ego4o: Egocentric Human Motion Capture and Understanding from Multi-Modal Input
CVPR 2025
GroundingFace: Fine-grained Face Understanding via Pixel Grounding Multimodal Large Language Model
CVPR 2025
DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding
CVPR 2025
Let Modalities Teach Each Other: Modal-Collaborative Knowledge Extraction and Fusion for Multimodal Knowledge Graph Completion
NAACL 2025
InstructOCR: Instruction Boosting Scene Text Spotting
AAAI 2025
AXIS: Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents
ACL 2025
Predicting Implicit Arguments in Procedural Video Instructions
ACL 2025
TEAM_STRIKERS@DravidianLangTech2025: Misogyny Meme Detection in Tamil Using Multimodal Deep Learning
NAACL 2025
Zero-Shot Image Captioning with Multi-type Entity Representations
AAAI 2025
Multi-Granular Multimodal Clue Fusion for Meme Understanding
AAAI 2025
BiMAC: Bidirectional Multimodal Alignment in Contrastive Learning
AAAI 2025
SGC-Net: Stratified Granular Comparison Network for Open-Vocabulary HOI Detection
CVPR 2025