Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
SONAR-SLT: Multilingual Sign Language Translation via Language-Agnostic Sentence Embedding Supervision
EMNLP 2025
Differentiated Vision: Unveiling Entity-Specific Visual Modality Requirements for Multimodal Knowledge Graph
EMNLP 2025
Multi-Resolution Pathology-Language Pre-training Model with Text-Guided Visual Representation
CVPR 2025
Zero-Shot Defense Against Toxic Images via Inherent Multimodal Alignment in LVLMs
EMNLP 2025
Captioning for Text-Video Retrieval via Dual-Group Direct Preference Optimization
EMNLP 2025
DongbaMIE: A Multimodal Information Extraction Dataset for Evaluating Semantic Understanding of Dongba Pictograms
EMNLP 2025
Beyond Coarse Labels: Fine-Grained Problem Augmentation and Multi-Dimensional Feedback for Emotional Support Conversation
EMNLP 2025
Beyond Single Frames: Can LMMs Comprehend Implicit Narratives in Comic Strip?
EMNLP 2025
RCI: A Score for Evaluating Global and Local Reasoning in Multimodal Benchmarks
EMNLP 2025
PCRI: Measuring Context Robustness in Multimodal Models for Enterprise Applications
EMNLP 2025
ECom-Bench: Can LLM Agent Resolve Real-World E-commerce Customer Support Issues?
EMNLP 2025
Distilling Cross-Modal Knowledge into Domain-Specific Retrievers for Enhanced Industrial Document Understanding
EMNLP 2025
End-to-End Optimization for Multimodal Retrieval-Augmented Generation via Reward Backpropagation
EMNLP 2025
Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning
EMNLP 2025
Are Vision-Language Models Safe in the Wild? A Meme-Based Benchmark Study
EMNLP 2025
ComicScene154: A Scene Dataset for Comic Analysis
EMNLP 2025
When Big Models Train Small Ones: Label-Free Model Parity Alignment for Efficient Visual Question Answering using Small VLMs
EMNLP 2025
VERITAS: Leveraging Vision Priors and Expert Fusion to Improve Multimodal Data
EMNLP 2025
LGA: LLM-GNN Aggregation for Temporal Evolution Attribute Graph Prediction
EMNLP 2025
Leveraging Large Models to Evaluate Novel Content: A Case Study on Advertisement Creativity
EMNLP 2025
Grounding Multilingual Multimodal LLMs With Cultural Knowledge
EMNLP 2025
Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation
EMNLP 2025
MERMAID: Multi-perspective Self-reflective Agents with Generative Augmentation for Emotion Recognition
EMNLP 2025
IMFine: 3D Inpainting via Geometry-guided Multi-view Refinement
CVPR 2025
MIO: A Foundation Model on Multimodal Tokens
EMNLP 2025