Multi-Modal Learning

3194 directly classified papers

Papers

SONAR-SLT: Multilingual Sign Language Translation via Language-Agnostic Sentence Embedding Supervision EMNLP 2025

Differentiated Vision: Unveiling Entity-Specific Visual Modality Requirements for Multimodal Knowledge Graph EMNLP 2025

Multi-Resolution Pathology-Language Pre-training Model with Text-Guided Visual Representation CVPR 2025

Zero-Shot Defense Against Toxic Images via Inherent Multimodal Alignment in LVLMs EMNLP 2025

Captioning for Text-Video Retrieval via Dual-Group Direct Preference Optimization EMNLP 2025

DongbaMIE: A Multimodal Information Extraction Dataset for Evaluating Semantic Understanding of Dongba Pictograms EMNLP 2025

Beyond Coarse Labels: Fine-Grained Problem Augmentation and Multi-Dimensional Feedback for Emotional Support Conversation EMNLP 2025

Beyond Single Frames: Can LMMs Comprehend Implicit Narratives in Comic Strip? EMNLP 2025

RCI: A Score for Evaluating Global and Local Reasoning in Multimodal Benchmarks EMNLP 2025

PCRI: Measuring Context Robustness in Multimodal Models for Enterprise Applications EMNLP 2025

ECom-Bench: Can LLM Agent Resolve Real-World E-commerce Customer Support Issues? EMNLP 2025

Distilling Cross-Modal Knowledge into Domain-Specific Retrievers for Enhanced Industrial Document Understanding EMNLP 2025

End-to-End Optimization for Multimodal Retrieval-Augmented Generation via Reward Backpropagation EMNLP 2025

Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning EMNLP 2025

Are Vision-Language Models Safe in the Wild? A Meme-Based Benchmark Study EMNLP 2025

ComicScene154: A Scene Dataset for Comic Analysis EMNLP 2025

When Big Models Train Small Ones: Label-Free Model Parity Alignment for Efficient Visual Question Answering using Small VLMs EMNLP 2025

VERITAS: Leveraging Vision Priors and Expert Fusion to Improve Multimodal Data EMNLP 2025

LGA: LLM-GNN Aggregation for Temporal Evolution Attribute Graph Prediction EMNLP 2025

Leveraging Large Models to Evaluate Novel Content: A Case Study on Advertisement Creativity EMNLP 2025

Grounding Multilingual Multimodal LLMs With Cultural Knowledge EMNLP 2025

Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation EMNLP 2025

MERMAID: Multi-perspective Self-reflective Agents with Generative Augmentation for Emotion Recognition EMNLP 2025

IMFine: 3D Inpainting via Geometry-guided Multi-view Refinement CVPR 2025

MIO: A Foundation Model on Multimodal Tokens EMNLP 2025