Multi-Modal Learning
1213 directly classified papers
Papers per year
2007: 2
2008: 1
2009: 1
2011: 2
2012: 5
2013: 5
2014: 1
2015: 5
2016: 8
2017: 21
2018: 42
2019: 42
2020: 69
2021: 72
2022: 149
2023: 143
2024: 258
2025: 370
2026: 17
Papers
StuD: A Multimodal Approach for Stuttering Detection with RAG and Fusion Strategies
IJCNLP 2025
NCL-UoR at SemEval-2025 Task 3: Detecting Multilingual Hallucination and Related Observable Overgeneration Text Spans with Modified RefChecker and Modified SeflCheckGPT
SEMEVAL 2025
A Picture is Worth a Thousand (Correct) Captions: A Vision-Guided Judge-Corrector System for Multimodal Machine Translation
IJCNLP 2025
VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models
CVPR 2025
IRT-Router: Effective and Interpretable Multi-LLM Routing via Item Response Theory
ACL 2025
Multispectral Object Detection Enhanced by Cross-Modal Information Complementary and Cosine Similarity Channel Resampling Modules
WACV 2025
UnCo: Uncertainty-Driven Collaborative Framework of Large and Small Models for Grounded Multimodal NER
EMNLP 2025
Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models
EMNLP 2025
Who is in the Spotlight: The Hidden Bias Undermining Multimodal Retrieval-Augmented Generation
EMNLP 2025
M3Retrieve: Benchmarking Multimodal Retrieval for Medicine
EMNLP 2025
DynamicNER: A Dynamic, Multilingual, and Fine-Grained Dataset for LLM-based Named Entity Recognition
EMNLP 2025
Retrieval over Classification: Integrating Relation Semantics for Multimodal Relation Extraction
EMNLP 2025
Enhancing Speech Large Language Models with Prompt-Aware Mixture of Audio Encoders
EMNLP 2025
LILaC: Late Interacting in Layered Component Graph for Open-domain Multimodal Multihop Retrieval
EMNLP 2025
Text Takes Over: A Study of Modality Bias in Multimodal Intent Detection
EMNLP 2025
Aligning Text/Speech Representations from Multimodal Models with MEG Brain Activity During Listening
EMNLP 2025
RCI: A Score for Evaluating Global and Local Reasoning in Multimodal Benchmarks
EMNLP 2025
A Survey on Multi-modal Intent Recognition: Recent Advances and New Frontiers
EMNLP 2025
MICE: Mixture of Image Captioning Experts Augmented e-Commerce Product Attribute Value Extraction
ACL 2025
DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving
CVPR 2025
Visual Cues Enhance Predictive Turn-Taking for Two-Party Human Interaction
ACL 2025
Beyond Data Quantity: Key Factors Driving Performance in Multilingual Language Models
COLING 2025
Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization
ACL 2025
Enhancing Dialectal Arabic Intent Detection through Cross-Dialect Multilingual Input Augmentation
COLING 2025
When and How to Augment Your Input: Question Routing Helps Balance the Accuracy and Efficiency of Large Language Models
NAACL 2025