2025 EMNLP EMNLP 2025

Two Challenges, One Solution: Robust Multimodal Learning through Dynamic Modality Recognition and Enhancement

Abstract

AbstractMultimodal machine learning is often hindered by two critical challenges: modality missingness and modality imbalance. These challenges significantly degrade the performance of multimodal models. The majority of existing methods either require the availability of full-modality data during the training phase or necessitate explicit annotations to detect missing modalities. These dependencies severely limit the models’ applicability in the real world. To tackle these problems, we propose a Dynamic modality Recognition and Enhancement for Adaptive Multimodal fusion framework *DREAM*. Within DREAM, we innovatively employ a sample-level dynamic modality assessment mechanism to direct selective reconstruction of missing or underperforming modalities. Additionally, we introduce a soft masking fusion strategy that adaptively integrates different modalities according to their estimated contributions, enabling more accurate and robust predictions. Experimental results on three benchmark datasets consistently demonstrate that DREAM outperforms several representative baseline and state-of-the-art models, marking its robustness against modality missingness and imbalanced modality.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning
🧭 Keyword Pioneer — modality recognition
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio