Recovering Coherent Affective Patterns: Addressing Modality Missing in Multimodal Sentiment Analysis
Abstract
Multimodal sentiment analysis (MSA) seeks to decode human emotions by integrating heterogeneous modalities. However, real-world scenarios often involve missing or misaligned data due to sensor failures or transmission errors, which disrupt temporal dynamics and degrade cross-modal correlations. To address these challenges, we propose RECAP (REcovery of Coherent Affective Patterns), a robust two-stage framework that restores temporal and structural emotional integrity under modality incompleteness. The first stage employs a causality-aware adversarial generator for multi-granularity temporal reconstruction, complemented by a contrastive mutual information factorization module that disentangles shared and modality-specific semantics. The second stage introduces a mutual information-guided attention fusion mechanism with a ranking-based objective, enabling adaptive integration of complementary signals for refined prediction. Extensive experiments on MOSI, MOSEI, and SIMS under various missing-modality conditions demonstrate that RECAP consistently outperforms state-of-the-art methods. Notably, it improves ACC-7 on MOSI by 2.71 percentage points and F1 on SIMS by 6.38 percentage points. These results indicate that RECAP captures fine-grained emotional cues and remains robust when modalities are missing.