Revisiting Multimodal Emotion Recognition in Conversation from the Perspective of Graph Spectrum

Wei Ai; Fuchen Zhang; Yuntao Shou; Tao Meng; Haowen Chen; Keqin Li

AAAI 2025

Abstract

Efficiently capturing consistent and complementary semantic features in context is crucial for Multimodal Emotion Recognition in Conversations (MERC). However, existing methods, limited by the over-smoothing or low-pass filtering characteristics of spatial graph neural networks, are insufficient to accurately capture the long-distance low-frequency consistency information and high-frequency complementarity information of utterances. To this end, this paper revisits the MERC task from the perspective of the graph spectrum and proposes GS-MCC, a Graph-Spectrum-based Multimodal Consistency and Complementary collaborative learning framework. First, GS-MCC uses a sliding window to construct a multimodal interaction graph that models conversational relationships, and designs efficient Fourier graph operators (FGO) to extract long-distance high-frequency and low-frequency information, respectively. FGO can be stacked in multiple layers, which effectively alleviates the over-smoothing problem. Then, GS-MCC uses contrastive learning to construct self-supervised signals that reflect the complementary and consistent semantic collaboration between high- and low-frequency signals, thereby improving the ability of high- and low-frequency information to reflect genuine emotions. Finally, GS-MCC feeds the coordinated high- and low-frequency information into an MLP network and a softmax function for emotion prediction. Extensive experiments on two benchmark datasets demonstrate the superiority of the proposed GS-MCC architecture.
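The abstract separates low-frequency (consistency) and high-frequency (complementarity) information on a sliding-window multimodal graph. The sketch below illustrates that split under stated assumptions; it is not the paper's method. It assumes a hypothetical graph layout (one node per utterance per modality, same-modality edges within a window, cross-modal edges within an utterance) and uses classical fixed spectral filters instead of the paper's learned Fourier graph operators. With the self-loop normalized adjacency A_hat = D^{-1/2}(A + I)D^{-1/2} and L = I - A_hat, the product A_hat x acts as a low-pass filter (response 1 - lambda on each eigenvalue lambda of L) and L x as a high-pass filter (response lambda). All function names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def sliding_window_graph(num_utterances: int, num_modalities: int, window: int) -> Matrix:
    """Hypothetical layout: node m * num_utterances + i is modality m of utterance i.

    Same-modality nodes are linked when their utterances are at most `window`
    apart; all modalities of the same utterance are linked to each other.
    """
    n = num_utterances * num_modalities
    adj = [[0.0] * n for _ in range(n)]
    for m in range(num_modalities):
        for i in range(num_utterances):
            u = m * num_utterances + i
            # Intra-modal context edges inside the sliding window.
            for j in range(max(0, i - window), min(num_utterances, i + window + 1)):
                if j != i:
                    adj[u][m * num_utterances + j] = 1.0
            # Cross-modal edges between views of the same utterance.
            for m2 in range(num_modalities):
                if m2 != m:
                    adj[u][m2 * num_utterances + i] = 1.0
    return adj


def normalized_adjacency(adj: Matrix) -> Matrix:
    """A_hat = D^{-1/2} (A + I) D^{-1/2}, with D the degree matrix of A + I."""
    n = len(adj)
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def split_frequencies(adj: Matrix, x: List[float]):
    """Split a scalar graph signal x into low- and high-frequency parts."""
    a_hat = normalized_adjacency(adj)
    # Low-pass: A_hat x has frequency response 1 - lambda (neighbourhood smoothing).
    low = [sum(w * xj for w, xj in zip(row, x)) for row in a_hat]
    # High-pass: (I - A_hat) x has frequency response lambda (local differences).
    high = [xi - li for xi, li in zip(x, low)]
    return low, high


if __name__ == "__main__":
    adj = sliding_window_graph(num_utterances=4, num_modalities=2, window=1)
    # Signal that alternates in sign along each modality's utterance sequence.
    x = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
    low, high = split_frequencies(adj, x)
    print("low :", [round(v, 3) for v in low])
    print("high:", [round(v, 3) for v in high])
```

Because A_hat x + L x = x, the two bands partition the signal exactly, and a signal proportional to D^{1/2} times the all-ones vector lies entirely in the low band. Real node features would be matrices with one column per feature dimension; scalar signals keep the sketch short.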
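The contrastive step can be illustrated with a generic InfoNCE objective. This is an assumption for illustration and not the paper's loss: here the low- and high-frequency embeddings of the same utterance form a positive pair (one plausible reading of "collaboration"), embeddings of other utterances in the batch act as negatives, and the names `info_nce` and `temperature` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(low: List[List[float]], high: List[List[float]], temperature: float = 0.5) -> float:
    """Mean InfoNCE loss: low[i] should match high[i] rather than high[j], j != i."""
    n = len(low)
    total = 0.0
    for i in range(n):
        logits = [cosine(low[i], high[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates (positive + negatives).
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        # Negative log softmax probability of the positive pair.
        total += log_denominator - logits[i]
    return total / n


if __name__ == "__main__":
    low = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[0.9, 0.1], [0.1, 0.9]]
    swapped = [[0.1, 0.9], [0.9, 0.1]]
    print(info_nce(low, aligned), info_nce(low, swapped))
```

Aligned pairs yield a lower loss than swapped ones, which is the training signal a contrastive objective provides. A practical version would operate on batched tensors; the plain loops keep the sketch dependency-free.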

Keywords

contrastive learning, frequency analysis, speech emotion recognition, multimodal emotion recognition, graph spectrum, graph neural network, conversational relationship, Fourier graph operator
