2023 INTERSPEECH INTERSPEECH 2023

SWRR: Feature Map Classifier Based on Sliding Window Attention and High-Response Feature Reuse for Multimodal Emotion Recognition

Abstract

To achieve efficient feature fusion, existing research tends to employ cross-attention to control the contributions of different modalities in fusion. However, this inevitably causes high computational effort and introduces noise weights due to redundant computations. Therefore, this paper proposes sliding window attention (SliWa) to control the feature perception range and dynamically model the modality fusion at different granularities. In addition, we present a novel feature map classifier (FMC) based on high-response feature reuse (HRFR), which explicitly preserves the deep emotional feature structure, thus preventing the submersion of the crucial classification information after average flattening and the negative impacts of parameter flooding. We unify the mentioned modules in the SWRR framework, and the experimental results on the commonly used datasets IEMOCAP and CMU-MOSEI reveal the effectiveness of SWRR in improving the performance of emotion recognition.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio