Speech Emotion Recognition with Discriminative Feature Learning

Huan Zhou; Kai Liu

INTERSPEECH 2020

Abstract

The performance of a speech emotion recognition (SER) system relies heavily on the deep features learned from speech. Most state-of-the-art work has focused on developing various deep architectures for effective feature learning. In this study, we instead make the first attempt to explore feature discriminability. Building on our SER baseline system, we propose three approaches to enhance feature discriminability: two based on loss functions and one based on combined attentive pooling. Evaluations on the IEMOCAP database consistently validate the effectiveness of all three proposals. Compared with the baseline system, each of the three proposed systems achieves an absolute accuracy improvement of at least 4.0%, with no increase in the total number of parameters.
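The abstract does not name the specific loss functions or the exact pooling design, so the following is only a minimal NumPy sketch of the general ideas under stated assumptions, not the authors' method. It assumes (1) a single-vector softmax attention pooling that collapses frame-level features into one utterance-level embedding, and (2) center loss, a widely used discriminative loss from face recognition, added to softmax cross-entropy so that embeddings of the same emotion are pulled toward a shared class center. All function names (`attentive_pool`, `center_loss`, `combined_loss`) and the weight `lam` are hypothetical.

```python
import numpy as np


def attentive_pool(frames: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Collapse frame-level features (T, D) into one utterance vector (D,).

    Assumed design: each frame gets a scalar score frames @ w, a softmax over
    time turns the scores into weights, and the output is the weighted mean.
    """
    scores = frames @ w                      # (T,) one score per frame
    scores = scores - scores.max()           # shift for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ frames                    # (D,) attention-weighted mean


def softmax_cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean cross-entropy over a batch of logits (N, C) with integer labels (N,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def center_loss(features: np.ndarray, labels: np.ndarray, centers: np.ndarray) -> float:
    """Half the squared distance from each embedding to its class center, batch-averaged.

    The centers (C, D) are assumed to be updated separately during training;
    that update step is not shown here.
    """
    diffs = features - centers[labels]
    return float(0.5 * (diffs ** 2).sum(axis=1).mean())


def combined_loss(features, logits, labels, centers, lam=0.01) -> float:
    """Cross-entropy (class separability) plus lam * center loss (intra-class compactness)."""
    return softmax_cross_entropy(logits, labels) + lam * center_loss(features, labels, centers)


# Usage on random data: 4 utterances of 50 frames with 8-dim features, 4 emotion classes.
rng = np.random.default_rng(0)
utterances = [rng.normal(size=(50, 8)) for _ in range(4)]
w_att = rng.normal(size=8)
feats = np.stack([attentive_pool(u, w_att) for u in utterances])  # (4, 8)
W_out = rng.normal(size=(8, 4))                                   # linear classifier
labels = np.array([0, 1, 2, 3])
centers = np.zeros((4, 8))
print(combined_loss(feats, feats @ W_out, labels, centers))
```

The split of responsibilities is the point of the design: cross-entropy alone only needs classes to be linearly separable, while the added center term also penalizes spread within each class, which is one common way to make learned features more discriminative.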

Keywords

feature learning, discriminative learning, loss function, attentive pooling, speech emotion recognition, discriminative feature learning, deep feature