Analysis of Deep Learning Architectures for Cross-Corpus Speech Emotion Recognition

Jack Parry; Dimitri Palaz; Georgia Clarke; Pauline Lecomte; Rebecca Mead; Michael Berger; Gregor Hofer

INTERSPEECH 2019

Abstract

Speech Emotion Recognition (SER) is an important and challenging task for human-computer interaction. In the literature, deep learning architectures have been shown to yield state-of-the-art performance on this task when the model is trained and evaluated on the same corpus. However, prior work has indicated that such systems often perform poorly on unseen data. One possible approach to improving the generalisation capabilities of emotion recognition systems is cross-corpus training, which consists of training the model on an aggregation of different corpora. In this paper, we present an analysis of the generalisation capability of deep learning models using cross-corpus training with six different speech emotion corpora. We evaluate the models on an unseen corpus and analyse the learned representations using the t-SNE algorithm, showing that architectures based on recurrent neural networks are prone to overfitting the corpora present in the training set, while architectures based on convolutional neural networks (CNNs) show better generalisation capabilities. These findings indicate that (1) cross-corpus training is a promising approach for improving generalisation and (2) CNNs should be the architecture of choice for this approach.
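As a rough illustration of the cross-corpus setup described in the abstract, the sketch below aggregates all corpora except one for training and keeps the remaining corpus unseen for evaluation. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function name `cross_corpus_split`, the `Utterance` type, the corpus names, and the toy feature values are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each utterance is (feature_vector, emotion_label).
Utterance = Tuple[List[float], str]


def cross_corpus_split(
    corpora: Dict[str, List[Utterance]], held_out: str
) -> Tuple[List[Utterance], List[Utterance]]:
    """Pool every corpus except `held_out` for training; keep `held_out` unseen."""
    if held_out not in corpora:
        raise KeyError(f"unknown corpus: {held_out}")
    # Training set: aggregation of all remaining corpora.
    train = [utt for name, utts in corpora.items() if name != held_out for utt in utts]
    # Evaluation set: the single corpus the model never sees during training.
    test = list(corpora[held_out])
    return train, test


# Toy placeholder data; corpus names and values are made up for illustration.
corpora = {
    "corpus_a": [([0.1, 0.2], "happy"), ([0.3, 0.1], "sad")],
    "corpus_b": [([0.5, 0.4], "angry")],
    "corpus_c": [([0.9, 0.8], "neutral"), ([0.7, 0.6], "happy")],
}
train, test = cross_corpus_split(corpora, held_out="corpus_c")
```

In a real experiment the pooled training set would feed a CNN or recurrent model, and the held-out corpus would be used only for evaluation.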

Topics

Machine Learning > Core Methods > Representation Learning
Machine Learning > Application Areas > Domain Generalization
Deep Learning > Architectures > Neural Networks
Machine Learning > Learning Types > Transfer Learning
Speech & Audio > Analysis > Speech Analysis

Keywords

domain adaptation, convolutional neural network, recurrent neural network, generalization capability, speech emotion recognition, cross-corpus training
