Augmenting Generative Adversarial Networks for Speech Emotion Recognition

Siddique Latif; Muhammad Asim; Rajib Rana; Sara Khalifa; Raja Jurdak; Bjorn W. Schuller

INTERSPEECH 2020

Abstract

Generative adversarial networks (GANs) have shown potential in learning emotional attributes and generating new data samples. However, their performance is usually hindered by the limited availability of large speech emotion recognition (SER) datasets. In this work, we propose a framework that utilises the mixup data augmentation scheme to augment the GAN in feature learning and generation. To show the effectiveness of the proposed framework, we present SER results on (i) synthetic feature vectors, (ii) training data augmented with synthetic features, and (iii) encoded features in a compressed representation. Our results show that the proposed framework can effectively learn compressed emotional representations and can generate synthetic samples that help improve performance in within-corpus and cross-corpus evaluation.
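For readers unfamiliar with mixup, the following is a minimal sketch of the general scheme: each training example is replaced by a convex combination of itself and a randomly chosen partner, with the mixing weight drawn from a Beta(alpha, alpha) distribution. It is not the authors' implementation. The function names, the `alpha=0.2` default, and the use of plain Python lists for feature vectors and one-hot labels are assumptions made for illustration, and the abstract does not specify how mixup is wired into the GAN.

```python
import random


def mixup_pair(x_i, y_i, x_j, y_j, alpha=0.2, rng=random):
    """Blend two (feature vector, one-hot label) pairs.

    The mixing weight lam is drawn from Beta(alpha, alpha); alpha is an
    assumed hyperparameter, not a value taken from the paper.
    """
    lam = rng.betavariate(alpha, alpha)
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x_mix, y_mix


def mixup_batch(features, labels, alpha=0.2, seed=0):
    """Mix every example in a batch with a randomly chosen partner."""
    rng = random.Random(seed)
    partners = list(range(len(features)))
    rng.shuffle(partners)  # partner index for each example
    mixed = [
        mixup_pair(features[i], labels[i], features[j], labels[j], alpha, rng)
        for i, j in enumerate(partners)
    ]
    return [m[0] for m in mixed], [m[1] for m in mixed]


if __name__ == "__main__":
    # Toy 2-dimensional "features" with 2 hypothetical emotion classes.
    feats = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
    labs = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    mixed_feats, mixed_labs = mixup_batch(feats, labs)
    for x, y in zip(mixed_feats, mixed_labs):
        print(x, y)
```

Because the labels are mixed with the same weight as the features, each mixed label remains a valid probability vector, which is what lets the blended samples serve as soft-labelled training data.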

Topics

Machine Learning > Learning Types > Adversarial Learning

Machine Learning > Application Areas > Data Augmentation

Deep Learning > Models > Generative Models

Speech & Audio > Analysis > Speech Analysis

Deep Learning > Learning Types > Adversarial Learning

Keywords

feature learning, data augmentation, synthetic data, generative adversarial network, speech emotion recognition

Related papers

Memory Controlled Sequential Self Attention for Sound Recognition 2020

Dual Attention in Time and Frequency Domain for Voice Activity Detection 2020

Automatic Prediction of Speech Intelligibility Based on X-Vectors in the Context of Head and Neck Cancer 2020

A Noise Robust Technique for Detecting Vowels in Speech Signals 2020

Joint Detection of Sentence Stress and Phrase Boundary for Prosody 2020