Reusing Neural Speech Representations for Auditory Emotion Recognition

Egor Lakomkin; Cornelius Weber; Sven Magg; Stefan Wermter

IJCNLP 2017

Abstract

Acoustic emotion recognition aims to categorize the affective state of the speaker and remains a difficult task for machine learning models. The difficulties stem from the scarcity of training data, the general subjectivity of emotion perception, which results in low annotator agreement, and uncertainty about which features are the most relevant and robust for classification. In this paper, we tackle the last of these problems. Inspired by the recent success of transfer learning methods, we propose a set of architectures that reuse neural representations learned by training on large speech databases for the acoustic emotion recognition task. Our experiments on the IEMOCAP dataset show ~10% relative improvements in accuracy and F1-score over a baseline recurrent neural network trained end-to-end for emotion recognition.
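To make the "~10% relative" figure concrete: a relative improvement is measured as a fraction of the baseline score, not in percentage points. The sketch below illustrates the arithmetic only. The function name `relative_improvement` and both scores are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Return the gain of `new` over `baseline` as a fraction of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline


# Hypothetical scores chosen only for illustration (not results from the paper).
baseline_acc = 0.50   # assumed accuracy of an end-to-end baseline
transfer_acc = 0.55   # assumed accuracy with reused speech representations

rel = relative_improvement(baseline_acc, transfer_acc)
abs_points = (transfer_acc - baseline_acc) * 100

# Reports the same gain both ways: relative to the baseline, and in points.
print(f"relative: {rel:.1%}, absolute: {abs_points:.1f} points")
```

With these assumed numbers, a 10% relative gain corresponds to only 5 percentage points, which is why the two measures should not be confused when comparing reported results.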

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning; Machine Learning > Core Methods > Classification; Deep Learning > Architectures > Neural Networks; Interdisciplinary > Social > Affective Computing; Speech & Audio > Analysis > Speech Analysis; Deep Learning > Learning Types > Transfer Learning

Keywords

representation learning; transfer learning; speech processing; recurrent neural network; end-to-end training; speech representation; acoustic emotion recognition
