Speech Emotion Recognition from Variable-Length Inputs with Triplet Loss Function

Jian Huang; Ya Li; Jianhua Tao; Zhen Lian

INTERSPEECH 2018

Abstract

Automatic emotion recognition is a crucial element in understanding human behavior and interaction. Prior work on speech emotion recognition has focused on exploring various feature sets and models. In contrast, we propose a triplet framework based on a Long Short-Term Memory (LSTM) neural network for speech emotion recognition. The system learns a mapping from acoustic features to discriminative embedding features, which are then classified with a support vector machine (SVM) at test time. The proposed model is trained with a triplet loss and a supervised loss simultaneously. The triplet loss shortens intra-class distances and lengthens inter-class distances, while the supervised loss incorporates class label information. To handle variable-length inputs, we explore three strategies that also make better use of temporal dynamics. Our experimental results on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database show that the proposed methods improve performance. We demonstrate the promise of the triplet framework for speech emotion recognition and present our analysis.
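The joint objective described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulation, so the squared Euclidean distance, the hinge with a `margin`, softmax cross-entropy as the supervised loss, and the `weight` that balances the two terms are all assumptions made for illustration. The function names are hypothetical and do not come from the paper.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss averaged over a batch of embeddings.

    Assumption: squared Euclidean distance and a fixed margin.
    Each argument has shape (batch, embedding_dim).
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=1)  # same-class distance
    d_neg = np.sum((anchor - negative) ** 2, axis=1)  # different-class distance
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


def cross_entropy(logits, labels):
    """Softmax cross-entropy, used here as a stand-in supervised loss."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean(log_probs[np.arange(len(labels)), labels]))


def joint_loss(anchor, positive, negative, logits, labels,
               margin=1.0, weight=1.0):
    """Sum of the triplet term and a weighted supervised term.

    Assumption: a simple weighted sum; the paper may combine them differently.
    """
    return (triplet_loss(anchor, positive, negative, margin)
            + weight * cross_entropy(logits, labels))
```

In this sketch the triplet term is zero once the negative is farther from the anchor than the positive by at least the margin, so only triplets that violate the margin contribute to the loss. The embeddings would come from the LSTM in the proposed framework; here they are plain arrays.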

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Optimization & Theory > Loss Functions
Machine Learning > Learning Types > Representation Learning

Keywords

metric learning, support vector machine, long short-term memory, triplet loss, speech emotion recognition
