Misperceptions of the Emotional Content of Natural and Vocoded Speech in a Car

Jaime Lorenzo-Trueba; Cassia Valentini Botinhao; Gustav Eje Henter; Junichi Yamagishi

INTERSPEECH 2017

Abstract

This paper analyzes a) how often listeners interpret the emotional content of an utterance incorrectly when listening to vocoded or natural speech in adverse conditions; b) which noise conditions cause the most misperceptions; and c) which groups of listeners misinterpret emotions the most. The long-term goal is to construct new emotional speech synthesizers that adapt to the environment and to the listener. We performed a large-scale listening test in which over 400 listeners between the ages of 21 and 72 assessed natural and vocoded acted emotional speech stimuli. The stimuli had been artificially degraded using a room impulse response recorded in a car and various in-car noise types recorded in a real car. Experimental results show that the recognition rates for emotions and perceived emotional strength degrade as signal-to-noise ratio decreases. Interestingly, misperceptions seem to be more pronounced for negative and low-arousal emotions such as calmness or anger, while positive emotions such as happiness appear to be more robust to noise. An ANOVA of listener metadata further revealed that gender and age also influenced results, with elderly male listeners most likely to incorrectly identify emotions.

Topics

Machine Learning > Application Areas > Privacy; Interdisciplinary > Cognitive Science > Perception; Interdisciplinary > Social > Affective Computing

Keywords

emotion recognition; speech perception; emotional speech; listening test; vocoded speech; speech degradation
