INTERSPEECH 2021

Affect Recognition Through Scalogram and Multi-Resolution Cochleagram Features

Abstract

We propose an approach to categorizing voice samples by the emotion the speaker expresses. The approach uses Multi-Resolution Cochleagram (MRCG) and scalogram features in a novel way. Predictive models built on different sets of speech features are trained and tested on audio recordings from the EmoDB, EMOVO and SAVEE datasets, which together cover three languages (German, Italian and English). Across five different classifiers, this study systematically evaluates the feature sets most commonly used in computational paralinguistic tasks (emobase, eGeMAPS and ComParE), as well as MRCG- and scalogram-derived features and their fusion. MRCG features outperform emobase, eGeMAPS and ComParE on the EmoDB (unweighted average recall, UAR = 59.15%) and SAVEE (UAR = 36.12%) datasets, while eGeMAPS gives the best overall UAR on EMOVO (33.84%). A support vector machine (SVM) classifier yields the best UAR for EmoDB (80.05%) using a fusion of emobase, eGeMAPS, ComParE and MRCG. It also yields the best UAR for EMOVO (40.31%) using a fusion of emobase, eGeMAPS and ComParE. For SAVEE, random forests give the best result (UAR = 46.55%) with the ComParE feature set.
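All results above are reported as unweighted average recall (UAR), the mean of the per-class recall values. Every emotion class therefore counts equally, however many recordings it has. The following is a minimal Python sketch of that metric, not the authors' evaluation code. The function name `unweighted_average_recall` is hypothetical, and the sketch assumes that classes which appear only in the predictions are ignored.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls over the classes present in y_true.

    Hypothetical helper for illustration; classes that occur only in
    y_pred contribute no recall term.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    hits = defaultdict(int)    # correct predictions per true class
    totals = defaultdict(int)  # number of samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Imbalanced toy labels: a classifier that always predicts "anger"
    # reaches 0.8 plain accuracy here, but only 0.5 UAR.
    y_true = ["anger", "anger", "anger", "anger", "sadness"]
    y_pred = ["anger"] * 5
    print(unweighted_average_recall(y_true, y_pred))  # 0.5
```

Because each class contributes a single recall term, a model cannot inflate UAR by favouring the majority emotion. This property matters for emotional speech corpora, where class sizes are often uneven.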
