Affect Recognition Through Scalogram and Multi-Resolution Cochleagram Features

Fasih Haider; Saturnino Luz

INTERSPEECH 2021

Abstract

This paper proposes an approach to categorizing voice samples by the emotion expressed by the speaker, using Multi-Resolution Cochleagram (MRCG) and scalogram features in a novel way. Audio recordings from the EmoDB, EMOVO and SAVEE datasets, which contain speech in three languages (German, Italian and English), are used to train and test predictive models built on different sets of speech features. The study systematically evaluates the feature sets most commonly used in computational paralinguistic tasks (emobase, eGeMAPS and ComParE) alongside MRCG- and scalogram-derived features and their fusion, across five classifiers. MRCG features outperform these standard feature sets on the EmoDB (unweighted average recall, UAR = 59.15%) and SAVEE (UAR = 36.12%) datasets, while eGeMAPS provides the best overall UAR (33.84%) on the EMOVO dataset. A support vector machine (SVM) classifier yields the best UAR for EmoDB (80.05%), through fusion of emobase, eGeMAPS, ComParE and MRCG, and for EMOVO (40.31%), through fusion of emobase, eGeMAPS and ComParE. For SAVEE, random forests provide the best result (46.55%) using the ComParE feature set.
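The results above are reported as unweighted average recall (UAR), which is the mean of the per-class recalls, so every emotion class counts equally regardless of how many samples it has. A minimal sketch of that computation follows. The function name and the toy labels are illustrative assumptions, not taken from the paper, and this is not the authors' evaluation code.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (macro-averaged recall).

    Hypothetical helper for illustration; classes are taken from y_true.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Made-up, imbalanced toy labels (not data from the paper).
y_true = ["anger", "anger", "anger", "anger", "sadness", "sadness"]
y_pred = ["anger", "anger", "anger", "anger", "anger", "sadness"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.2f}, accuracy = {accuracy:.2f}")
```

In this toy case the recall is 1.0 for "anger" and 0.5 for "sadness", giving a UAR of 0.75 against an accuracy of about 0.83. The gap shows why UAR is the stricter measure when class sizes differ.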

Topics

Machine Learning > Core Methods > Classification
Speech & Audio > Analysis > Speech Analysis

Keywords

emotion recognition, support vector machine, feature fusion, random forest, affect recognition, speech feature, cochleagram feature, scalogram feature
