INTERSPEECH 2019

Front-End Feature Compensation and Denoising for Noise Robust Speech Emotion Recognition

Abstract

Front-end processing is one way to impart noise robustness to speech emotion recognition systems in mismatched scenarios. Here, we implement and compare several front-end robustness techniques for their efficacy in speech emotion recognition. First, we use a feature compensation technique based on the Vector Taylor Series (VTS) expansion of noisy Mel-Frequency Cepstral Coefficients (MFCCs). Next, we improve upon this feature compensation technique by using the VTS expansion with an auditory masking formulation. We also examine the applicability of 10th-root compression in MFCC computation. Further, a Time Delay Neural Network based Denoising Autoencoder (TDNN-DAE) is implemented to estimate the clean MFCCs from the noisy MFCCs. These techniques have not yet been investigated for their suitability to the robust speech emotion recognition task. The performance of these front-end techniques is compared with that of a Non-Negative Matrix Factorization (NMF) based front-end. Through extensive experiments on two standard databases (EmoDB and IEMOCAP), contaminated with 5 types of noise, we show that these techniques provide significant performance gains in the emotion recognition task. We also show that, along with front-end compensation, applying feature selection to non-MFCC high-level descriptors results in better performance.
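
To make the 10th-root compression step concrete, the following is a minimal sketch of how root compression can replace the usual logarithm before the DCT in MFCC computation. It is an illustration under stated assumptions, not the implementation used in the paper: the function names (compress, dct_ii, cepstra) are hypothetical, the mel filterbank energies are assumed to be already computed, and the DCT-II is left unnormalized.

```python
import math


def compress(energies, mode="log", root=10.0, floor=1e-10):
    """Compress non-negative mel filterbank energies (hypothetical helper).

    mode="log"  : conventional MFCC compression, log(max(e, floor))
    mode="root" : root compression, e ** (1 / root); root=10 gives the 10th root
    """
    if mode == "log":
        return [math.log(max(e, floor)) for e in energies]
    if mode == "root":
        return [max(e, 0.0) ** (1.0 / root) for e in energies]
    raise ValueError("mode must be 'log' or 'root'")


def dct_ii(values, num_coeffs):
    """Unnormalized DCT-II of a sequence, keeping the first num_coeffs terms."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(num_coeffs)
    ]


def cepstra(energies, num_coeffs=13, mode="log", root=10.0):
    """Cepstral coefficients from mel energies with the chosen compression."""
    return dct_ii(compress(energies, mode=mode, root=root), num_coeffs)


if __name__ == "__main__":
    # Toy mel energies for one frame (made-up values, for illustration only).
    frame = [0.0, 1e-6, 0.5, 1.0, 4.0, 9.0, 2.0, 0.1]
    print(cepstra(frame, num_coeffs=4, mode="log"))
    print(cepstra(frame, num_coeffs=4, mode="root"))
```

One property visible in this sketch is that the root function stays bounded as an energy approaches zero, whereas the logarithm needs an explicit floor to avoid diverging; whether that translates into a gain for emotion recognition is what the experiments in the paper assess.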
