2019 INTERSPEECH INTERSPEECH 2019

Speaker-Aware Deep Denoising Autoencoder with Embedded Speaker Identity for Speech Enhancement

Abstract

Previous studies indicate that noise and speaker variations can degrade the performance of deep-learning-based speech-enhancement systems. To increase the system performance over environmental variations, we propose a novel speaker-aware system that integrates a deep denoising autoencoder (DDAE) with an embedded speaker identity. The overall system first extracts embedded speaker identity features using a neural network model; then the DDAE takes the augmented features as input to generate enhanced spectra. With the additional embedded features, the speech-enhancement system can be guided to generate the optimal output corresponding to the speaker identity. We tested the proposed speech-enhancement system on the TIMIT dataset. Experimental results showed that the proposed speech-enhancement system could improve the sound quality and intelligibility of speech signals from additive noise-corrupted utterances. In addition, the results suggested system robustness for unseen speakers when combined with speaker features.

🌉 Interdisciplinary Bridge — Machine Learning and Speech & Audio
🧭 Keyword Pioneer — deep denoising autoencoder
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Robotics, Security & Privacy, Speech & Audio