Speaker-Aware Deep Denoising Autoencoder with Embedded Speaker Identity for Speech Enhancement
Abstract
Previous studies indicate that noise and speaker variations can degrade the performance of deep-learning-based speech-enhancement systems. To improve robustness to such variations, we propose a novel speaker-aware system that integrates a deep denoising autoencoder (DDAE) with an embedded speaker identity. The system first extracts embedded speaker identity features using a neural network model; the DDAE then takes the features augmented with this speaker identity as input and generates enhanced spectra. With the additional embedded features, the speech-enhancement system can be guided to generate the output that best matches the speaker identity. We evaluated the proposed system on the TIMIT dataset. Experimental results showed that it improved the sound quality and intelligibility of utterances corrupted by additive noise. In addition, the results suggested that incorporating the speaker features makes the system more robust to unseen speakers.
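The two-stage data flow described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The layer sizes, the use of a speaker classifier's last hidden layer as the embedding, the per-frame embedding, and the names `init_mlp`, `forward`, `speaker_net`, `ddae`, and `enhance` are all assumptions made for illustration. The weights are random and untrained, so only the shapes and the feature concatenation are meaningful.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Random (untrained) weights for a fully connected network."""
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x, return_hidden=False):
    """ReLU hidden layers and a linear output layer.

    With return_hidden=True, return the last hidden activation instead
    of the network output.
    """
    h = x
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)
    if return_hidden:
        return h
    w, b = params[-1]
    return h @ w + b

rng = np.random.default_rng(0)

# Hypothetical dimensions, not taken from the paper.
N_FREQ = 257      # spectral bins per frame
EMB_DIM = 32      # size of the embedded speaker identity
N_SPEAKERS = 10   # speakers seen by the (assumed) speaker classifier

# Assumption: the speaker model is a classifier whose last hidden layer
# serves as the embedded speaker identity.
speaker_net = init_mlp([N_FREQ, 64, EMB_DIM, N_SPEAKERS], rng)

# The DDAE maps augmented features (spectrum + embedding) to enhanced spectra.
ddae = init_mlp([N_FREQ + EMB_DIM, 128, 128, N_FREQ], rng)

def enhance(noisy_spectra):
    """noisy_spectra: array of shape (frames, N_FREQ)."""
    emb = forward(speaker_net, noisy_spectra, return_hidden=True)
    augmented = np.concatenate([noisy_spectra, emb], axis=1)
    return forward(ddae, augmented)

noisy = rng.standard_normal((100, N_FREQ))  # stand-in for noisy spectral frames
enhanced = enhance(noisy)
print(enhanced.shape)
```

In a real system both networks would be trained, the speaker model on speaker labels and the DDAE on pairs of noisy and clean spectra. The sketch omits training entirely.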