A Statistically Principled and Computationally Efficient Approach to Speech Enhancement Using Variational Autoencoders

Manuel Pariente; Antoine Deleforge; Emmanuel Vincent

INTERSPEECH 2019

Abstract

Recent studies have explored the use of deep generative models of speech spectra based on variational autoencoders (VAEs), combined with unsupervised noise models, to perform speech enhancement. These studies developed iterative algorithms that involve either Gibbs sampling or gradient descent at each step, which makes them computationally expensive. This paper proposes a variational inference method to iteratively estimate the power spectrogram of the clean speech. Our main contribution is the analytical derivation of the variational steps, in which the encoder of the pre-learned VAE can be used to estimate the variational approximation of the true posterior distribution, using the very same assumption made to train VAEs. Experiments show that the proposed method produces results on par with the aforementioned sampling-based iterative methods, while decreasing the computational cost needed to reach a given performance by a factor of 36.

Topics

Deep Learning > Models > Generative Models
Deep Learning > Models > Variational Inference
Speech & Audio > Synthesis > Speech Enhancement

Keywords

variational inference, speech enhancement, Gibbs sampling, noise modeling, generative model, variational autoencoder, unsupervised noise modeling
