Adversarial Latent Representation Learning for Speech Enhancement

Yuanhang Qiu; Ruili Wang

INTERSPEECH 2020

Abstract

This paper proposes a novel adversarial latent representation learning (ALRL) method for speech enhancement. Building on adversarial feature learning, ALRL employs an extra encoder to learn an inverse mapping from the generated data distribution to the latent space. The encoder is coupled with the generator and provides relevant latent information for adversarial feature modelling. A new loss function is proposed to learn the encoder mapping simultaneously. In addition, multi-head self-attention is applied to the encoder to learn long-range dependencies and more effective adversarial representations. Experimental results demonstrate that ALRL outperforms current GAN-based speech enhancement methods.
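The inverse mapping described in the abstract can be illustrated with a toy example. The sketch below is not the authors' model: the linear layers, the dimensions, and the mean-squared-error form of the latent term are all assumptions chosen for illustration, since the abstract does not specify the loss. It only shows the general idea of an encoder recovering a latent code from generator output and of penalising the mismatch.

```python
import numpy as np

def generator(z, noisy, w_z, w_x):
    """Toy generator (assumed form): latent code + noisy frame -> enhanced frame."""
    return np.tanh(z @ w_z + noisy @ w_x)

def encoder(enhanced, w_e):
    """Toy encoder (assumed form): generated frame -> estimate of the latent code."""
    return enhanced @ w_e

def latent_reconstruction_loss(z, z_hat):
    """Mean squared error between sampled and recovered latents (assumed loss form)."""
    return float(np.mean((z - z_hat) ** 2))

# Hypothetical sizes, chosen only to keep the example small.
rng = np.random.default_rng(0)
batch, latent_dim, frame_dim = 4, 8, 16

z = rng.standard_normal((batch, latent_dim))      # sampled latent codes
noisy = rng.standard_normal((batch, frame_dim))   # stand-in for noisy speech frames

# Random, untrained weights; a real system would learn these adversarially.
w_z = rng.standard_normal((latent_dim, frame_dim)) * 0.1
w_x = rng.standard_normal((frame_dim, frame_dim)) * 0.1
w_e = rng.standard_normal((frame_dim, latent_dim)) * 0.1

enhanced = generator(z, noisy, w_z, w_x)          # generated data
z_hat = encoder(enhanced, w_e)                    # inverse mapping back to latent space
loss = latent_reconstruction_loss(z, z_hat)       # term that would be added to the GAN loss
print(f"latent reconstruction loss: {loss:.4f}")
```

In a full system a term of this kind would presumably be combined with the usual adversarial losses, so that the encoder and generator are trained together.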

Authors

Yuanhang Qiu, Ruili Wang

Topics

Machine Learning > Core Methods > Representation Learning

Machine Learning > Learning Types > Adversarial Learning

Deep Learning > Models > Generative Models

Speech & Audio > Synthesis > Speech Enhancement

Deep Learning > Learning Types > Adversarial Learning

Deep Learning > Learning Types > Representation Learning

Keywords

adversarial learning, speech enhancement, latent representation, generative adversarial network, latent representation learning

Related papers

Memory Controlled Sequential Self Attention for Sound Recognition 2020

Dual Attention in Time and Frequency Domain for Voice Activity Detection 2020

Automatic Prediction of Speech Intelligibility Based on X-Vectors in the Context of Head and Neck Cancer 2020

A Noise Robust Technique for Detecting Vowels in Speech Signals 2020

Joint Detection of Sentence Stress and Phrase Boundary for Prosody 2020