VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking

Quan Wang; Hannah Muckenhirn; Kevin Wilson; Prashant Sridhar; Zelin Wu; John R. Hershey; Rif A. Saurous; Ron J. Weiss; Ye Jia; Ignacio Lopez Moreno

INTERSPEECH 2019

Abstract

In this paper, we present a novel system that separates the voice of a target speaker from multi-speaker signals, by making use of a reference signal from the target speaker. We achieve this by training two separate neural networks: (1) A speaker recognition network that produces speaker-discriminative embeddings; (2) A spectrogram masking network that takes both noisy spectrogram and speaker embedding as input, and produces a mask. Our system significantly reduces the speech recognition WER on multi-speaker signals, with minimal WER degradation on single-speaker signals.
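The masking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture. It assumes that the speaker embedding is concatenated to every spectrogram frame, that the mask is a soft mask in (0, 1), and that the mask is applied by elementwise multiplication. The function name `apply_speaker_conditioned_mask`, the single sigmoid layer, and the random weights are hypothetical stand-ins for a trained masking network.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, used here to keep mask values in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def apply_speaker_conditioned_mask(noisy_mag, speaker_emb, weights, bias):
    """Toy stand-in for a speaker-conditioned masking network.

    noisy_mag:   (T, F) magnitude spectrogram of the multi-speaker signal
    speaker_emb: (D,)   embedding of the target speaker (from a reference clip)
    weights:     (F + D, F) hypothetical layer weights (assumption)
    bias:        (F,)   hypothetical layer bias (assumption)
    """
    num_frames = noisy_mag.shape[0]
    # Repeat the embedding for every time frame so each frame is conditioned on it.
    tiled_emb = np.tile(speaker_emb, (num_frames, 1))              # (T, D)
    features = np.concatenate([noisy_mag, tiled_emb], axis=1)      # (T, F + D)
    # A single sigmoid layer replaces the real network in this sketch.
    mask = sigmoid(features @ weights + bias)                      # (T, F)
    # Assumed application of the mask: elementwise product with the input.
    enhanced_mag = mask * noisy_mag
    return mask, enhanced_mag


# Synthetic data only; none of these values come from the paper.
rng = np.random.default_rng(0)
T, F, D = 5, 8, 4
noisy_mag = np.abs(rng.normal(size=(T, F)))
speaker_emb = rng.normal(size=D)
weights = rng.normal(scale=0.1, size=(F + D, F))
bias = np.zeros(F)

mask, enhanced_mag = apply_speaker_conditioned_mask(
    noisy_mag, speaker_emb, weights, bias
)
print(mask.shape, enhanced_mag.shape)
```

Because the embedding enters as an input, the same masking weights can in principle be reused for different target speakers by changing only `speaker_emb`.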

Topics

Deep Learning > Architectures > Neural Networks
Speech & Audio > Synthesis > Speech Enhancement
Speech & Audio > Analysis > Speaker Verification
Speech & Audio > Analysis > Speech Enhancement

Keywords

speaker embedding, speaker recognition, speaker separation, voice separation, spectrogram masking, target speaker separation
