Binaural Reverberant Speech Separation Based on Deep Neural Networks

Xueliang Zhang; Deliang Wang

INTERSPEECH 2017

Abstract

Supervised learning has exhibited great potential for speech separation in recent years. In this paper, we focus on separating target speech in reverberant conditions from binaural inputs using supervised learning. Specifically, a deep neural network (DNN) is constructed to map from both spectral and spatial features to a training target. For spectral feature extraction, we first convert the binaural inputs into a single signal by applying a fixed beamformer. A new spatial feature is proposed and extracted to complement the spectral features. The training target is the recently suggested ideal ratio mask (IRM). Systematic evaluations and comparisons show that the proposed system achieves good separation performance and substantially outperforms existing algorithms in challenging multi-source and reverberant environments.
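The abstract names two building blocks without giving formulas: a fixed beamformer that merges the two ear signals, and the ideal ratio mask used as the training target. The following is a minimal sketch of both, not the authors' implementation. It assumes that the beamformer is a simple average of the left and right signals (a delay-and-sum beamformer steered straight ahead) and that the IRM takes the commonly used form (S / (S + N))^beta with beta = 0.5, where S and N are the speech and noise energies in each time-frequency unit. The function names, the `beta` and `eps` parameters, and the toy energy values are illustrative assumptions.

```python
import numpy as np


def fixed_beamformer(left, right):
    """Average the two ear signals (assumed: delay-and-sum, target straight ahead)."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    return 0.5 * (left + right)


def ideal_ratio_mask(speech_energy, noise_energy, beta=0.5, eps=1e-12):
    """Assumed IRM form: (S / (S + N)) ** beta for each time-frequency unit.

    eps only guards against division by zero in silent units.
    """
    s = np.asarray(speech_energy, dtype=float)
    n = np.asarray(noise_energy, dtype=float)
    return (s / (s + n + eps)) ** beta


# Toy example: a 2 x 2 grid of time-frequency energies (made-up values).
speech_energy = np.array([[4.0, 0.0], [1.0, 9.0]])
noise_energy = np.array([[4.0, 1.0], [3.0, 0.0]])
irm = ideal_ratio_mask(speech_energy, noise_energy)

# Toy example: merge two short ear signals into one.
mono = fixed_beamformer([1.0, 2.0], [3.0, 4.0])
```

Mask values lie between 0 (noise-dominated unit) and 1 (speech-dominated unit). In a supervised setup of the kind the abstract describes, a network would be trained to predict such a mask from features of the mixture, and the predicted mask would then be applied to the mixture to recover the target speech.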

Topics

Machine Learning > Core Methods > Classification
Deep Learning > Architectures > Neural Networks
Machine Learning > Learning Types > Supervised Learning
Speech & Audio > Analysis > Speech Enhancement
Deep Learning > Learning Types > Deep Learning

Keywords

speech separation, supervised learning, deep neural network, spectral feature, spatial feature, binaural input, ideal ratio mask, binaural audio
