Improved Speaker-Dependent Separation for CHiME-5 Challenge

Jian Wu; Yong Xu; Shi-Xiong Zhang; Lianwu Chen; Meng Yu; Lei Xie; Dong Yu

INTERSPEECH 2019


Abstract

This paper summarizes several contributions that improve the speaker-dependent separation system for the CHiME-5 challenge, which targets multi-channel, highly overlapped conversational speech recognition in a dinner-party scenario with reverberation and non-stationary noise. Specifically, we adopt a speaker-aware training method that uses the i-vector as the target-speaker information for multi-talker speech separation. With only one unified separation model for all speakers, we achieve a 10% absolute improvement in word error rate (WER) over the previous baseline of 80.28% on the development set by leveraging our newly proposed data processing techniques and beamforming approach. With our improved back-end acoustic model, we further reduce the WER to 60.15%, which surpasses the result of our submitted CHiME-5 challenge system without applying any fusion techniques.
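As a rough illustration of what conditioning a single separation model on a target speaker can look like, the sketch below appends a fixed speaker vector to every feature frame of the mixture. This is one common way to do speaker-aware training and is an assumption here; the abstract does not state how the i-vector is fed to the model. The function name `append_speaker_vector`, the toy feature values, and the 2-dimensional speaker vectors are all hypothetical.

```python
from typing import List

Frame = List[float]


def append_speaker_vector(frames: List[Frame], ivector: Frame) -> List[Frame]:
    """Append the same fixed-length speaker vector to every feature frame.

    Assumption: conditioning by concatenation. The abstract only says that an
    i-vector serves as the target speaker information.
    """
    if not ivector:
        raise ValueError("ivector must be non-empty")
    return [list(frame) + list(ivector) for frame in frames]


# Toy mixture: 3 frames of 4 spectral features each (made-up numbers).
mixture = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
]

# Hypothetical 2-dimensional speaker vectors; real i-vectors are much longer.
ivector_a = [1.0, -1.0]
ivector_b = [-0.5, 0.5]

# The same mixture yields a different conditioned input per target speaker,
# so one shared model can be asked to extract either speaker.
input_a = append_speaker_vector(mixture, ivector_a)
input_b = append_speaker_vector(mixture, ivector_b)
```

Under this assumption, only the appended vector changes between target speakers, which is what allows one unified model to serve all of them instead of one model per speaker.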



Topics

Speech & Audio > Recognition > Speech Recognition
Speech & Audio > Synthesis > Speech Enhancement

Keywords

speech recognition, speaker-dependent separation, multi-talker speech separation

