Multi-Channel Training for End-to-End Speaker Recognition Under Reverberant and Noisy Environment

Danwei Cai; Xiaoyi Qin; Ming Li

INTERSPEECH 2019

Abstract

Despite the significant improvements in speaker recognition enabled by deep neural networks, performance remains unsatisfactory in far-field scenarios because of long-range fading, room reverberation, and environmental noise. In this study, we focus on far-field speaker recognition with a microphone array. We propose a multi-channel training framework for the deep speaker embedding neural network on noisy and reverberant data. The proposed framework simultaneously processes time, frequency, and channel information to learn a robust deep speaker embedding. We investigate different multi-channel training schemes based on 2-dimensional or 3-dimensional convolution layers. Experiments on simulated multi-channel reverberant and noisy data show that the proposed method obtains significant improvements over a single-channel trained deep speaker embedding system with front-end speech enhancement or multi-channel embedding fusion.
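
The distinction between the 2-dimensional and 3-dimensional schemes can be illustrated with a minimal NumPy sketch. It is one plausible reading of the abstract, not the authors' architecture: the function names, array sizes, and kernel shapes are all assumptions chosen for illustration. In the 2-D variant the microphone channels act as input feature maps and are summed by every kernel; in the 3-D variant the kernel also slides along the microphone axis, so the output keeps a (shortened) channel dimension.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_multichannel(x, w):
    """Valid 2-D cross-correlation over (freq, time).

    x: (mics, freq, time) multi-channel time-frequency features.
    w: (out_maps, mics, kf, kt); each kernel spans all microphones,
       so the microphone axis is summed away in the output.
    Returns: (out_maps, freq - kf + 1, time - kt + 1).
    """
    # windows: (mics, freq - kf + 1, time - kt + 1, kf, kt)
    windows = sliding_window_view(x, w.shape[2:], axis=(1, 2))
    return np.einsum("mftij,omij->oft", windows, w)


def conv3d_multichannel(x, w):
    """Valid 3-D cross-correlation over (mics, freq, time).

    x: (mics, freq, time) multi-channel time-frequency features.
    w: (out_maps, km, kf, kt); each kernel covers only km adjacent
       microphones and slides along the microphone axis as well.
    Returns: (out_maps, mics - km + 1, freq - kf + 1, time - kt + 1).
    """
    # windows: (mics - km + 1, freq - kf + 1, time - kt + 1, km, kf, kt)
    windows = sliding_window_view(x, w.shape[1:], axis=(0, 1, 2))
    return np.einsum("mftcij,ocij->omft", windows, w)


# Hypothetical sizes: 4 microphones, 64 frequency bins, 100 frames.
rng = np.random.default_rng(0)
features = rng.standard_normal((4, 64, 100))

w2d = rng.standard_normal((8, 4, 3, 3))  # 8 maps, kernel spans all 4 mics
w3d = rng.standard_normal((8, 2, 3, 3))  # 8 maps, kernel spans 2 mics

out2d = conv2d_multichannel(features, w2d)
out3d = conv3d_multichannel(features, w3d)
print(out2d.shape, out3d.shape)
```

In a real system these operations would be the first layers of a trainable network in a deep learning framework; the sketch only shows how the channel axis is consumed in each case.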

Topics

Machine Learning > Application Areas > Domain Adaptation
Deep Learning > Architectures > Neural Networks
Deep Learning > Techniques > Model Architecture
Speech & Audio > Recognition > Speaker Recognition
Machine Learning > Learning Types > Supervised Learning

Keywords

speech enhancement, speaker recognition, noise robustness, microphone array, deep speaker embedding, multi-channel training
