Deep Learning Based Multi-Channel Speaker Recognition in Noisy and Reverberant Environments

Hassan Taherian; Zhong-Qiu Wang; Deliang Wang

INTERSPEECH 2019

Abstract

Despite successful applications of multi-channel signal processing in robust automatic speech recognition (ASR), relatively little research has been conducted on the effectiveness of such techniques in the robust speaker recognition domain. This paper introduces time-frequency (T-F) masking-based beamforming to address text-independent speaker recognition in conditions where strong diffuse noise and reverberation are both present. We examine various masking-based beamformers, such as the parameterized multi-channel Wiener filter, the generalized eigenvalue (GEV) beamformer and the minimum variance distortionless response (MVDR) beamformer, and evaluate their performance in terms of speaker recognition accuracy for i-vector and x-vector based systems. In addition, we present a different formulation for estimating steering vectors from speech covariance matrices. We show that rank-1 approximation of a speech covariance matrix based on generalized eigenvalue decomposition leads to the best results for the masking-based MVDR beamformer. Experiments on the recently introduced NIST SRE 2010 retransmitted corpus show that the MVDR beamformer with rank-1 approximation provides an absolute reduction of 5.55% in equal error rate compared to a standard masking-based MVDR beamformer.
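To make the rank-1 idea in the abstract concrete, the following is a minimal NumPy sketch for a single frequency bin. It is not the authors' implementation. The function names (`masked_covariance`, `rank1_steering_vector`, `mvdr_weights`), the reference-microphone normalization, and the assumption that T-F masks are already available from some mask estimator are all illustrative choices made here. The generalized eigenvalue decomposition of the speech and noise covariance pair is computed by whitening with a Cholesky factor of the noise covariance, which is one standard way to do it.

```python
import numpy as np


def masked_covariance(stft, mask):
    """Mask-weighted spatial covariance for one frequency bin.

    stft: complex array of shape (channels, frames).
    mask: real array of shape (frames,), values in [0, 1] (assumed given).
    """
    weighted = stft * mask  # broadcast the mask over channels
    return (weighted @ stft.conj().T) / max(mask.sum(), 1e-8)


def rank1_steering_vector(phi_s, phi_n, ref=0):
    """Steering vector from a rank-1 approximation of phi_s via GEVD.

    Solves phi_s v = lambda * phi_n v for the principal v by whitening
    with the Cholesky factor of phi_n, then returns phi_n @ v scaled so
    that the reference channel (hypothetical choice: ref=0) equals 1.
    """
    chol = np.linalg.cholesky(phi_n)  # phi_n = chol @ chol^H
    chol_inv = np.linalg.inv(chol)
    whitened = chol_inv @ phi_s @ chol_inv.conj().T
    whitened = 0.5 * (whitened + whitened.conj().T)  # enforce Hermitian
    _, vecs = np.linalg.eigh(whitened)  # eigenvalues in ascending order
    u = vecs[:, -1]  # principal eigenvector of the whitened matrix
    c = chol @ u  # equals phi_n @ v with v = chol^{-H} u
    return c / c[ref]


def mvdr_weights(phi_n, c):
    """MVDR weights w = phi_n^{-1} c / (c^H phi_n^{-1} c)."""
    num = np.linalg.solve(phi_n, c)
    return num / (c.conj() @ num)
```

Under these assumptions, the beamformed output for a bin would be `w.conj() @ stft`, with `phi_s` and `phi_n` computed by `masked_covariance` from a speech mask and a noise mask respectively. When `phi_s` is exactly rank 1, the recovered steering vector matches the true one up to the reference-channel scaling, and the weights satisfy the distortionless constraint `w.conj() @ c == 1`.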

Authors

Hassan Taherian, Zhong-Qiu Wang, Deliang Wang

Topics

Machine Learning > Application Areas > Efficient Computing
Deep Learning > Architectures > Neural Networks
Speech & Audio > Recognition > Speaker Recognition
Speech & Audio > Analysis > Speech Enhancement

Keywords

deep learning, speaker embedding, speaker recognition, noise robustness, reverberant speech, multi-channel beamforming, multi-channel processing, MVDR beamformer, GEV beamformer
