INTERSPEECH 2017

Combined Multi-Channel NMF-Based Robust Beamforming for Noisy Speech Recognition

Abstract

We propose a novel acoustic beamforming method that uses blind source separation (BSS) techniques based on non-negative matrix factorization (NMF). Conventional mask-based approaches estimate hard or soft masks and perform beamforming with speech and noise spatial covariance matrices calculated from the masked noisy observations, but they do not adequately preserve the phase information of the target speech. In the proposed method, we perform complex-domain source separation based on multi-channel NMF with a rank-1 spatial model (rank-1 MNMF). The separated speech observation in each time-frequency bin yields a speech spatial covariance matrix, from which we estimate a steering vector for the target speech. This accurate steering vector estimation is combined with our novel noise mask prediction method using multi-channel robust NMF (MRNMF) to construct a maximum likelihood (ML) beamformer that achieved better speech recognition performance than a state-of-the-art DNN-based beamformer with no environment-specific training. Automatic speech recognition (ASR) experiments also confirm the superiority of phase-preserving source separation over real-valued masks in beamforming.
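The last two stages of the pipeline described above (steering vector estimation from a speech spatial covariance matrix, followed by ML beamforming with a noise spatial covariance matrix) can be illustrated with a minimal NumPy sketch for a single frequency bin. It rests on assumptions that the abstract does not state: the steering vector is taken as the principal eigenvector of the speech covariance, the ML beamformer is written in the common distortionless form w = R_n^{-1} h / (h^H R_n^{-1} h), and the noise covariance is a mask-weighted average of the observations. The function names, the `weights` argument, and the diagonal loading are hypothetical, and the rank-1 MNMF and MRNMF stages themselves are not implemented here.

```python
import numpy as np


def spatial_covariance(X, weights=None):
    """Weighted spatial covariance for one frequency bin.

    X: complex array of shape (channels, frames).
    weights: optional real array of shape (frames,), e.g. a noise mask
             (assumption: the mask is applied as a per-frame weight).
    """
    n_frames = X.shape[1]
    if weights is None:
        weights = np.ones(n_frames)
    # Weighted sum of outer products x_t x_t^H, normalized by the total weight.
    return (X * weights) @ X.conj().T / max(weights.sum(), 1e-12)


def steering_vector(R_speech):
    """Assumed estimator: principal eigenvector of the speech covariance."""
    _, eigvecs = np.linalg.eigh(R_speech)  # eigenvalues in ascending order
    return eigvecs[:, -1]                  # unit-norm eigenvector of the largest one


def ml_beamformer(R_noise, h, diag_load=1e-6):
    """Distortionless filter w = R_n^{-1} h / (h^H R_n^{-1} h)."""
    n_ch = R_noise.shape[0]
    # Small diagonal loading for numerical stability (illustrative choice).
    R = R_noise + diag_load * np.trace(R_noise).real / n_ch * np.eye(n_ch)
    Rinv_h = np.linalg.solve(R, h)
    return Rinv_h / (h.conj() @ Rinv_h)


def apply_beamformer(w, X):
    """Beamformer output y_t = w^H x_t for every frame."""
    return w.conj() @ X
```

In this sketch `spatial_covariance` would be called once on the separated speech image (to obtain the matrix passed to `steering_vector`) and once on the noisy observation with the noise mask as `weights` (to obtain `R_noise`). The resulting filter satisfies w^H h = 1, so the target direction passes undistorted while noise power is minimized.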
