2022 INTERSPEECH INTERSPEECH 2022

Convolutive Weighted Multichannel Wiener Filter Front-end for Distant Automatic Speech Recognition in Reverberant Multispeaker Scenarios

Abstract

The performance of automatic speech recognition (ASR) systems strongly deteriorates when the desired speech signal is contaminated with room reverberation and when the speech of interfering speakers overlaps. To achieve acceptable word error rates (WER) by distant ASR in multispeaker reverberant scenarios, source separation and dereverberation can be performed as front-end processing. An existing optimum filter suitable for this task is the recently proposed weighted power minimization distortionless response convolutional beamformer (WPD). In this paper, we introduce a novel speech enhancement front-end for improving the accuracy of back-end ASR in scenarios with multiple reverberant overlapping speakers. The convolutional weighted multichannel Wiener filter (CW-MWF) is optimum for the joint separation and dereverberation task, and it is derived from the convolutional weighted minimum mean square error (CW-MMSE) optimization criterion, presented recently by the current authors. The WER results of performed experiments indicate superior performance of the CW-MWF in real and simulated rooms, irrespective of the method used for filter parameter estimation and the DNN model used for back-end ASR.

🧭 Keyword Pioneer — convolutional beamformer
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio