Convolutive Weighted Multichannel Wiener Filter Front-end for Distant Automatic Speech Recognition in Reverberant Multispeaker Scenarios

Mieszko Fraś; Marcin Witkowski; Konrad Kowalczyk

INTERSPEECH 2022

Abstract

The performance of automatic speech recognition (ASR) systems deteriorates strongly when the desired speech signal is contaminated by room reverberation and overlapped by the speech of interfering speakers. To achieve acceptable word error rates (WER) with distant ASR in multispeaker reverberant scenarios, source separation and dereverberation can be performed as front-end processing. An existing optimum filter suitable for this task is the recently proposed weighted power minimization distortionless response (WPD) convolutional beamformer. In this paper, we introduce a novel speech enhancement front-end for improving the accuracy of back-end ASR in scenarios with multiple reverberant overlapping speakers. The proposed convolutional weighted multichannel Wiener filter (CW-MWF) is optimum for the joint separation and dereverberation task, and it is derived from the convolutional weighted minimum mean square error (CW-MMSE) optimization criterion, presented recently by the current authors. The WER results of the performed experiments indicate superior performance of the CW-MWF in real and simulated rooms, irrespective of the method used for filter parameter estimation and the DNN model used for back-end ASR.
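Since the abstract reports its results as word error rates, the following minimal sketch illustrates how WER is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name word_error_rate and the whitespace tokenization are assumptions made for illustration; this is not the authors' evaluation tooling, and practical scoring pipelines usually apply text normalization first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: word-level Levenshtein distance / reference length.

    Hypothetical helper; tokenization by whitespace is an assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far (excluding the current one) and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted, WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.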

Topics

Speech & Audio > Recognition > Automatic Speech Recognition
Speech & Audio > Synthesis > Speech Enhancement
Speech & Audio > Analysis > Speech Enhancement

Keywords

source separation, automatic speech recognition, speech enhancement, distant speech recognition, multichannel Wiener filter, convolutional beamformer
