Audio-Visual Multi-Talker Speech Recognition in a Cocktail Party

Yifei Wu; Chenda Li; Song Yang; Zhongqin Wu; Yanmin Qian

INTERSPEECH 2021

Abstract

Speech captured by microphones degrades in complex acoustic environments because of noise and reverberation, whereas cameras are not affected by these distortions. Consequently, using the visual modality in the multi-talker “cocktail party” scenario has become a promising and popular approach. In this paper, we explore incorporating the visual modality into end-to-end multi-talker speech recognition. We propose two methods that differ in where the modalities are fused: encoder-based fusion and decoder-based fusion. For each method, we explore advanced audio-visual fusion techniques, including an attention mechanism and a dual decoder, to find the best use of the visual modality. With the proposed methods, our best audio-visual multi-talker automatic speech recognition (ASR) model achieves a word error rate (WER) reduction of almost 50.0% compared to the audio-only multi-talker ASR system.
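The abstract names fusion techniques but does not give their equations, so the following is only a minimal sketch of two generic audio-visual fusion operators: frame-wise concatenation and attention-based fusion. It is not the authors' architecture. The function names (`concat_fusion`, `attention_fusion`), the scaled dot-product scoring, and the requirement that audio and video features share one dimension are all assumptions made for illustration. The sketch also does not model where fusion happens (encoder or decoder) or the multi-talker output.

```python
import math


def concat_fusion(audio, video):
    """Frame-wise concatenation (hypothetical helper).

    Assumes both streams were already aligned to the same number of frames.
    """
    if len(audio) != len(video):
        raise ValueError("streams must have the same number of frames")
    return [a + v for a, v in zip(audio, video)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fusion(audio, video):
    """Attention-based fusion (hypothetical helper).

    Each audio frame attends over all video frames with scaled dot-product
    scores, and the resulting visual context vector is appended to the audio
    frame. The streams may have different frame counts, but this sketch
    assumes they share the same feature dimension.
    """
    dim = len(audio[0])
    fused = []
    for a in audio:
        scores = [
            sum(x * y for x, y in zip(a, v)) / math.sqrt(dim) for v in video
        ]
        weights = softmax(scores)
        context = [
            sum(w * v[i] for w, v in zip(weights, video)) for i in range(dim)
        ]
        fused.append(a + context)
    return fused


# Toy features: 3 audio frames and 2 video frames, each of dimension 2.
audio_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
video_feats = [[1.0, 0.0], [0.0, 1.0]]
fused_feats = attention_fusion(audio_feats, video_feats)
```

In this sketch, attention lets the two streams run at different frame rates, which is common because video is usually sampled more slowly than audio features; concatenation instead requires the streams to be resampled to a common length first.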

Topics

Speech & Audio > Recognition > Speech Recognition

Keywords

cocktail party problem, modality fusion, audio-visual speech recognition, multi-talker speech recognition, encoder-based fusion, decoder-based fusion

Related papers

Energy-Friendly Keyword Spotting System Using Add-Based Convolution 2021

Dialogue Situation Recognition for Everyday Conversation Using Multimodal Information 2021

Using Games to Augment Corpora for Language Recognition and Confusability 2021

A Psychology-Driven Computational Analysis of Political Interviews 2021

The 2020 Personalized Voice Trigger Challenge: Open Datasets, Evaluation Metrics, Baseline System and Results 2021