Pushing the Limits of Non-Autoregressive Speech Recognition

Edwin G. Ng; Chung-Cheng Chiu; Yu Zhang; William Chan

INTERSPEECH 2021

Abstract

We apply recent advancements in end-to-end speech recognition to non-autoregressive automatic speech recognition. We push the limits of non-autoregressive state-of-the-art results on multiple datasets: LibriSpeech, Fisher+Switchboard, and Wall Street Journal. Key to our recipe is applying CTC to giant Conformer neural network architectures with SpecAugment and wav2vec2 pre-training. We achieve 1.8%/3.6% WER on the LibriSpeech test/test-other sets, 5.1%/9.8% WER on Switchboard, and 3.4% WER on Wall Street Journal, all without a language model.
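
To illustrate why a CTC model counts as non-autoregressive, the following is a minimal sketch of greedy CTC decoding. It assumes per-frame scores are already available as a plain list of lists and that index 0 is the blank symbol; the function name, the toy scores, and the choice of greedy decoding are illustrative assumptions, not details taken from the paper.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical convention)


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Pick the best label per frame, merge repeats, then drop blanks."""
    # Each frame's label is chosen independently of every other frame,
    # so no previously emitted token is fed back into the model.
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]
    # Collapse runs of identical labels into a single label.
    merged = [label for label, _ in groupby(best_path)]
    # Remove blank symbols to obtain the output token sequence.
    return [label for label in merged if label != blank]


# Toy input: 6 frames over a hypothetical 3-symbol vocabulary (0 = blank).
toy_scores = [
    [0.10, 0.80, 0.10],  # label 1
    [0.10, 0.70, 0.20],  # label 1 again (merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank
    [0.20, 0.70, 0.10],  # label 1 (kept: separated from the first by a blank)
    [0.10, 0.20, 0.70],  # label 2
    [0.80, 0.10, 0.10],  # blank
]
decoded = ctc_greedy_decode(toy_scores)
```

Because all frames are labeled in one pass, decoding needs no token-by-token loop over the output. The blank symbol is what allows the same token to appear twice in a row in the result.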

Topics

Speech & Audio > Recognition > Speech Recognition

Keywords

automatic speech recognition; end-to-end speech recognition; non-autoregressive model; conformer architecture; wav2vec2 pretraining
