Iterative Pseudo-Labeling for Speech Recognition

Qiantong Xu; Tatiana Likhomanenko; Jacob Kahn; Awni Hannun; Gabriel Synnaeve; Ronan Collobert

INTERSPEECH 2020

Abstract

Pseudo-labeling has recently shown promise in end-to-end automatic speech recognition (ASR). We study Iterative Pseudo-Labeling (IPL), a semi-supervised algorithm that efficiently performs multiple iterations of pseudo-labeling on unlabeled data as the acoustic model evolves. In particular, IPL fine-tunes an existing model at each iteration using both labeled data and a subset of unlabeled data. We study the main components of IPL: decoding with a language model and data augmentation. We then demonstrate the effectiveness of IPL by achieving state-of-the-art word error rate on the LibriSpeech test sets in both the standard and low-resource settings. We also study the effect of language models trained on different corpora to show that IPL can effectively utilize additional text. Finally, we release a new large in-domain text corpus, which does not overlap with the LibriSpeech training transcriptions, to foster research in low-resource, semi-supervised ASR.
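The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `iterative_pseudo_labeling`, its parameters, and the toy `toy_decode` and `toy_fine_tune` stand-ins are all hypothetical names introduced here. In a real system, decoding would be a beam search with a language model, and fine-tuning would be gradient-based training with data augmentation.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (utterance, transcript)


def iterative_pseudo_labeling(
    model: object,
    labeled: Sequence[Example],
    unlabeled: Sequence[str],
    decode: Callable[[object, str], str],
    fine_tune: Callable[[object, List[Example]], object],
    num_iterations: int = 3,
    subset_size: int = 2,
    seed: int = 0,
) -> object:
    """Hypothetical sketch of the IPL loop: relabel a subset, then fine-tune."""
    rng = random.Random(seed)
    for _ in range(num_iterations):
        # Pick a subset of the unlabeled data for this iteration.
        k = min(subset_size, len(unlabeled))
        subset = rng.sample(list(unlabeled), k)
        # Pseudo-label the subset with the current model.
        pseudo = [(utt, decode(model, utt)) for utt in subset]
        # Fine-tune the existing model on labeled plus pseudo-labeled data.
        model = fine_tune(model, list(labeled) + pseudo)
    return model


# Toy stand-ins (assumptions): the "model" is a dict from utterance to text.
def toy_decode(model: dict, utterance: str) -> str:
    # Placeholder for decoding with a language model.
    return model.get(utterance, "<unk>")


def toy_fine_tune(model: dict, batch: List[Example]) -> dict:
    # Placeholder for training: memorize every pair in the batch.
    updated = dict(model)
    updated.update(batch)
    return updated


labeled = [("utt1", "hello world"), ("utt2", "good morning")]
unlabeled = ["utt3", "utt4", "utt5"]
final_model = iterative_pseudo_labeling(
    {}, labeled, unlabeled, toy_decode, toy_fine_tune
)
```

The point of the sketch is that the model is carried over between iterations rather than retrained from scratch, and that pseudo-labels are regenerated each iteration from the current model.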

Topics

Machine Learning > Learning Types > Semi-Supervised Learning
Speech & Audio > Recognition > Automatic Speech Recognition

Keywords

semi-supervised learning, data augmentation, automatic speech recognition, language model, iterative training, end-to-end ASR
