Active Learning Methods for Low Resource End-to-End Speech Recognition

Karan Malhotra; Shubham Bansal; Sriram Ganapathy

2019 INTERSPEECH INTERSPEECH 2019

Active Learning Methods for Low Resource End-to-End Speech Recognition

Abstract

Recently developed end-to-end (E2E) automatic speech recognition (ASR) systems demand abundance of transcribed speech data, there are several scenarios where the labeling of speech data is cumbersome and expensive. For a fixed annotation cost, active learning for speech recognition allows to efficiently train the ASR model. In this work, we advance the most common approach for active learning methods which relies on uncertainty sampling technique. In particular, we explore the use of path probability of the decoded sequence as a confidence measure and select the samples with the least confidence for active learning. In order to reduce the sampling bias in active learning, we propose a regularized uncertainty sampling approach that incorporates an i-vector diversity measure. Thus, the active learning in the proposed framework uses a joint score of uncertainty and i-vector diversity. The benefits of the proposed approach are illustrated for an E2E ASR task performed on CSJ and Librispeech datasets. In these experiments, we show that the proposed approach yields considerable improvements over the baseline model using random sampling.

🌉 Interdisciplinary Bridge — Machine Learning and Speech & Audio

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Karan Malhotra , Shubham Bansal , Sriram Ganapathy

Topics

Machine Learning > Learning Types > Active Learning Speech & Audio > Recognition > Automatic Speech Recognition

Keywords

active learning uncertainty sampling end-to-end speech recognition low resource

Download PDF

Related papers

Using Real-Time Visual Biofeedback for Second Language Instruction 2019

VAE-Based Regularization for Deep Speaker Embedding 2019

End-to-End SpeakerBeam for Single Channel Target Speech Recognition 2019

Attention-Enhanced Connectionist Temporal Classification for Discrete Speech Emotion Recognition 2019

Attentive to Individual: A Multimodal Emotion Recognition Network with Personalized Attention Profile 2019