INTERSPEECH 2019

Active Learning Methods for Low Resource End-to-End Speech Recognition

Abstract

Recently developed end-to-end (E2E) automatic speech recognition (ASR) systems demand an abundance of transcribed speech data, yet in many scenarios labeling speech data is cumbersome and expensive. For a fixed annotation cost, active learning allows the ASR model to be trained efficiently by choosing which utterances to transcribe. In this work, we advance the most common approach to active learning, which relies on uncertainty sampling. In particular, we explore the use of the path probability of the decoded sequence as a confidence measure and select the samples with the least confidence for active learning. To reduce the sampling bias in active learning, we propose a regularized uncertainty sampling approach that incorporates an i-vector diversity measure. The active learning in the proposed framework thus uses a joint score of uncertainty and i-vector diversity. The benefits of the proposed approach are illustrated for an E2E ASR task on the CSJ and Librispeech datasets. In these experiments, we show that the proposed approach yields considerable improvements over a baseline model trained on randomly sampled data.
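
The selection step described above can be illustrated with a minimal sketch. The abstract does not give the exact form of the joint score, so everything here is an assumption made for illustration: uncertainty is taken as one minus the path probability of the decoded sequence, diversity as the minimum cosine distance between a candidate's i-vector and those already selected, and the two are combined as a weighted sum inside a greedy loop. The names `select_batch`, `cosine_distance`, and `diversity_weight` are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def select_batch(
    path_log_probs: Sequence[float],
    ivectors: Sequence[Sequence[float]],
    budget: int,
    diversity_weight: float = 0.5,  # hypothetical trade-off parameter
) -> List[int]:
    """Greedily pick `budget` utterance indices by a joint score.

    ASSUMED score (not taken from the paper):
        score = (1 - path_probability) + diversity_weight * diversity
    where diversity is the minimum cosine distance from the candidate's
    i-vector to the i-vectors of the utterances selected so far.
    """
    if len(path_log_probs) != len(ivectors):
        raise ValueError("need one i-vector per utterance")
    if budget < 0:
        raise ValueError("budget must be non-negative")

    # Low path probability of the decoded sequence means low confidence.
    uncertainty = [1.0 - math.exp(lp) for lp in path_log_probs]

    selected: List[int] = []
    remaining = set(range(len(path_log_probs)))
    while remaining and len(selected) < budget:
        best_idx, best_score = -1, -math.inf
        for i in sorted(remaining):  # sorted for deterministic tie-breaking
            if selected:
                diversity = min(
                    cosine_distance(ivectors[i], ivectors[j]) for j in selected
                )
            else:
                diversity = 0.0  # first pick is driven by uncertainty alone
            score = uncertainty[i] + diversity_weight * diversity
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected
```

With `diversity_weight=0.0` the sketch reduces to plain least-confidence sampling. A positive weight steers later picks away from utterances whose i-vectors resemble those already chosen, which is one way to picture the reduction in sampling bias.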
