2018 INTERSPEECH INTERSPEECH 2018

Active Learning for LF-MMI Trained Neural Networks in ASR

Abstract

This paper investigates how active learning (AL) effects the training of neural network acoustic models based on Lattice-free Maximum Mutual Information (LF-MMI) in automatic speech recognition (ASR). To fully exploit the most informative examples from fresh datasets, different data selection criterions based on the heterogeneous neural networks were studied. In particular, we examined the relationship among the transcription cost of human labeling, example informativeness and data selection criterions for active learning. As a comparison, we tried both semi-supervised training (SST) and active learning to improve the acoustic models. Experiments were performed for both the small-scale and large-scale ASR systems. Experimental results suggested that, our AL scheme can benefit much more from the fresh data than the SST in reducing the word error rate (WER).The AL yields 6~13% relative WER reduction against the baseline trained on a 4000 hours transcribed dataset, by only selecting 1.2K hrs informative utterances for human labeling via active learning.

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio