Two-Stage Data Augmentation for Low-Resourced Speech Recognition

William Hartmann; Tim Ng; Roger Hsiao; Stavros Tsakalidis; Richard Schwartz

2016 INTERSPEECH INTERSPEECH 2016

Two-Stage Data Augmentation for Low-Resourced Speech Recognition

Abstract

Low resourced languages suffer from limited training data and resources. Data augmentation is a common approach to increasing the amount of training data. Additional data is synthesized by manipulating the original data with a variety of methods. Unlike most previous work that focuses on a single technique, we combine multiple, complementary augmentation approaches. The first stage adds noise and perturbs the speed of additional copies of the original audio. The data is further augmented in a second stage, where a novel fMLLR-based augmentation is applied to bottleneck features to further improve performance. A reduction in word error rate is demonstrated on four languages from the IARPA Babel program. We present an analysis exploring why these techniques are beneficial.

🚀 Conference Pioneer — INTERSPEECH 2016

🌉 Interdisciplinary Bridge — Machine Learning and Speech & Audio

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio