2016 INTERSPEECH INTERSPEECH 2016

Web Data Selection Based on Word Embedding for Low-Resource Speech Recognition

Abstract

The lack of transcription files will lead to a high out-of-vocabulary (OOV) rate and a weak language model in low-resource speech recognition systems. This paper presents a web data selection method to augment these systems. After mapping all the vocabularies or short sentences to vectors in a low-dimensional space through a word embedding technique, the similarities between the web data and the small pool of training transcriptions are calculated. Then, the web data with high similarity are selected to expand the pronunciation lexicon or language model. Experiments are conducted on the NIST Open KWS15 Swahili VLLP recognition task. Compared with the baseline system, our methods can achieve a 5.23% absolute reduction in word error rate (WER) using the expanded pronunciation lexicon and a 9.54% absolute WER reduction using both the expanded lexicon and language model.

πŸš€ Conference Pioneer β€” INTERSPEECH 2016
πŸŒ‰ Interdisciplinary Bridge β€” Machine Learning and Speech & Audio
🧭 Keyword Pioneer β€” low-resource speech recognition
🐣 Hot Topic Early Bird β€” language model
🐝 Cross-Pollinator β€” Artificial Intelligence, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio