2020
EMNLP
EMNLP 2020
Improving End-to-End Bangla Speech Recognition with Semi-supervised Training
Abstract
AbstractAutomatic speech recognition systems usually require large annotated speech corpus for training. The manual annotation of a large corpus is very difficult. It can be very helpful to use unsupervised and semi-supervised learning methods in addition to supervised learning. In this work, we focus on using a semi-supervised training approach for Bangla Speech Recognition that can exploit large unpaired audio and text data. We encode speech and text data in an intermediate domain and propose a novel loss function based on the global encoding distance between encoded data to guide the semi-supervised training. Our proposed method reduces the Word Error Rate (WER) of the system from 37% to 31.9%.
🌱
Topic Pioneer
— Text Processing
🌉
Interdisciplinary Bridge
— Machine Learning and Natural Language Processing
🧭
Keyword Pioneer
— bangla language
🐣
Hot Topic Early Bird
— bangla language
🐝
Cross-Pollinator
— Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio