Improving End-to-End Bangla Speech Recognition with Semi-supervised Training

Nafis Sadeq; Nafis Tahmid Chowdhury; Farhan Tanvir Utshaw; Shafayat Ahmed; Muhammad Abdullah Adnan

EMNLP 2020

Abstract

Automatic speech recognition systems usually require a large annotated speech corpus for training. Manually annotating such a corpus is very difficult, so it can be helpful to use unsupervised and semi-supervised learning methods in addition to supervised learning. In this work, we focus on a semi-supervised training approach for Bangla speech recognition that can exploit large amounts of unpaired audio and text data. We encode speech and text data in an intermediate domain and propose a novel loss function based on the global encoding distance between the encoded data to guide the semi-supervised training. Our proposed method reduces the Word Error Rate (WER) of the system from 37% to 31.9%.
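For readers unfamiliar with the metric, the reported figures correspond to an absolute WER reduction of 5.1 percentage points, or roughly 13.8% relative to the baseline. The following is a minimal sketch of the standard WER definition (word-level edit distance divided by the number of reference words), not the authors' evaluation code. The function names `word_error_rate` and `relative_reduction` and the example strings are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error removed by the improved system."""
    return (baseline - improved) / baseline


# Illustrative strings only: one substitution plus one deletion over four words.
example_wer = word_error_rate("a b c d", "a x c")
# Figures taken from the abstract: 37% baseline, 31.9% with the proposed method.
reported_gain = relative_reduction(37.0, 31.9)
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words, since the denominator counts only reference words.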

Topics

Machine Learning > Learning Types > Semi-Supervised Learning
Natural Language Processing > Applications > Text Processing
Machine Learning > Application Areas > Text Processing

Keywords

unsupervised learning, semi-supervised learning, speech recognition, automatic speech recognition, semi-supervised training, end-to-end model, word error rate, speech encoding, Bangla language
