ASR for Low Resource and Multilingual Noisy Code-Mixed Speech

Tushar Verma; Atul Shree; Ashutosh Modi

INTERSPEECH 2023

Abstract

Developing reliable Automatic Speech Recognition (ASR) systems for Indian languages has been challenging due to the limited availability of large-scale, high-quality speech datasets. This problem is even more pronounced in noisy, code-mixed settings with different grapheme vocabularies. This paper proposes a novel ASR system for low-resource, noisy speech that is code-mixed with Indian languages. Our approach involves fine-tuning pre-trained models on text transliterated to Devanagari and mapping similar-sounding characters into a single character group. Experiments show the model's effectiveness for low-resource Indian languages in noisy, code-mixed, and multilingual settings. The approach outperforms several baseline models and demonstrates the potential for adapting state-of-the-art ASR models to new languages with limited resources. The proposed system has been deployed in production, where call centers use it to transcribe customer calls.
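The character-grouping step mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the transcripts are already transliterated to Devanagari. The specific character pairs in `CHAR_GROUPS`, and the helper names `normalize_transcript` and `build_vocab`, are hypothetical and chosen only for illustration; the abstract does not list the groups the authors actually used.

```python
# Illustrative sketch: collapse similar-sounding Devanagari characters
# onto one representative before building the ASR output vocabulary.
# ASSUMPTION: these pairs are examples only, not the groups from the paper.
CHAR_GROUPS = {
    "\u0940": "\u093f",  # vowel sign II (long i)  -> vowel sign I (short i)
    "\u0942": "\u0941",  # vowel sign UU (long u)  -> vowel sign U (short u)
    "\u0908": "\u0907",  # letter II               -> letter I
    "\u090a": "\u0909",  # letter UU               -> letter U
    "\u0937": "\u0936",  # letter SSA              -> letter SHA
}

# Translation table built once; str.translate applies it per code point.
_TABLE = str.maketrans(CHAR_GROUPS)


def normalize_transcript(text: str) -> str:
    """Map each grouped character to its group's representative."""
    return text.translate(_TABLE)


def build_vocab(transcripts):
    """Collect the sorted grapheme vocabulary after normalization."""
    vocab = set()
    for transcript in transcripts:
        vocab.update(normalize_transcript(transcript))
    return sorted(vocab)
```

Under these assumptions, two spellings that differ only in vowel length normalize to the same target string, so the fine-tuned model has fewer output symbols to learn from limited data. The trade-off is that the collapsed distinctions cannot be recovered from the model output without a separate post-processing step.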
