INTERSPEECH 2022

Normalization of code-switched text for speech synthesis

Abstract

Code-switching is common in multilingual communities. With the growing use of social media, social media text also shows a high level of code-switching, and such text is often written in a single script. The text normalization techniques of conventional text-to-speech (TTS) and machine translation systems may not be able to handle these code-switched texts. Malayalam is a low-resource Indic language. Conversational Malayalam contains a high level of inter-sentential, intra-sentential, and intra-word code-switching with English. This paper describes techniques for handling Malayalam-English code-switched text data. Evaluation results of experiments conducted on Malayalam-English code-switched data are also presented.
