2023 INTERSPEECH INTERSPEECH 2023

Improving wav2vec2-based Spoken Language Identification by Learning Phonological Features

Abstract

Spoken language identification (SLI) is a key component in speech-processing tools such as spoken language understanding. In code-switching conversational speech, speakers change languages for short durations posing an additional challenge to language identification techniques. In this work, we investigate the ability of a wav2vec2-based SLI method in identifying the spoken language of English/Mandarin code-switching child-directed conversational speech recorded via Zoom. The proposed system allows the pre-trained wav2vec2-based model to learn language-dependent phonological features by fine-tuning first on detecting manners and places of articulation, then on classifying between English and Mandarin speech segments. The proposed system was tested against parent-child Zoom recordings provided as a part of the MERLIon CCS challenge of language identification. The system achieved the best balanced accuracy of 81.3% and the second-lowest equal error rate of 10.6%.

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio