INTERSPEECH 2024

Bridging Child-Centered Speech Language Identification and Language Diarization via Phonetics

Abstract

Language Diarization (LD) can be viewed as an extension of Language Identification (LID) that removes the assumption of monolingual input. Inspired by this connection and by the challenges inherent in Code-Switching (CS) child-centered speech, we extend PHO-LID, an LID model that incorporates acoustic and phonotactic information without requiring phoneme annotation, to LD. Our method explores three avenues for adapting PHO-LID to LD: a temporal slicing scheme that bridges LID and LD, an embedding modification that enriches the information available for LD, and a back-end scoring method that facilitates fine-tuning. When trained on SEAME_sim, a simulated out-of-domain dataset, our method achieves a 15.82% relative accuracy improvement over the baseline on MERLIon, a child-centered CS speech corpus. The back-end scoring preserves pre-trained knowledge during fine-tuning, yielding a 16.93% relative accuracy improvement on the SEAME_sim test set used for pre-training without compromising performance on the fine-tuning test set.
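The abstract does not specify how the temporal slicing scheme works. As a rough illustration of why slicing can bridge LID and LD, the sketch below assumes fixed-length, non-overlapping slices that are each treated as monolingual, runs an utterance-level LID classifier on every slice, and merges adjacent slices that receive the same label into diarization segments. The function names, the `lid` callable, the toy classifier, and the 2-second default window are all hypothetical and are not the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple

# (start time in seconds, end time in seconds, language label)
Segment = Tuple[float, float, str]


def slice_windows(num_samples: int, sample_rate: int, win_s: float) -> List[Tuple[int, int]]:
    """Split [0, num_samples) into consecutive non-overlapping windows of win_s seconds.

    The last window may be shorter than win_s.
    """
    win = max(1, int(round(win_s * sample_rate)))
    return [(s, min(s + win, num_samples)) for s in range(0, num_samples, win)]


def diarize_by_slicing(
    audio: Sequence[float],
    sample_rate: int,
    lid: Callable[[Sequence[float]], str],  # hypothetical utterance-level LID model
    win_s: float = 2.0,  # hypothetical slice length
) -> List[Segment]:
    """Turn an LID classifier into a crude language diarizer.

    Each slice is assumed to be monolingual, so the LID model's single
    label is assigned to the whole slice; adjacent slices with the same
    label are merged into one segment.
    """
    segments: List[Segment] = []
    for start, end in slice_windows(len(audio), sample_rate, win_s):
        label = lid(audio[start:end])
        t0, t1 = start / sample_rate, end / sample_rate
        if segments and segments[-1][2] == label:
            # Same language as the previous slice: extend that segment.
            segments[-1] = (segments[-1][0], t1, label)
        else:
            segments.append((t0, t1, label))
    return segments


if __name__ == "__main__":
    # Toy stand-in for an LID model: positive mean -> "en", otherwise "zh".
    def toy_lid(chunk: Sequence[float]) -> str:
        return "en" if sum(chunk) / len(chunk) > 0 else "zh"

    audio = [1.0] * 8 + [-1.0] * 4 + [1.0] * 4  # 16 samples at 4 Hz = 4 s
    print(diarize_by_slicing(audio, sample_rate=4, lid=toy_lid, win_s=1.0))
    # prints [(0.0, 2.0, 'en'), (2.0, 3.0, 'zh'), (3.0, 4.0, 'en')]
```

The slice length embodies the basic trade-off in this kind of approach: shorter slices can localize switch points more finely but give the LID model less context per decision, while longer slices are more likely to contain a code switch and so violate the monolingual assumption.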
