Improving Language Identification for Code-Switched Speech: The Pivotal Role of Accented English

Adyasha Patra; Dhiraj Kumar Sah; Preethi Jyothi

EACL 2026


Abstract

Code-switching, where speakers alternate between languages within a single utterance, poses unique challenges for language identification (LID). Existing LID models often fail to reliably identify English spoken with the accent of the matrix (dominant) language. We show that finetuning LID models on small amounts of such accented English significantly improves code-switched LID without degrading performance on standard monolingual speech, a degradation that is observed when finetuning directly on code-switched utterances. We achieve this with low-rank adaptation (LoRA) on limited accented data, which lets models adapt efficiently. To better evaluate performance, we introduce LangRank, a metric that captures the relative ranking of identified languages, which traditional metrics often overlook. Our method generalizes across multiple language pairs, including Hindi-English, Bengali-English, Mandarin-English, and Arabic-English, providing robust LID in code-switched multilingual contexts.
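The abstract credits low-rank adaptation (LoRA) with letting the LID model adapt efficiently from limited accented-English data. As a rough illustration of the generic technique, and not the authors' implementation, the sketch below wraps a frozen weight matrix W with a trainable low-rank update, so the layer computes W·x + (alpha/r)·B·A·x. The abstract does not say which layers are adapted or which rank is used; the names `LoRALinear` and `lora_param_count`, the rank, `alpha`, and the initialization scale are all hypothetical, chosen only for illustration.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable values in A (rank x d_in) plus B (d_out x rank)."""
    return rank * (d_in + d_out)


class LoRALinear:
    """Hypothetical sketch: frozen weight W plus a trainable update (alpha / rank) * B @ A."""

    def __init__(self, weight: Matrix, rank: int, alpha: float = 1.0, seed: int = 0):
        rng = random.Random(seed)
        self.weight = weight                    # frozen pretrained weight, shape (d_out, d_in)
        self.d_out, self.d_in = len(weight), len(weight[0])
        self.rank = rank
        self.scale = alpha / rank
        # A gets a small random init (scale 0.01 is an illustrative assumption);
        # B starts at zero, so the adapted layer initially matches the frozen one.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(self.d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(self.d_out)]

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)               # frozen path: W @ x
        update = matvec(self.B, matvec(self.A, x))  # low-rank path: B @ (A @ x)
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self) -> int:
        # Only A and B would be updated during finetuning; W stays frozen.
        return lora_param_count(self.d_in, self.d_out, self.rank)

    def merged_weight(self) -> Matrix:
        """Fold the low-rank update into W so inference needs a single matrix."""
        return [
            [
                self.weight[i][j]
                + self.scale * sum(self.B[i][k] * self.A[k][j] for k in range(self.rank))
                for j in range(self.d_in)
            ]
            for i in range(self.d_out)
        ]


if __name__ == "__main__":
    W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]   # toy frozen weight, shape (2, 3)
    layer = LoRALinear(W, rank=1, alpha=2.0)
    print(layer.forward([1.0, 2.0, 3.0]))     # [7.0, -1.0]: B == 0, so output equals W @ x
    print(lora_param_count(1024, 1024, 8))    # 16384, versus 1,048,576 in the full matrix
```

Because B starts at zero, the wrapped layer initially reproduces the frozen layer exactly, and finetuning touches only rank·(d_in + d_out) values per adapted layer: for a hypothetical 1024×1024 projection at rank 8, that is 16,384 trainable parameters instead of 1,048,576. After training, `merged_weight` folds the update back into W, so inference costs the same as the original layer.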

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Artificial Intelligence > Learning Paradigms > Transfer Learning
Machine Learning > Application Areas > Domain Adaptation

Keywords

speech processing, language identification, low-rank adaptation, multilingual speech, accented speech, code-switched speech
