2025 NAACL NAACL 2025

The Impact of Dialect Variation on Robust Automatic Speech Recognition for Catalan

Abstract

AbstractTo accurately transcribe a speech signal, automatic speech recognition (ASR) systems must show robustness to a wide range of task independent variation, such as speaker factors, recording quality, or even ädversarial noisedesigned to disrupt performance.We manipulated the dialect composition of fine-tuning data for ASR to study whether balancing the relative proportion of dialects had an impact on models robustness to two such sources of variation”:" dialect variation and adversarial perturbations. We fine-tuned XLSR-53 for Catalan ASR using four different dialect compositions, each containing the Central Catalan dialect. These were defined as 100%, 80%, 50%, and 20% Central Catalan, with the remaining portions split evenly between four other Catalan dialects. While increasing the relative proportion of dialect variants improved models’ dialect robustness, this did not have a meaningful impact on adversarial robustness. These findings suggest that while improvements to ASR can be made by diversifying the training data, such changes do not sufficiently counteract adversarial attacks, leaving the technology open to security threats.

🌉 Interdisciplinary Bridge — Machine Learning and Speech & Audio
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio