Tone in Yoruba ASR: Evaluating the Impact of Tone Recognition on Transformer-Based ASR Models

Joy Olusanya

EACL 2026

Abstract

This research investigates the role of tone in Standard Yoruba Automatic Speech Recognition (ASR), focusing on how explicit tone marking (diacritics) influences accuracy and overall system performance. Yoruba is a low-resource tonal language that encodes critical lexical and grammatical contrasts through pitch, which makes tone handling both essential and challenging for ASR. Three pre-trained models, Meta’s MMS-1B-all, OpenAI’s Whisper-small, and AstralZander/Yoruba_ASR, were trained and evaluated on datasets that differ in tone annotation (fully tone-marked vs. non-tone-marked). With Word Error Rate (WER) and Tone Error Rate (TER) as the primary metrics, the results consistently favored non-tone-marked data, which yielded substantially lower error rates than the tone-marked data. These outcomes suggest that current architectures have difficulty with diacritically marked Yoruba, likely because of tokenization behavior, insufficient representation of tonal cues, and limited tone modeling in the underlying pre-training. The study concludes that tone-aware approaches, spanning tokenization, acoustic-text alignment, and model objectives, are necessary to improve recognition for Yoruba and other low-resource tonal languages. The findings clarify how linguistic tone systems interact with computational modeling and offer concrete directions for building more robust, tone-sensitive ASR systems.
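To make the tone-marked vs. non-tone-marked contrast and the two metrics concrete, the following is a minimal sketch, not the paper's evaluation code. It assumes that tone is written with the combining acute (high), grave (low), and optional macron (mid) accents, that the underdot on ẹ, ọ, and ṣ is part of the letter and must be kept, and that TER is an edit distance over per-vowel tone labels. The abstract does not define TER, so that definition, like every function name here, is a hypothetical choice for illustration. Tone on syllabic nasals is ignored for brevity.

```python
import unicodedata

# Assumption: combining marks that carry tone in Yoruba orthography.
TONE_MARKS = {"\u0301", "\u0300", "\u0304"}   # acute, grave, macron
TONE_LABELS = {"\u0301": "H", "\u0300": "L"}  # unmarked vowels count as mid ("M")
VOWELS = set("aeiou")


def strip_tone_marks(text: str) -> str:
    """Remove tone diacritics but keep the underdot (U+0323)."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def tone_sequence(text: str) -> list:
    """One tone label (H, L, or M) per vowel letter."""
    tones = []
    on_vowel = False
    for ch in unicodedata.normalize("NFD", text.lower()):
        if unicodedata.combining(ch):
            # A tone mark relabels the vowel it is attached to.
            if on_vowel and ch in TONE_LABELS:
                tones[-1] = TONE_LABELS[ch]
        else:
            on_vowel = ch in VOWELS
            if on_vowel:
                tones.append("M")
    return tones


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = unicodedata.normalize("NFC", reference).split()
    hyp = unicodedata.normalize("NFC", hypothesis).split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def tone_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical TER: edit distance over per-vowel tone labels."""
    ref = tone_sequence(reference)
    if not ref:
        raise ValueError("reference must contain at least one vowel")
    return edit_distance(ref, tone_sequence(hypothesis)) / len(ref)


reference = "\u1ecdk\u1ecd\u0300"   # ọkọ̀, tone-marked
hypothesis = "\u1ecdk\u1ecd"        # ọkọ, same letters without the low tone
print(word_error_rate(reference, hypothesis))
print(word_error_rate(strip_tone_marks(reference), strip_tone_marks(hypothesis)))
print(tone_error_rate(reference, hypothesis))
```

In this sketch a single missed tone mark counts as a full word error against the tone-marked reference, while the same pair scores as an exact match once both sides are stripped. That is one mechanical reason WER on tone-marked text can be higher, independent of the modeling factors the abstract discusses.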

Authors

Joy Olusanya

Topics

Deep Learning > Architectures > Transformers; Speech & Audio > Recognition > Automatic Speech Recognition

Keywords

automatic speech recognition, low-resource language, tone recognition, tonal language
