Challenges of German Speech Recognition: A Study on Multi-ethnolectal Speech Among Adolescents

Martha Schubert; Daniel Duran; Ingo Siegert

2024 INTERSPEECH INTERSPEECH 2024

Challenges of German Speech Recognition: A Study on Multi-ethnolectal Speech Among Adolescents

Abstract

Despite significant advancements in speech recognition systems, challenges persist in accurately interpreting spontaneous speech from underrepresented groups like non-standard speakers or younger individuals. The difficulty increases when these conditions overlap. To further explore this topic, we employ a dataset featuring spontaneous as well as read speech from young speakers in Germany, including both, speakers from mono-ethnic and multi-ethnic backgrounds. Our study involves a comparative analysis of speech recognition performance, incorporating gender considerations, using three distinct Automatic Speech Recognition (ASR) engines: Whisper (OpenAI), NeMo (NVIDIA), and Wav2Vec2.0 (Meta AI). Furthermore, we conduct a comprehensive error analysis on the automatically generated transcripts, employing part-of-speech (POS) tagging. This allows us to discern the word types that pose the greatest challenge for comprehension by the ASR engines.

🌉 Interdisciplinary Bridge — Machine Learning and Speech & Audio

🧭 Keyword Pioneer — multi-ethnolectal speech

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Natural Language Processing, Speech & Audio

Authors

Martha Schubert , Daniel Duran , Ingo Siegert

Topics

Machine Learning > Optimization & Theory > Theory Speech & Audio > Recognition > Speech Recognition

Keywords

automatic speech recognition part-of-speech tagging error analysis multi-ethnolectal speech german speech

Download PDF

Related papers

Reshape Dimensions Network for Speaker Recognition 2024

RevRIR: Joint Reverberant Speech and Room Impulse Response Embedding using Contrastive Learning with Application to Room Shape Classification 2024

Mixed Children/Adult/Childrenized Fine-Tuning for Children’s ASR: How to Reduce Age Mismatch and Speaking Style Mismatch 2024

Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions 2024

K-means and hierarchical clustering of f0 contours 2024