Unveiling Biases while Embracing Sustainability: Assessing the Dual Challenges of Automatic Speech Recognition Systems

Ajinkya Kulkarni; Atharva Kulkarni; Miguel Couceiro; Isabel Trancoso

2024 INTERSPEECH INTERSPEECH 2024

Unveiling Biases while Embracing Sustainability: Assessing the Dual Challenges of Automatic Speech Recognition Systems

Abstract

In this paper, we present a bias and sustainability focused investigation of Automatic Speech Recognition (ASR) systems, namely Whisper and Massively Multilingual Speech (MMS), which have achieved state-of-the-art (SOTA) performances. Despite their improved performance in controlled settings, there remains a critical gap in understanding their efficacy and equity in real-world scenarios. We analyze ASR biases w.r.t. gender, accent, and age group, as well as their effect on downstream tasks. In addition, we examine the environmental impact of ASR systems, scrutinizing the use of large acoustic models on carbon emission and energy consumption. We also provide insights into our empirical analyses, offering a valuable contribution to the claims surrounding bias and sustainability in ASR systems.

🌉 Interdisciplinary Bridge — Machine Learning and Speech & Audio

🧭 Keyword Pioneer — carbon emission

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Deep Learning, Interdisciplinary, Machine Learning, Natural Language Processing, Speech & Audio

Authors

Ajinkya Kulkarni , Atharva Kulkarni , Miguel Couceiro , Isabel Trancoso

Topics

Machine Learning > Application Areas > Fairness Speech & Audio > Recognition > Automatic Speech Recognition

Keywords

automatic speech recognition bias analysis gender bia accent bia carbon emission

Download PDF

Related papers

Reshape Dimensions Network for Speaker Recognition 2024

RevRIR: Joint Reverberant Speech and Room Impulse Response Embedding using Contrastive Learning with Application to Room Shape Classification 2024

Mixed Children/Adult/Childrenized Fine-Tuning for Children’s ASR: How to Reduce Age Mismatch and Speaking Style Mismatch 2024

Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions 2024

K-means and hierarchical clustering of f0 contours 2024