Privacy Preserving Data Selection for Bias Mitigation in Speech Models

Alkis Koudounas; Eliana Pastor; Vittorio Mazzia; Manuel Giollo; Thomas Gueudre; Elisa Reale; Luca Cagliero; Sandro Cumani; Luca De Alfaro; Elena Baralis; Daniele Amberti

ACL 2025

Abstract

Effectively selecting data from subgroups on which a model performs poorly is crucial for improving its performance. Traditional methods for identifying these subgroups often rely on sensitive information, which raises privacy concerns. Moreover, collecting such information at runtime may be impractical. This paper introduces a cost-effective strategy that addresses both concerns. We identify underperforming subgroups and train a model to predict whether an utterance belongs to them, without requiring sensitive information. This model helps mitigate bias by selecting new data labeled as challenging and adding it when re-training the speech model. Experimental results on intent classification and automatic speech recognition tasks show that our approach reduces biases and improves performance, lowering error rates by up to 39% on FSC, 16% on ITALIC, and 22% on LibriSpeech.
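The selection loop described in the abstract can be illustrated with a minimal sketch. It assumes that subgroup metadata (here a single hypothetical `groups` label) is available only on a small annotated set used to find underperforming subgroups. It also assumes that the membership predictor sees only non-sensitive features (`X`, a hypothetical stand-in for, e.g., utterance embeddings). A simple error-rate threshold replaces the paper's subgroup-identification step, and a hand-rolled logistic regression replaces its predictor. All function names, the `margin` parameter, and the toy data are illustrative and do not represent the authors' implementation.

```python
import math
from collections import defaultdict


def subgroup_error_rates(groups, errors):
    """Mean error (0/1 per utterance) for each subgroup key."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for g, e in zip(groups, errors):
        sums[g] += e
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in counts}


def challenging_labels(groups, errors, margin=0.1):
    """Label 1 if the utterance's subgroup error rate exceeds the overall rate by more than `margin`.

    `margin` is a hypothetical threshold standing in for a real subgroup-discovery method.
    """
    overall = sum(errors) / len(errors)
    rates = subgroup_error_rates(groups, errors)
    hard = {g for g, r in rates.items() if r - overall > margin}
    return [1 if g in hard else 0 for g in groups]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg(X, y, lr=0.5, epochs=500):
    """Batch gradient descent for logistic regression on non-sensitive features X."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    n = len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(model, x):
    """Predicted probability that utterance features x belong to an underperforming subgroup."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def select_challenging(model, pool, k):
    """Indices of the k pool utterances most likely to belong to an underperforming subgroup."""
    order = sorted(range(len(pool)), key=lambda i: predict_proba(model, pool[i]), reverse=True)
    return order[:k]


# Toy usage with made-up data: group "B" has a higher error rate than group "A".
groups = ["A"] * 4 + ["B"] * 4                 # sensitive metadata, annotated set only
errors = [0, 0, 0, 1, 1, 1, 1, 0]              # 1 = the speech model got the utterance wrong
X = [[-1.0], [-0.8], [-1.2], [-0.9], [1.0], [0.9], [1.1], [0.8]]  # hypothetical non-sensitive features
labels = challenging_labels(groups, errors)    # [0, 0, 0, 0, 1, 1, 1, 1]
model = train_logreg(X, labels)
pool = [[-1.0], [1.2], [0.1], [-0.5]]          # unlabeled candidates, features only
print(select_challenging(model, pool, 2))      # [1, 2]
```

Once trained, the predictor needs no sensitive attributes at selection time, which is the property the abstract emphasizes. In this sketch, the selected utterances would then be added to the training data before the speech model is re-trained.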

Keywords

speech recognition, automatic speech recognition, intent classification, privacy preserving, bias mitigation, data selection, speech model, subgroup fairness