ACL 2022

Mind the data gap(s): Investigating power in speech and language datasets

Abstract

Algorithmic oppression is an urgent and persistent problem in speech and language technologies. Considering the power relations embedded in datasets before compiling them or using them to train or test speech and language technologies is essential to designing less harmful, more just technologies. This paper presents a reflective exercise to recognise and challenge gaps in speech and language datasets, and the power relations those gaps reveal, by applying principles of Data Feminism and Design Justice and building on work on dataset documentation and sociolinguistics.

Authors