Counterfactual Data Augmentation for Mitigating Gender Stereotypes in Languages with Rich Morphology

Ran Zmigrod; Sabrina J. Mielke; Hanna Wallach; Ryan Cotterell

ACL 2019

Abstract

Gender stereotypes are manifest in most of the world’s languages and are consequently propagated or amplified by NLP systems. Although research has focused on mitigating gender stereotypes in English, the approaches that are commonly employed produce ungrammatical sentences in morphologically rich languages. We present a novel approach for converting between masculine-inflected and feminine-inflected sentences in such languages. For Spanish and Hebrew, our approach achieves F1 scores of 82% and 73% at the level of tags and accuracies of 90% and 87% at the level of forms. By evaluating our approach using four different languages, we show that, on average, it reduces gender stereotyping by a factor of 2.5 without any sacrifice to grammaticality.
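
To illustrate why English-style word swapping breaks down in a morphologically rich language, the following is a minimal sketch in Python. It is not the authors' method: the two lexicons and the function names `naive_swap` and `agreement_aware_swap` are hypothetical, hand-written stand-ins for a real conversion model, and the Spanish sentence is a toy example chosen for illustration. The sketch only shows that swapping the noun alone leaves the determiner and adjective with the wrong gender, whereas a conversion that also updates agreeing words yields a grammatical sentence.

```python
# Toy illustration only: hand-written lexicons, not the paper's model.

# Hypothetical noun swap list, in the style of English counterfactual augmentation.
NOUN_SWAPS = {"ingenieros": "ingenieras", "ingenieras": "ingenieros"}

# Hypothetical lexicon of words that must agree in gender with the noun.
AGREEMENT_SWAPS = {
    "los": "las", "las": "los",
    "expertos": "expertas", "expertas": "expertos",
}


def naive_swap(tokens):
    """Swap only the gendered noun, leaving all other tokens untouched."""
    return [NOUN_SWAPS.get(t, t) for t in tokens]


def agreement_aware_swap(tokens):
    """Swap the noun and every token listed as agreeing with it.

    If the sentence contains no noun from NOUN_SWAPS, return it unchanged.
    """
    if not any(t in NOUN_SWAPS for t in tokens):
        return list(tokens)
    return [NOUN_SWAPS.get(t, AGREEMENT_SWAPS.get(t, t)) for t in tokens]


sentence = "los ingenieros son expertos".split()
naive = " ".join(naive_swap(sentence))
fixed = " ".join(agreement_aware_swap(sentence))

print(naive)  # determiner and adjective still masculine: ungrammatical
print(fixed)  # determiner, noun, and adjective all feminine: grammatical
```

A fixed lexicon like this cannot tell which words in a longer sentence actually depend on the swapped noun, which is one reason a real system needs syntactic and morphological analysis rather than a lookup table.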

Authors

Ran Zmigrod, Sabrina J. Mielke, Hanna Wallach, Ryan Cotterell

Topics

Machine Learning > Application Areas > Data Augmentation

Machine Learning > Application Areas > Fairness

Natural Language Processing > Applications > Text Classification

Machine Learning > Learning Types > Data Augmentation

Keywords

natural language processing, data augmentation, bias mitigation, counterfactual generation, morphological inflection, counterfactual data augmentation, gender stereotype

Related papers

What do phone embeddings learn about Phonology? 2019

Unsupervised Morphological Segmentation for Low-Resource Polysynthetic Languages 2019

Understanding Undesirable Word Embedding Associations 2019

Inferential Machine Comprehension: Answering Questions by Recursively Deducing the Evidence Chain from Text 2019

Domain Adaptation of Neural Machine Translation by Lexicon Induction 2019