EMNLP 2020
Detecting Independent Pronoun Bias with Partially-Synthetic Data Generation
Abstract
We report that state-of-the-art parsers consistently failed to identify “hers” and “theirs” as pronouns but did identify the masculine equivalent “his”. We find that the same biases exist in recent language models such as BERT. While some of the bias comes from known sources, such as training data with gender imbalances, we find that the bias is _amplified_ in the language models, and that linguistic differences between English pronouns can become biases in some machine learning models even though the differences themselves are not inherently biased. We introduce a new technique for measuring bias in models that uses Bayesian approximations to generate partially-synthetic data from the model itself.
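As a hedged illustration of the parser finding above, the sketch below computes, for each independent pronoun, the fraction of its occurrences that a tagger labels as a pronoun. It is not the authors' partially-synthetic, Bayesian technique. The Penn Treebank-style tag names (`PRP`, `PRP$`), the `(token, tag)` input format, the `pronoun_recall` helper, and the toy tagger output are all hypothetical assumptions chosen for illustration.

```python
from collections import defaultdict

# Assumption: Penn Treebank-style pronoun tags; adjust for other tag sets.
PRONOUN_TAGS = {"PRP", "PRP$"}
# Pronouns compared in the abstract.
TARGETS = ("his", "hers", "theirs")


def pronoun_recall(tagged_sentences):
    """Return, per target pronoun, the share of occurrences tagged as a pronoun.

    tagged_sentences: list of sentences, each a list of (token, tag) pairs
    emitted by the tagger under test (hypothetical input format).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for sentence in tagged_sentences:
        for token, tag in sentence:
            word = token.lower()
            if word in TARGETS:
                totals[word] += 1
                if tag in PRONOUN_TAGS:
                    hits[word] += 1
    # Report only pronouns that occur at least once, to avoid dividing by zero.
    return {w: hits[w] / totals[w] for w in TARGETS if totals[w] > 0}


# Toy tagger output, invented for illustration (not results from the paper).
toy = [
    [("The", "DT"), ("book", "NN"), ("is", "VBZ"), ("his", "PRP$")],
    [("The", "DT"), ("book", "NN"), ("is", "VBZ"), ("hers", "NNS")],
    [("The", "DT"), ("book", "NN"), ("is", "VBZ"), ("theirs", "NNS")],
    [("That", "DT"), ("one", "NN"), ("is", "VBZ"), ("hers", "PRP")],
]
print(pronoun_recall(toy))  # {'his': 1.0, 'hers': 0.5, 'theirs': 0.0}
```

A gap between the score for “his” and the scores for “hers” and “theirs” on otherwise parallel sentences is the kind of asymmetry the abstract reports; building such parallel sentences by pronoun substitution is a simple, non-Bayesian stand-in for the paper's data generation.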
Topics
Machine Learning > Optimization & Theory > Bayesian Inference
Machine Learning > Application Areas > Fairness
Natural Language Processing > Understanding > Syntax
Artificial Intelligence > Core AI > Fairness
Machine Learning > Bayesian & Probabilistic > Bayesian Inference
Machine Learning > Learning Types > Fairness
Deep Learning > Learning Types > Representation Learning