“You are grounded!”: Latent Name Artifacts in Pre-trained Language Models

Vered Shwartz; Rachel Rudinger; Oyvind Tafjord

EMNLP 2020

Abstract

Pre-trained language models (LMs) may propagate biases that originate in their training corpus to downstream models. We focus on artifacts associated with the representation of given names (e.g., Donald), which, depending on the corpus, may be associated with specific entities, as indicated by next-token prediction (e.g., Trump). While helpful in some contexts, this grounding also happens in under-specified or inappropriate contexts. For example, endings generated for ‘Donald is a’ differ substantially from those generated for other names, and often carry more negative sentiment than average. We demonstrate the potential effect on downstream tasks with reading comprehension probes in which perturbing a name changes the model's answers. As a silver lining, our experiments suggest that additional pre-training on different corpora may mitigate this bias.
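
The name-perturbation probe mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the helper names (`perturb_names`, `answer_flips`), the second name (Mary), the example passage, and the two toy "models" are all hypothetical stand-ins, and a real probe would call an actual reading comprehension model instead.

```python
import re
from typing import Callable, Dict

# A "model" here is any callable (passage, question) -> answer string.
# This interface is an assumption made for illustration only.
Model = Callable[[str, str], str]


def perturb_names(text: str, mapping: Dict[str, str]) -> str:
    """Replace whole-word names according to `mapping` in a single pass."""
    if not mapping:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in mapping) + r")\b"
    )
    return pattern.sub(lambda m: mapping[m.group(1)], text)


def answer_flips(model: Model, passage: str, question: str,
                 name_a: str, name_b: str) -> bool:
    """Return True if swapping the two names changes the model's answer
    beyond the swap itself, i.e. the model is sensitive to the name."""
    swap = {name_a: name_b, name_b: name_a}
    original = model(passage, question)
    perturbed = model(perturb_names(passage, swap),
                      perturb_names(question, swap))
    # A name-insensitive model's answer should simply follow the swap.
    expected = perturb_names(original, swap)
    return perturbed != expected


# Toy stand-ins (hypothetical): one answers by position in the passage,
# the other always prefers one specific name when it is present.
def positional_model(passage: str, question: str) -> str:
    return passage.split()[0]


def name_biased_model(passage: str, question: str) -> str:
    return "Donald" if "Donald" in passage else passage.split()[0]


passage = "Donald has been teaching for ten years. Mary started last month."
question = "Who is the more experienced teacher?"

print(answer_flips(positional_model, passage, question, "Donald", "Mary"))
print(answer_flips(name_biased_model, passage, question, "Donald", "Mary"))
```

Under these assumptions, the positional toy model gives an answer that follows the swap, while the name-biased toy model keeps answering with the same name, which is the kind of behavior the probe is meant to surface.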

Topics

Machine Learning > Application Areas > Fairness
Natural Language Processing > Resources & Methods > Large Language Models
Deep Learning > Learning Types > Fairness

Keywords

bias detection, reading comprehension, bias mitigation, pre-trained language model, entity grounding, name representation, name artifact
