Debiasing Methods in Natural Language Understanding Make Bias More Accessible

Michael Mendelson; Yonatan Belinkov

EMNLP 2021

Abstract

Model robustness to bias is often determined by the generalization on carefully designed out-of-distribution datasets. Recent debiasing methods in natural language understanding (NLU) improve performance on such datasets by pressuring models into making unbiased predictions. An underlying assumption behind such methods is that this also leads to the discovery of more robust features in the model’s inner representations. We propose a general probing-based framework that allows for post-hoc interpretation of biases in language models, and use an information-theoretic approach to measure the extractability of certain biases from the model’s representations. We experiment with several NLU datasets and known biases, and show that, counter-intuitively, the more a language model is pushed towards a debiased regime, the more bias is actually encoded in its inner representations.
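
The abstract does not say which information-theoretic measure is used, so the following is only an illustrative sketch of one common choice: online-code minimum description length (MDL) probing. A probe is trained on growing prefixes of the data, and the number of bits needed to transmit the bias labels given the representations is compared with a uniform code; a higher compression ratio means the bias is more easily extracted. The linear softmax probe, the function names (`fit_probe`, `online_codelength`, `compression`), the block fractions, and the synthetic "representations" are all assumptions made for illustration, not the authors' setup.

```python
import numpy as np


def _log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def _add_bias(X):
    """Append a constant column so the probe has an intercept."""
    return np.hstack([X, np.ones((len(X), 1))])


def fit_probe(X, y, n_classes, steps=500, lr=0.1, l2=1e-3):
    """Fit a linear softmax probe with plain gradient descent (hypothetical probe)."""
    Xb = _add_bias(X)
    W = np.zeros((Xb.shape[1], n_classes))
    Y = np.eye(n_classes)[y]  # one-hot labels
    for _ in range(steps):
        P = np.exp(_log_softmax(Xb @ W))
        W -= lr * (Xb.T @ (P - Y) / len(y) + l2 * W)
    return W


def codelength_bits(W, X, y):
    """Bits needed to encode labels y given X under the probe W."""
    log_p = _log_softmax(_add_bias(X) @ W)
    return float(-log_p[np.arange(len(y)), y].sum() / np.log(2))


def online_codelength(X, y, n_classes, fractions=(0.1, 0.2, 0.4, 0.8, 1.0)):
    """Online code: each block is encoded with a probe trained on all earlier data."""
    n = len(y)
    cuts = [max(1, int(f * n)) for f in fractions]
    total = cuts[0] * np.log2(n_classes)  # first block is sent with a uniform code
    for start, end in zip(cuts[:-1], cuts[1:]):
        W = fit_probe(X[:start], y[:start], n_classes)
        total += codelength_bits(W, X[start:end], y[start:end])
    return float(total)


def compression(X, y, n_classes):
    """Uniform codelength divided by online codelength (higher = more extractable)."""
    return len(y) * np.log2(n_classes) / online_codelength(X, y, n_classes)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bias_label = rng.integers(0, 2, size=400)      # synthetic binary bias feature
    encoded = rng.normal(size=(400, 8))
    encoded[:, 0] += 2.0 * (2 * bias_label - 1)    # bias is linearly encoded
    unrelated = rng.normal(size=(400, 8))          # bias is absent
    print("compression, bias encoded:", compression(encoded, bias_label, 2))
    print("compression, bias absent: ", compression(unrelated, bias_label, 2))
```

In a real study, `X` would hold representations taken from a baseline model and from a debiased model, and `y` would mark a known dataset bias; comparing the two compression values is one way to ask whether the bias became more or less extractable after debiasing.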

Topics

Machine Learning > Optimization & Theory > Theory
Machine Learning > Application Areas > Fairness
Machine Learning > Learning Types > Interpretability
Natural Language Processing > Understanding > Semantic Analysis
Natural Language Processing > Understanding > Sentiment Analysis
Artificial Intelligence > Core AI > Fairness
Deep Learning > Optimization & Theory > Evaluation

Keywords

information theory, representation learning, bias detection, out-of-distribution generalization, language model, debiasing method, probing framework, probing test, model representation, bias extraction
