FairFlow: Mitigating Dataset Biases through Undecided Learning for Natural Language Understanding

Jiali Cheng; Hadi Amiri

EMNLP 2024

Abstract

Language models are prone to dataset biases, also known as shortcuts or spurious correlations in data, which often cause performance to drop on new data. We present a new debiasing framework called FairFlow that mitigates dataset biases by learning to be undecided in its predictions for data samples or representations associated with known or unknown biases. The framework introduces two key components: a suite of data and model perturbation operations that generate different biased views of input samples, and a contrastive objective that learns debiased and robust representations from the resulting biased views of samples. Experiments show that FairFlow outperforms existing debiasing methods, particularly on out-of-domain and hard test samples, without compromising in-domain performance.
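To make "learning to be undecided" concrete, the following is a minimal sketch, not the authors' implementation. It assumes that "undecided" means a near-uniform class distribution on a biased view of a sample, and it measures that with a KL divergence from the uniform distribution. The function name `undecided_penalty` and the choice of KL divergence are illustrative assumptions; the abstract describes a contrastive objective whose exact form is not given here.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def undecided_penalty(biased_view_logits: np.ndarray) -> float:
    """Hypothetical penalty: mean KL(uniform || p) over a batch.

    Assumption (not from the paper): a model is "undecided" on a biased
    view when its predicted class distribution p is uniform. The penalty
    is zero exactly when p is uniform and grows as p becomes confident.
    """
    num_classes = biased_view_logits.shape[-1]
    log_uniform = -np.log(num_classes)
    log_probs = log_softmax(biased_view_logits)
    # KL(u || p) = sum_c u_c * (log u_c - log p_c), with u_c = 1 / num_classes
    kl = np.sum((log_uniform - log_probs) / num_classes, axis=-1)
    return float(kl.mean())


if __name__ == "__main__":
    undecided = np.zeros((2, 3))              # uniform predictions
    confident = np.array([[10.0, 0.0, 0.0]])  # strongly prefers class 0
    print(undecided_penalty(undecided))
    print(undecided_penalty(confident))
```

Under these assumptions, such a term would be applied only to the perturbed (biased) views, while the unperturbed input is still trained with the usual task loss.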

Topics

Machine Learning > Learning Types > Contrastive Learning
Machine Learning > Application Areas > Domain Generalization
Machine Learning > Application Areas > Fairness
Deep Learning > Learning Types > Adversarial Learning
Machine Learning > Learning Types > Fairness
Natural Language Processing > Applications > Natural Language Understanding

Keywords

contrastive learning, shortcut learning, domain generalization, natural language understanding, debiasing method, dataset bias, undecided learning

Related papers

EmbodiedBERT: Cognitively Informed Metaphor Detection Incorporating Sensorimotor Information 2024

Mitigating Matthew Effect: Multi-Hypergraph Boosted Multi-Interest Self-Supervised Learning for Conversational Recommendation 2024

Learning to Extract Structured Entities Using Language Models 2024

Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis 2024

CSSL: Contrastive Self-Supervised Learning for Dependency Parsing on Relatively Free Word Ordered and Morphologically Rich Low Resource Languages 2024