“Flex Tape Can’t Fix That”: Bias and Misinformation in Edited Language Models

Karina Halevy; Anna Sotnikova; Badr AlKhamissi; Syrielle Montariol; Antoine Bosselut

2024 EMNLP EMNLP 2024

“Flex Tape Can’t Fix That”: Bias and Misinformation in Edited Language Models

Abstract

AbstractWeight-based model editing methods update the parametric knowledge of language models post-training. However, these methods can unintentionally alter unrelated parametric knowledge representations, potentially increasing the risk of harm. In this work, we investigate how weight editing methods unexpectedly amplify model biases after edits. We introduce a novel benchmark dataset, Seesaw-CF, for measuring bias amplification of model editing methods for demographic traits such as race, geographic origin, and gender. We use Seesaw-CF to examine the impact of model editing on bias in five large language models. Our results demonstrate that edited models exhibit, to various degrees, more biased behavior for certain demographic groups than before they were edited, specifically becoming less confident in properties for Asian and African subjects. Additionally, editing facts about place of birth, country of citizenship, or gender has particularly negative effects on the model’s knowledge about unrelated properties, such as field of work, a pattern observed across multiple models.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — weight-based model editing

🐣 Hot Topic Early Bird — demographic bia

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Karina Halevy , Anna Sotnikova , Badr AlKhamissi , Syrielle Montariol , Antoine Bosselut

Topics

Artificial Intelligence > Core AI > Responsible AI Machine Learning > Application Areas > Fairness Natural Language Processing > Resources & Methods > Knowledge Editing Artificial Intelligence > Core AI > Fairness Deep Learning > Models > Large Language Models Artificial Intelligence > Core AI > Knowledge Editing Machine Learning > Learning Types > Robustness

Keywords

knowledge editing model editing language model bias amplification demographic bia large language model weight editing weight-based model editing

Download PDF

Related papers

EmbodiedBERT: Cognitively Informed Metaphor Detection Incorporating Sensorimotor Information 2024

Mitigating Matthew Effect: Multi-Hypergraph Boosted Multi-Interest Self-Supervised Learning for Conversational Recommendation 2024

Learning to Extract Structured Entities Using Language Models 2024

Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis 2024

CSSL: Contrastive Self-Supervised Learning for Dependency Parsing on Relatively Free Word Ordered and Morphologically Rich Low Resource Languages 2024