Bias Analysis and Mitigation through Protected Attribute Detection and Regard Classification

Takuma Udagawa; Yang Zhao; Hiroshi Kanayama; Bishwaranjan Bhattacharjee

EMNLP 2025

Abstract

Large language models (LLMs) acquire general linguistic knowledge from massive-scale pretraining. However, pretraining data, which are mainly composed of web-crawled texts, contain undesirable social biases that LLMs can perpetuate or even amplify. In this study, we propose an efficient yet effective annotation pipeline to investigate social biases in pretraining corpora. Our pipeline consists of protected attribute detection, which identifies diverse demographics, followed by regard classification, which analyzes the language polarity towards each attribute. Through our experiments, we demonstrate the effect of our bias analysis and mitigation measures, focusing on Common Crawl as the most representative pretraining corpus.
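The two-stage pipeline described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the keyword lists, the polarity lexicon, and the function names (`detect_attributes`, `classify_regard`, `analyze`, `mitigate`) are all hypothetical stand-ins, and a real system would presumably use trained models rather than word lists. The filtering step in `mitigate` is likewise only one assumed form of mitigation; the abstract does not say which measures the paper applies.

```python
import re
from collections import defaultdict

# Hypothetical keyword lists (assumption): a real detector would cover
# many more protected attributes and surface forms.
ATTRIBUTE_TERMS = {
    "gender": {"women", "men"},
    "age": {"elderly", "teenagers"},
}

# Hypothetical polarity lexicon (assumption) standing in for a trained
# regard classifier.
POSITIVE = {"brilliant", "kind", "respected"}
NEGATIVE = {"lazy", "dangerous", "stupid"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def detect_attributes(text):
    """Stage 1: return the protected attributes mentioned in the text."""
    tokens = set(tokenize(text))
    return sorted(attr for attr, terms in ATTRIBUTE_TERMS.items() if tokens & terms)


def classify_regard(text):
    """Stage 2: label the polarity of the text as positive, negative, or neutral."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def analyze(corpus):
    """Count regard labels per detected attribute across a corpus."""
    counts = defaultdict(lambda: defaultdict(int))
    for text in corpus:
        regard = classify_regard(text)
        for attr in detect_attributes(text):
            counts[attr][regard] += 1
    return {attr: dict(c) for attr, c in counts.items()}


def mitigate(corpus):
    """Assumed mitigation: drop texts that mention an attribute with negative regard."""
    return [
        t for t in corpus
        if not (detect_attributes(t) and classify_regard(t) == "negative")
    ]


if __name__ == "__main__":
    docs = [
        "The elderly are respected here.",
        "Teenagers are lazy.",
        "The weather is nice.",
    ]
    print(analyze(docs))
    print(mitigate(docs))
```

Running the detector first keeps the regard statistics tied to specific attributes, so texts that mention no protected attribute are neither counted in the analysis nor removed by the filter.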

Topics

Artificial Intelligence > Core AI > Responsible AI
Machine Learning > Application Areas > Fairness
Natural Language Processing > Resources & Methods > Large Language Models
Artificial Intelligence > Core AI > Large Language Models
Artificial Intelligence > Core AI > Fairness
Machine Learning > Learning Types > Fairness

Keywords

bias detection, bias mitigation, pretraining corpus, social bias, large language model, protected attribute, pretraining data, pretrained corpus, protected attribute detection, regard classification
