BiasAlert: A Plug-and-play Tool for Social Bias Detection in LLMs

Zhiting Fan; Ruizhe Chen; Ruiling Xu; Zuozhu Liu

EMNLP 2024

Abstract

Evaluating the bias of LLMs is becoming more crucial with their rapid development. However, existing evaluation approaches rely on fixed-form outputs and cannot adapt to the flexible open-text generation scenarios of LLMs (e.g., sentence completion and question answering). To address this, we introduce BiasAlert, a plug-and-play tool designed to detect social bias in the open-text generations of LLMs. BiasAlert integrates external human knowledge with its inherent reasoning capabilities to detect bias reliably. Extensive experiments demonstrate that BiasAlert significantly outperforms existing state-of-the-art methods such as GPT-4-as-Judge in detecting bias. Furthermore, through application studies, we showcase the utility of BiasAlert in reliable LLM fairness evaluation and bias mitigation across various scenarios. Model and code will be publicly released.

Topics

Artificial Intelligence > Core AI > Responsible AI; Machine Learning > Application Areas > Fairness; Artificial Intelligence > Core AI > Large Language Models; Artificial Intelligence > Core AI > Fairness; Deep Learning > Learning Types > Evaluation

Keywords

bias mitigation; fairness evaluation; social bias detection; large language model; open-text generation
