From Prejudice to Parity: A New Approach to Debiasing Large Language Model Word Embeddings

Aishik Rakshit; Smriti Singh; Shuvam Keshari; Arijit Ghosh Chowdhury; Vinija Jain; Aman Chadha

COLING 2025

Abstract

Embeddings play a pivotal role in the efficacy of large language models. They are the bedrock on which these models grasp contextual relationships and build a nuanced understanding of language, which in turn lets them perform complex tasks that require a fundamental understanding of human language. Given that these embeddings themselves often reflect or exhibit bias, it stands to reason that the models may inadvertently learn this bias as well. In this work, we build on the seminal work of (CITATION) and (CITATION) and propose DeepSoftDebias, an algorithm that uses a neural network to perform ‘soft debiasing’. We extensively evaluate this algorithm across a variety of state-of-the-art datasets, accuracy metrics, and challenging NLP tasks. On a wide range of metrics, we find that DeepSoftDebias outperforms the current state-of-the-art methods at reducing bias across gender, race, and religion.
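The abstract does not spell out the objective behind ‘soft debiasing’, so the following is only a minimal sketch of the general idea, under stated assumptions: a transformed embedding set is scored by how well it preserves the original pairwise inner products plus a weighted penalty on what remains along a bias direction. The function names, the single bias direction, the weight `lam`, and the toy 2-D vectors are all hypothetical and are not taken from the paper. The paper uses a neural network to learn the transformation, whereas this sketch only evaluates two fixed candidates.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def remove_component(v: Vector, direction: Vector) -> Vector:
    """Subtract from v its projection onto `direction` (a hard projection)."""
    scale = dot(v, direction) / dot(direction, direction)
    return [a - scale * d for a, d in zip(v, direction)]


def soft_debias_loss(
    original: List[Vector],
    transformed: List[Vector],
    bias_direction: Vector,
    lam: float,
) -> float:
    """Assumed objective: preserve pairwise inner products, penalize bias.

    preservation: squared change in every pairwise inner product.
    bias: squared projection of each transformed vector on the bias direction.
    """
    n = len(original)
    preservation = sum(
        (dot(transformed[i], transformed[j]) - dot(original[i], original[j])) ** 2
        for i in range(n)
        for j in range(n)
    )
    bias = sum(dot(t, bias_direction) ** 2 for t in transformed)
    return preservation + lam * bias


# Toy data: the first coordinate plays the role of the bias direction.
bias_direction = [1.0, 0.0]
embeddings = [[0.5, 1.0], [-0.5, 1.0]]

# Candidate 1: leave the embeddings untouched (no debiasing).
loss_identity = soft_debias_loss(embeddings, embeddings, bias_direction, lam=1.0)

# Candidate 2: remove the bias component entirely (hard projection).
projected = [remove_component(v, bias_direction) for v in embeddings]
loss_projected = soft_debias_loss(embeddings, projected, bias_direction, lam=1.0)

print(loss_identity, loss_projected)  # 0.5 0.25
```

In this toy case the untouched embeddings pay only the bias penalty, while the fully projected embeddings pay only the preservation penalty. A soft method looks for a transformation that trades the two off, with `lam` controlling how strongly residual bias is penalized.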

Topics

Artificial Intelligence > Core AI > Interpretability

Machine Learning > Core Methods > Embedding Learning

Machine Learning > Application Areas > Fairness

Keywords

bias mitigation, word embedding, model debiasing, large language model, neural network
