Gender Bias in Nepali-English Machine Translation: A Comparison of LLMs and Existing MT Systems

Supriya Khadka; Bijayan Bhattarai

2025 ACL ACL 2025

Gender Bias in Nepali-English Machine Translation: A Comparison of LLMs and Existing MT Systems

Abstract

AbstractBias in Nepali NLP is rarely addressed, as the language is classified as low-resource, which leads to the perpetuation of biases in downstream systems. Our research focuses on gender bias in Nepali-English machine translation, an area that has seen little exploration. With the emergence of Large Language Models(LLM), there is a unique opportunity to mitigate these biases. In this study, we quantify and evaluate gender bias by constructing an occupation corpus and adapting three gender-bias challenge sets for Nepali. Our findings reveal that gender bias is prevalent in existing translation systems, with translations often reinforcing stereotypes and misrepresenting gender-specific roles. However, LLMs perform significantly better in both gender-neutral and gender-specific contexts, demonstrating less bias compared to traditional machine translation systems. Despite some quirks, LLMs offer a promising alternative for culture-rich, low-resource languages like Nepali. We also explore how LLMs can improve gender accuracy and mitigate biases in occupational terms, providing a more equitable translation experience. Our work contributes to the growing effort to reduce biases in machine translation and highlights the potential of LLMs to address bias in low-resource languages, paving the way for more inclusive and accurate translation systems.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — nepali-english translation

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio