ACL 2025

Gender Swapping as a Data Augmentation Technique: Developing Gender-Balanced Datasets for Ukrainian Language Processing

Abstract

This paper presents a pipeline for generating gender-balanced datasets through sentence-level gender swapping, addressing gender imbalance in Ukrainian texts. We select sentences containing gender-marked entities, focusing on job titles, generate their gender-inverted alternatives using LLMs with a human in the loop, and fine-tune Aya-101 on the resulting dataset for the task of gender swapping. Additionally, we train a Named Entity Recognition (NER) model on gender-balanced data and demonstrate that it recognizes gendered entities better. The findings highlight the potential of gender-balanced datasets to enhance model robustness and support fairer language processing. Finally, we make a gender-swapped version of NER-UK 2.0 and the fine-tuned Aya-101 model available for download and further research.
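
As an illustration of the first pipeline step only (selecting sentences that contain gender-marked job titles), a minimal sketch might look like the following. The lexicon, the function names, and the example sentences are hypothetical and are not taken from the paper. The sketch matches exact nominative forms only, so a real system would likely need lemmatization or morphological analysis. Producing the swapped sentence itself, with correct agreement on verbs and adjectives, is the part the abstract assigns to LLMs and human review, and is not attempted here.

```python
import re

# Hypothetical toy lexicon of Ukrainian job titles (masculine -> feminine).
# Assumption for illustration; not the lexicon used in the paper.
JOB_TITLE_PAIRS = {
    "вчитель": "вчителька",    # teacher
    "лікар": "лікарка",        # doctor
    "директор": "директорка",  # director
}
FEMININE_TO_MASCULINE = {fem: masc for masc, fem in JOB_TITLE_PAIRS.items()}


def find_gendered_titles(sentence):
    """Return (token, gender, counterpart) for every lexicon hit.

    Only exact (nominative, lowercase-normalized) forms are matched.
    """
    hits = []
    for token in re.findall(r"\w+", sentence.lower()):
        if token in JOB_TITLE_PAIRS:
            hits.append((token, "masc", JOB_TITLE_PAIRS[token]))
        elif token in FEMININE_TO_MASCULINE:
            hits.append((token, "fem", FEMININE_TO_MASCULINE[token]))
    return hits


def select_candidates(sentences):
    """Keep only sentences with at least one gender-marked job title."""
    return [s for s in sentences if find_gendered_titles(s)]


if __name__ == "__main__":
    corpus = [
        "Лікарка оглянула пацієнта.",
        "Сьогодні гарна погода.",
        "Директор підписав наказ.",
    ]
    for sentence in select_candidates(corpus):
        print(sentence, find_gendered_titles(sentence))
```

The selected sentences, together with the suggested counterpart titles, could then be passed to an LLM prompt and a human reviewer to produce the fully swapped sentence.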
