SeeGULL Multilingual: a Dataset of Geo-Culturally Situated Stereotypes

Mukul Bhutani; Kevin Robinson; Vinodkumar Prabhakaran; Shachi Dave; Sunipa Dev

ACL 2024

Abstract

While generative multilingual models are rapidly being deployed, their safety and fairness evaluations are largely limited to resources collected in English. This is especially problematic for evaluations targeting inherently socio-cultural phenomena such as stereotyping, where it is important to build multilingual resources that reflect the stereotypes prevalent in the respective language communities. However, gathering these resources at scale, across varied languages and regions, poses a significant challenge: it requires broad socio-cultural knowledge and can be prohibitively expensive. To overcome this critical gap, we employ a recently introduced approach that couples LLM generations for scale with culturally situated validations for reliability. With it we build SeeGULL Multilingual, a global-scale multilingual dataset of social stereotypes with human annotations, containing over 25K stereotypes and spanning 23 pairs of languages and the regions they are common in. We also demonstrate its utility in identifying gaps in model evaluations.

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Artificial Intelligence > Core AI > Responsible AI
Artificial Intelligence > Learning Paradigms > Transfer Learning
Machine Learning > Application Areas > Fairness
Natural Language Processing > Resources & Methods > Multilingual NLP
Artificial Intelligence > Core AI > Large Language Models
Artificial Intelligence > Core AI > Fairness
Machine Learning > Learning Types > Evaluation

Keywords

cultural knowledge, human annotation, multilingual evaluation, fairness evaluation, multilingual model, multilingual dataset, social stereotype, stereotype detection, large language model, stereotype dataset
