Biasly: An Expert-Annotated Dataset for Subtle Misogyny Detection and Mitigation

Brooklyn Sheppard; Anna Richter; Allison Cohen; Elizabeth Smith; Tamara Kneese; Carolyne Pelletier; Ioana Baldini; Yue Dong

ACL 2024

Abstract

Using novel approaches to dataset development, the Biasly dataset captures the nuance and subtlety of misogyny in ways that are unique within the literature. Built in collaboration with multi-disciplinary experts and annotators themselves, the dataset contains annotations of movie subtitles, capturing colloquial expressions of misogyny in North American film. The open-source dataset can be used for a range of NLP tasks, including binary and multi-label classification, severity score regression, and text generation for rewrites. In this paper, we discuss the methodology used, analyze the annotations obtained, provide baselines for each task using common NLP algorithms, and furnish error analyses to give insight into model behaviour when fine-tuned on the Biasly dataset.
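As a rough illustration of how a single annotated subtitle line could serve the four tasks named above, the sketch below derives one training target per task from the same record. The record layout, the field names, the placeholder category names, and the severity value are all hypothetical and chosen only for illustration; they are not the published Biasly schema or label set.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Placeholder label set: an assumption for illustration, not the paper's taxonomy.
CATEGORIES = ["category_a", "category_b", "category_c"]


@dataclass
class AnnotatedLine:
    """Hypothetical record for one annotated subtitle line."""
    text: str
    is_misogynistic: bool
    categories: List[str] = field(default_factory=list)
    severity: float = 0.0   # scale is an assumption
    rewrite: str = ""       # mitigated version of the text


def binary_target(ex: AnnotatedLine) -> int:
    """Binary classification: 1 if the line is labelled misogynistic, else 0."""
    return int(ex.is_misogynistic)


def multilabel_target(ex: AnnotatedLine, label_set: List[str] = CATEGORIES) -> List[int]:
    """Multi-label classification: one 0/1 indicator per category."""
    return [int(c in ex.categories) for c in label_set]


def regression_target(ex: AnnotatedLine) -> float:
    """Severity regression: a single real-valued score."""
    return float(ex.severity)


def generation_pair(ex: AnnotatedLine) -> Tuple[str, str]:
    """Rewrite generation: (source text, target rewrite) pair."""
    return (ex.text, ex.rewrite)


example = AnnotatedLine(
    text="<subtitle line>",
    is_misogynistic=True,
    categories=["category_b"],
    severity=0.5,
    rewrite="<rewritten line>",
)
```

The point of the sketch is only that the tasks differ in target shape (a scalar class, a binary vector, a real number, and a text pair), so a single annotated record can supply all four.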

Topics

Machine Learning > Core Methods > Classification; Machine Learning > Learning Types > Weakly Supervised Learning

Keywords

text classification; text generation; multi-label classification; misogyny detection; expert annotation; severity regression
