Explainability and Hate Speech: Structured Explanations Make Social Media Moderators Faster

Agostina Calabrese; Leonardo Neves; Neil Shah; Maarten Bos; Bjorn Ross; Mirella Lapata; Francesco Barbieri

ACL 2024

Abstract

Content moderators play a key role in keeping the conversation on social media healthy. While the high volume of content they need to judge represents a bottleneck in the moderation pipeline, no studies have explored how models could help them make faster decisions. There is, by now, a vast body of research into detecting hate speech, sometimes explicitly motivated by a desire to help improve content moderation, but published research involving real content moderators is scarce. In this work, we investigate the effect of explanations on the speed of real-world moderators. Our experiments show that while generic explanations do not affect their speed and are often ignored, structured explanations lower moderators' decision-making time by 7.4%.

Topics

Artificial Intelligence > Core AI > Human-AI Interaction
Artificial Intelligence > Core AI > Interpretability
Interdisciplinary > Social > Social Media Analysis
Natural Language Processing > Applications > Sentiment Analysis
Machine Learning > Learning Types > Fairness

Keywords

decision making, content moderation, explainable AI, human-AI interaction, hate speech detection, structured explanation, content moderation pipeline, speed