LLM generated responses to mitigate the impact of hate speech

Jakub Podolak; Szymon Łukasik; Paweł Balawender; Jan Ossowski; Jan Piotrowski; Katarzyna Bakowicz; Piotr Sankowski

2024 EMNLP EMNLP 2024

LLM generated responses to mitigate the impact of hate speech

Abstract

AbstractIn this study, we explore the use of Large Language Models (LLMs) to counteract hate speech. We conducted the first real-life A/B test assessing the effectiveness of LLM-generated counter-speech. During the experiment, we posted 753 automatically generated responses aimed at reducing user engagement under tweets that contained hate speech toward Ukrainian refugees in Poland.Our work shows that interventions with LLM-generated responses significantly decrease user engagement, particularly for original tweets with at least ten views, reducing it by over 20%. This paper outlines the design of our automatic moderation system, proposes a simple metric for measuring user engagement and details the methodology of conducting such an experiment. We discuss the ethical considerations and challenges in deploying generative AI for discourse moderation.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — counter-speech generation

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Jakub Podolak , Szymon Łukasik , Paweł Balawender , Jan Ossowski , Jan Piotrowski , Katarzyna Bakowicz , Piotr Sankowski

Topics

Artificial Intelligence > Core AI > Responsible AI Machine Learning > Application Areas > Fairness Natural Language Processing > Generation > Text Generation Natural Language Processing > Resources & Methods > Large Language Models Artificial Intelligence > Core AI > Natural Language Processing

Keywords

text generation content moderation a/b testing user engagement hate speech detection counter-speech generation hate speech large language model

Download PDF

Related papers

EmbodiedBERT: Cognitively Informed Metaphor Detection Incorporating Sensorimotor Information 2024

Mitigating Matthew Effect: Multi-Hypergraph Boosted Multi-Interest Self-Supervised Learning for Conversational Recommendation 2024

Learning to Extract Structured Entities Using Language Models 2024

Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis 2024

CSSL: Contrastive Self-Supervised Learning for Dependency Parsing on Relatively Free Word Ordered and Morphologically Rich Low Resource Languages 2024