ConPrompt: Pre-training a Language Model with Machine-Generated Data for Implicit Hate Speech Detection

Youngwook Kim; Shinwoo Park; Youngsoo Namgoong; Yo-Sub Han

EMNLP 2023

Abstract

Implicit hate speech detection is a challenging text classification task because the text contains no explicit cues (e.g., swear words). While some pre-trained language models have been developed for hate speech detection, they are not specialized for implicit hate speech. Recently, an implicit hate speech dataset with a massive number of samples has been proposed by controlling machine generation. We propose a pre-training approach, ConPrompt, to fully leverage such machine-generated data. Specifically, given a machine-generated statement, we use the example statements of its origin prompt as positive samples for contrastive learning. Through pre-training with ConPrompt, we present ToxiGen-ConPrompt, a pre-trained language model for implicit hate speech detection. We conduct extensive experiments on several implicit hate speech datasets and show that ToxiGen-ConPrompt generalizes better than other pre-trained models. Additionally, we empirically show that ConPrompt is effective in mitigating identity term bias, demonstrating that it not only makes a model more generalizable but also reduces unintended bias. We analyze the representation quality of ToxiGen-ConPrompt and show that its representations reflect both target group and toxicity, which are desirable properties for implicit hate speech detection.
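The abstract describes the core idea only in words: a machine-generated statement is the anchor, and the example statements from its origin prompt are its positives. The exact loss is not given here, so the following is a minimal sketch of a generic InfoNCE-style contrastive loss with several positives per anchor, not the authors' formulation. The function names, the temperature value, and the toy 2-D vectors (standing in for encoder outputs) are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def multi_positive_contrastive_loss(anchor, positives, negatives, temperature=0.1):
    """Hypothetical helper: InfoNCE-style loss averaged over several positives.

    anchor    : embedding of a machine-generated statement
    positives : embeddings of the example statements in its origin prompt
    negatives : embeddings of statements tied to other prompts
    """
    candidates = positives + negatives
    logits = [cosine(anchor, c) / temperature for c in candidates]

    # Log-sum-exp with the max subtracted for numerical stability.
    max_logit = max(logits)
    log_denom = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))

    # Average the negative log-probability assigned to each positive.
    pos_log_probs = [logits[i] - log_denom for i in range(len(positives))]
    return -sum(pos_log_probs) / len(positives)


# Toy embeddings (assumed values, not real model outputs).
anchor = [1.0, 0.0]
prompt_examples = [[0.9, 0.1], [0.8, 0.2]]   # close to the anchor
other_statements = [[0.0, 1.0], [-1.0, 0.0]]  # far from the anchor

loss = multi_positive_contrastive_loss(anchor, prompt_examples, other_statements)
print(f"loss with matching prompt examples as positives: {loss:.3f}")
```

Under this sketch, the loss falls as the anchor moves closer to its prompt's example statements and away from statements tied to other prompts, which is the behaviour the abstract attributes to pre-training with ConPrompt.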

Topics

Machine Learning > Learning Types > Contrastive Learning
Deep Learning > Techniques > Pretraining
Natural Language Processing > Applications > Text Classification
Machine Learning > Learning Types > Representation Learning
Deep Learning > Techniques > Contrastive Learning
Deep Learning > Learning Types > Contrastive Learning

Keywords

contrastive learning, text classification, bias mitigation, language model, language model pre-training, hate speech detection, implicit hate speech
