Robust Hate Speech Detection via Mitigating Spurious Correlations

Kshitiz Tiwari; Shuhan Yuan; Lu Zhang

IJCNLP 2022

Abstract

We develop a novel robust hate speech detection model that can defend against both word- and character-level adversarial attacks. We identify the essential factor that makes vanilla detection models vulnerable to adversarial attacks: the spurious correlation between certain target words in the text and the prediction label. To mitigate this spurious correlation, we describe the process of hate speech detection with a causal graph. We then employ causal strength to quantify the spurious correlation and formulate a regularized entropy loss function. We show that our method generalizes the backdoor adjustment technique in causal inference. Finally, an empirical evaluation demonstrates the efficacy of our method.
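
The abstract builds on the backdoor adjustment from causal inference. To show what that standard technique computes (this is the textbook formula, not the authors' method or their generalization of it), the minimal Python sketch below evaluates P(y | do(x)) = Σ_z P(y | x, z) P(z) on a hypothetical three-variable graph Z → X, Z → Y, X → Y. Here Z is a confounder, loosely analogous to a target word that is spuriously tied to the label. All probabilities are made-up illustrative numbers, and the helper names `joint`, `observational`, and `backdoor` are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical toy model over binary variables; numbers are illustrative only.
# Graph assumed: Z -> X, Z -> Y, X -> Y, so {Z} satisfies the backdoor criterion.
P_z = {0: F(1, 2), 1: F(1, 2)}                                # P(z)
P_x_given_z = {0: {0: F(4, 5), 1: F(1, 5)},                   # P(x | z=0)
               1: {0: F(1, 5), 1: F(4, 5)}}                   # P(x | z=1)
P_y1_given_xz = {(0, 0): F(1, 10), (1, 0): F(3, 10),          # P(y=1 | x, z), keyed (x, z)
                 (0, 1): F(1, 2), (1, 1): F(7, 10)}


def joint(z, x, y):
    """P(z, x, y) factorized along the assumed causal graph."""
    p_y1 = P_y1_given_xz[(x, z)]
    p_y = p_y1 if y == 1 else 1 - p_y1
    return P_z[z] * P_x_given_z[z][x] * p_y


def observational(x):
    """P(y=1 | x): plain conditioning, which also picks up the path through Z."""
    num = sum(joint(z, x, 1) for z in (0, 1))
    den = sum(joint(z, x, y) for z in (0, 1) for y in (0, 1))
    return num / den


def backdoor(x):
    """P(y=1 | do(x)) = sum_z P(y=1 | x, z) P(z), the backdoor adjustment."""
    return sum(P_y1_given_xz[(x, z)] * P_z[z] for z in (0, 1))


if __name__ == "__main__":
    print("P(y=1 | x=1)     =", observational(1))
    print("P(y=1 | do(x=1)) =", backdoor(1))
```

In this toy setting, conditioning on x=1 gives 31/50 while intervening gives 1/2; the difference is the contribution of the confounder Z, which the adjustment removes by averaging over P(z) instead of P(z | x).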

Topics

Artificial Intelligence > Core AI > Causal Inference
Machine Learning > Application Areas > Fairness
Natural Language Processing > Applications > Text Classification

Keywords

causal inference, adversarial robustness, text classification, adversarial attack, spurious correlation, causal graph, backdoor adjustment, hate speech detection

Related papers

Chasing the Tail with Domain Generalization: A Case Study on Frequency-Enriched Datasets 2022

Double Trouble: How to not Explain a Text Classifier’s Decisions Using Counterfactuals Synthesized by Masked Language Models? 2022

Leveraging Key Information Modeling to Improve Less-Data Constrained News Headline Generation via Duality Fine-Tuning 2022

Graph-augmented Learning to Rank for Querying Large-scale Knowledge Graph 2022

Missing Modality meets Meta Sampling (M3S): An Efficient Universal Approach for Multimodal Sentiment Analysis with Missing Modality 2022