Take Its Essence, Discard Its Dross! Debiasing for Toxic Language Detection via Counterfactual Causal Effect

Junyu Lu; Bo Xu; Xiaokun Zhang; Kaiyuan Liu; Dongyu Zhang; Liang Yang; Hongfei Lin

2024 COLING COLING 2024

Take Its Essence, Discard Its Dross! Debiasing for Toxic Language Detection via Counterfactual Causal Effect

Abstract

AbstractResearchers have attempted to mitigate lexical bias in toxic language detection (TLD). However, existing methods fail to disentangle the “useful” and “misleading” impact of lexical bias on model decisions. Therefore, they do not effectively exploit the positive effects of the bias and lead to a degradation in the detection performance of the debiased model. In this paper, we propose a Counterfactual Causal Debiasing Framework (CCDF) to mitigate lexical bias in TLD. It preserves the “useful impact” of lexical bias and eliminates the “misleading impact”. Specifically, we first represent the total effect of the original sentence and biased tokens on decisions from a causal view. We then conduct counterfactual inference to exclude the direct causal effect of lexical bias from the total effect. Empirical evaluations demonstrate that the debiased TLD model incorporating CCDF achieves state-of-the-art performance in both accuracy and fairness compared to competitive baselines applied on several vanilla models. The generalization capability of our model outperforms current debiased models for out-of-distribution data.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio