Enhancing Chinese Offensive Language Detection with Homophonic Perturbation

Junqi Wu; Shujie Ji; Kang Zhong; Huiling Peng; Zhendongxiao; Xiongding Liu; Wu Wei

EMNLP 2025

Abstract

Detecting offensive language in Chinese is challenging due to homophonic substitutions used to evade detection. We propose a framework to improve large language models’ robustness against such phonetic attacks. First, we construct HED-COLD, the first large-scale and systematic homophonic dataset for Chinese offensive language detection. Additionally, we design a homophone-aware pretraining strategy that learns the mappings among orthography, phonetics, and semantics between original and perturbed text. Experimental results show that our approach achieves state-of-the-art performance on both the COLD test set and the toxicity benchmark ToxiCloakCN. Notably, it achieves greater gains in domains susceptible to homophonic attacks, such as gender and regional content. These results demonstrate improved robustness and generalization against phonetic adversarial attacks.
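To make the idea of a homophonic substitution concrete, here is a minimal Python sketch of character-level perturbation. It is not the procedure used to build HED-COLD, which the abstract does not describe. The `HOMOPHONES` table, the `perturb` function, and its `rate` and `seed` parameters are all hypothetical names chosen for illustration, and the table uses neutral, non-offensive characters.

```python
import random

# Hypothetical toy homophone table (an assumption, not taken from the paper):
# each key maps to characters that share its Mandarin pronunciation.
HOMOPHONES = {
    "他": ["她", "它"],  # all pronounced "tā"
    "是": ["事", "市"],  # all pronounced "shì"
}


def perturb(text, rate=1.0, seed=0):
    """Swap each character listed in HOMOPHONES for a homophone with probability `rate`."""
    rng = random.Random(seed)  # local generator so results are reproducible
    out = []
    for ch in text:
        candidates = HOMOPHONES.get(ch)
        if candidates and rng.random() < rate:
            out.append(rng.choice(candidates))
        else:
            out.append(ch)  # characters without a listed homophone pass through
    return "".join(out)


if __name__ == "__main__":
    original = "他是学生"
    print(original, "->", perturb(original))
```

The perturbed string sounds the same when read aloud but no longer matches the original characters, which is why a detector that relies on surface forms alone can miss it. A realistic pipeline would derive candidates from a pronunciation lexicon rather than a hand-written table.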

Topics

Machine Learning > Learning Types > Adversarial Learning; Machine Learning > Learning Types > Self-Supervised Learning; Natural Language Processing > Applications > Text Classification; Natural Language Processing > Applications > Sentiment Analysis; Deep Learning > Learning Types > Adversarial Learning; Machine Learning > Learning Types > Robustness

Keywords

adversarial robustness; offensive language detection; pretraining strategy; adversarial attack; Chinese language processing; Chinese language; homophonic perturbation; phonetic attack
