The Best Defense is Attack: Repairing Semantics in Textual Adversarial Examples

Heng Yang; Ke Li

2024 EMNLP EMNLP 2024

The Best Defense is Attack: Repairing Semantics in Textual Adversarial Examples

Abstract

AbstractRecent studies have revealed the vulnerability of pre-trained language models to adversarial attacks. Adversarial defense techniques have been proposed to reconstruct adversarial examples within feature or text spaces. However, these methods struggle to effectively repair the semantics in adversarial examples, resulting in unsatisfactory defense performance. To repair the semantics in adversarial examples, we introduce a novel approach named Reactive Perturbation Defocusing (Rapid), which employs an adversarial detector to identify the fake labels of adversarial examples and leverages adversarial attackers to repair the semantics in adversarial examples. Our extensive experimental results, conducted on four public datasets, demonstrate the consistent effectiveness of Rapid in various adversarial attack scenarios. For easy evaluation, we provide a click-to-run demo of Rapid at https://tinyurl.com/22ercuf8.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — semantic repair

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Security & Privacy, Speech & Audio

Authors

Heng Yang , Ke Li

Topics

Machine Learning > Core Methods > Classification Machine Learning > Learning Types > Adversarial Learning Natural Language Processing > Applications > Text Classification Artificial Intelligence > Core AI > Adversarial Learning Deep Learning > Learning Types > Adversarial Learning Machine Learning > Learning Types > Robustness

Keywords

text classification model robustness adversarial attack adversarial defense adversarial example pre-trained language model feature reconstruction semantic repair textual adversarial example textual adversarial

Download PDF

Related papers

EmbodiedBERT: Cognitively Informed Metaphor Detection Incorporating Sensorimotor Information 2024

Mitigating Matthew Effect: Multi-Hypergraph Boosted Multi-Interest Self-Supervised Learning for Conversational Recommendation 2024

Learning to Extract Structured Entities Using Language Models 2024

Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis 2024

CSSL: Contrastive Self-Supervised Learning for Dependency Parsing on Relatively Free Word Ordered and Morphologically Rich Low Resource Languages 2024