Can Adversarial Training Be Manipulated By Non-Robust Features?

Lue Tao; Lei Feng; Hongxin Wei; Jinfeng Yi; Sheng-jun Huang; Songcan Chen

2022 NIPS NeurIPS 2022

Can Adversarial Training Be Manipulated By Non-Robust Features?

Abstract

Adversarial training, originally designed to resist test-time adversarial examples, has shown to be promising in mitigating training-time availability attacks. This defense ability, however, is challenged in this paper. We identify a novel threat model named stability attack, which aims to hinder robust availability by slightly manipulating the training data. Under this threat, we show that adversarial training using a conventional defense budget $\epsilon$ provably fails to provide test robustness in a simple statistical setting, where the non-robust features of the training data can be reinforced by $\epsilon$-bounded perturbation. Further, we analyze the necessity of enlarging the defense budget to counter stability attacks. Finally, comprehensive experiments demonstrate that stability attacks are harmful on benchmark datasets, and thus the adaptive defense is necessary to maintain robustness.

❓ The Questioner

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning

🧭 Keyword Pioneer — availability attack

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Lue Tao , Lei Feng , Hongxin Wei , Jinfeng Yi , Sheng-jun Huang , Songcan Chen

Topics

Artificial Intelligence > Core AI > AI Safety Machine Learning > Learning Types > Adversarial Learning Artificial Intelligence > Core AI > Adversarial Learning Deep Learning > Learning Types > Adversarial Learning Machine Learning > Learning Types > Robustness

Keywords

data poisoning adversarial training adversarial example availability attack non-robust feature stability attack test robustness

Download PDF

Related papers

Transferring Pre-trained Multimodal Representations with Cross-modal Similarity Matching 2022

A Theoretical View on Sparsely Activated Networks 2022

Prune and distill: similar reformatting of image information along rat visual cortex and deep neural networks 2022

Matryoshka Representation Learning 2022

Off-Policy Evaluation with Deficient Support Using Side Information 2022