Improving the Adversarial Robustness of NLP Models by Information Bottleneck

Cenyuan Zhang; Xiang Zhou; Yixin Wan; Xiaoqing Zheng; Kai-Wei Chang; Cho-jui Hsieh

2022 ACL ACL 2022

Improving the Adversarial Robustness of NLP Models by Information Bottleneck

Abstract

AbstractExisting studies have demonstrated that adversarial examples can be directly attributed to the presence of non-robust features, which are highly predictive, but can be easily manipulated by adversaries to fool NLP models. In this study, we explore the feasibility of capturing task-specific robust features, while eliminating the non-robust ones by using the information bottleneck theory. Through extensive experiments, we show that the models trained with our information bottleneck-based method are able to achieve a significant improvement in robust accuracy, exceeding performances of all the previously reported defense methods while suffering almost no performance drop in clean accuracy on SST-2, AGNEWS and IMDB datasets.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio