Distilling Robust and Non-Robust Features in Adversarial Examples by Information Bottleneck

Junho Kim; Byung-Kwan Lee; Yong Man Ro

2021 NIPS NeurIPS 2021

Distilling Robust and Non-Robust Features in Adversarial Examples by Information Bottleneck

Abstract

Adversarial examples, generated by carefully crafted perturbation, have attracted considerable attention in research fields. Recent works have argued that the existence of the robust and non-robust features is a primary cause of the adversarial examples, and investigated their internal interactions in the feature space. In this paper, we propose a way of explicitly distilling feature representation into the robust and non-robust features, using Information Bottleneck. Specifically, we inject noise variation to each feature unit and evaluate the information flow in the feature representation to dichotomize feature units either robust or non-robust, based on the noise variation magnitude. Through comprehensive experiments, we demonstrate that the distilled features are highly correlated with adversarial prediction, and they have human-perceptible semantic information by themselves. Furthermore, we present an attack mechanism intensifying the gradient of non-robust features that is directly related to the model prediction, and validate its effectiveness of breaking model robustness.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning

🐣 Hot Topic Early Bird — feature distillation

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Junho Kim , Byung-Kwan Lee , Yong Man Ro

Topics

Machine Learning > Core Methods > Representation Learning Machine Learning > Learning Types > Adversarial Learning Deep Learning > Learning Types > Adversarial Learning Deep Learning > Techniques > Representation Learning

Keywords

representation learning information bottleneck model robustness adversarial example feature distillation robust feature non-robust feature

Download PDF

Related papers

Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data 2021

On Model Calibration for Long-Tailed Object Detection and Instance Segmentation 2021

Test-Time Personalization with a Transformer for Human Pose Estimation 2021

NTopo: Mesh-free Topology Optimization using Implicit Neural Representations 2021

Scalable Intervention Target Estimation in Linear Models 2021