Adversarial Attacks on the Interpretation of Neuron Activation Maximization

Geraldin Nanfack; Alexander Fulleringer; Jonathan Marty; Michael Eickenberg; Eugene Belilovsky

AAAI 2024

Abstract

Feature visualization is one of the most popular techniques used to interpret the internal behavior of individual units of trained deep neural networks. Based on activation maximization, it consists of finding synthetic or natural inputs that maximize neuron activations. This paper introduces an optimization framework that aims to deceive feature visualization through adversarial model manipulation. The framework fine-tunes a pre-trained model with a purpose-built loss that aims to maintain model performance while significantly changing the feature visualization. We provide evidence of the success of this manipulation on several pre-trained models for ImageNet classification.
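To make the idea of activation maximization concrete, the following is a minimal sketch of gradient ascent on the input of a single toy neuron. It is not the paper's method or code: the tanh neuron, the weights `w` and `b`, the unit-ball constraint, and the function names are all assumptions chosen for illustration, and real feature visualization operates on image inputs to deep networks.

```python
import math
import random


def activation(w, b, x):
    """Toy neuron (assumed for illustration): tanh of an affine map of x."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)


def activation_maximization(w, b, dim, steps=200, lr=0.1, seed=0):
    """Gradient ascent on the input x, keeping x inside the unit ball."""
    rng = random.Random(seed)
    # Start from a small random input.
    x = [rng.uniform(-0.01, 0.01) for _ in range(dim)]
    for _ in range(steps):
        a = activation(w, b, x)
        # Gradient of tanh(w.x + b) with respect to x is (1 - a^2) * w.
        grad = [(1.0 - a * a) * wi for wi in w]
        x = [xi + lr * gi for xi, gi in zip(x, grad)]
        # Rescale onto the unit ball if the step left it.
        norm = math.sqrt(sum(xi * xi for xi in x))
        if norm > 1.0:
            x = [xi / norm for xi in x]
    return x


# Hypothetical neuron weights; the maximizing input should align with w.
w = [3.0, -4.0]
b = 0.0
x_star = activation_maximization(w, b, dim=2)
print("synthetic input:", x_star)
print("activation:", activation(w, b, x_star))
```

In this toy setting the synthetic input ends up pointing in the direction of the neuron's weight vector, which is the sense in which the input "visualizes" the unit. The manipulation described in the abstract would change the model so that such maximizing inputs look different while task performance is preserved.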

Topics

Artificial Intelligence > Core AI > AI Safety; Artificial Intelligence > Core AI > Interpretability; Machine Learning > Optimization & Theory > Optimization; Deep Learning > Techniques > Adversarial Learning

Keywords

neural network interpretation; model fine-tuning; feature visualization; model interpretation; adversarial manipulation; activation maximization
