Axiomatic Attribution for Deep Networks

Mukund Sundararajan; Ankur Taly; Qiqi Yan

ICML 2017

Abstract

We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms, Sensitivity and Implementation Invariance, that attribution methods ought to satisfy. We show that they are not satisfied by most known attribution methods, which we consider to be a fundamental weakness of those methods. We use the axioms to guide the design of a new attribution method called Integrated Gradients. Our method requires no modification to the original network and is extremely simple to implement; it just needs a few calls to the standard gradient operator. We apply this method to a couple of image models, a couple of text models and a chemistry model, demonstrating its ability to debug networks, to extract rules from a network, and to enable users to engage with models better.
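The claim that the method "just needs a few calls to the standard gradient operator" can be illustrated with a minimal sketch. It assumes the usual Integrated Gradients formulation from the full paper: gradients are averaged along a straight-line path from a baseline input to the actual input, then scaled by the difference between input and baseline. The toy sigmoid model, its hand-written gradient, the zero baseline, and all function names are hypothetical stand-ins chosen for illustration, not the authors' code.

```python
import math


def model(x, w, b):
    """Toy differentiable model (hypothetical): sigmoid of a linear score."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def model_grad(x, w, b):
    """Analytic gradient of the toy model with respect to the input x."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x, baseline, grad_fn, steps=50):
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    grad_fn is called once per step, at points on the straight line
    from `baseline` to `x`; the averaged gradient is then scaled by
    (x - baseline) feature by feature.
    """
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b0 + alpha * (xi - b0) for xi, b0 in zip(x, baseline)]
        g = grad_fn(point)
        for i in range(n):
            avg_grad[i] += g[i] / steps
    return [(xi - b0) * gi for xi, b0, gi in zip(x, baseline, avg_grad)]


# Assumed example values: three features, the third with zero weight.
w = [2.0, -1.0, 0.0]
b = 0.1
x = [1.0, 0.5, 3.0]
baseline = [0.0, 0.0, 0.0]

attributions = integrated_gradients(x, baseline, lambda p: model_grad(p, w, b))
```

As a sanity check under these assumptions, the attributions should sum approximately to `model(x, w, b) - model(baseline, w, b)`, and the feature with zero weight should receive zero attribution. With a real network, `grad_fn` would be replaced by the framework's own gradient operator.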

Topics

Artificial Intelligence > Core AI > Interpretability

Machine Learning > Application Areas > Fairness

Deep Learning > Techniques > Model Architecture

Deep Learning > Optimization & Theory > Theory

Keywords

feature attribution, deep learning, sensitivity analysis, integrated gradient, attribution method, neural network

Related papers

Bottleneck Conditional Density Estimation 2017

Constrained Policy Optimization 2017

Near-Optimal Design of Experiments via Regret Minimization 2017

Input Convex Neural Networks 2017

An Efficient, Sparsity-Preserving, Online Algorithm for Low-Rank Approximation 2017