Understanding Black-box Predictions via Influence Functions

Pang Wei Koh; Percy Liang

ICML 2017

Abstract

How can we explain the predictions of a black-box model? In this paper, we use influence functions — a classic technique from robust statistics — to trace a model’s prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a given prediction. To scale up influence functions to modern machine learning settings, we develop a simple, efficient implementation that requires only oracle access to gradients and Hessian-vector products. We show that even on non-convex and non-differentiable models where the theory breaks down, approximations to influence functions can still provide valuable information. On linear models and convolutional neural networks, we demonstrate that influence functions are useful for multiple purposes: understanding model behavior, debugging models, detecting dataset errors, and even creating visually-indistinguishable training-set attacks.
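As a rough illustration of what "oracle access to gradients and Hessian-vector products" can look like in practice, the sketch below estimates the influence of each training point on one test loss for a small logistic regression model. It uses the standard influence-function form, in which the effect of upweighting a training point z on the loss at z_test is -∇L(z_test)ᵀ H⁻¹ ∇L(z), with H the Hessian of the training objective. The choice of logistic regression, the damping term, the use of conjugate gradient, and all function names (`grad_loss`, `hvp`, `conjugate_gradient`, `influence_on_test_loss`) are assumptions made for this example; it is not the authors' implementation.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def grad_loss(theta, x, y):
    """Gradient of the logistic loss log(1 + exp(-y * x.theta)) at one point, y in {-1, +1}."""
    return -y * sigmoid(-y * (x @ theta)) * x

def hvp(theta, X, v, damping):
    """Hessian-vector product of the mean training loss plus damping * I.

    The Hessian itself is never formed; only products X @ v and X.T @ w are used.
    """
    p = sigmoid(X @ theta)
    return X.T @ (p * (1.0 - p) * (X @ v)) / len(X) + damping * v

def conjugate_gradient(matvec, b, iters=100, tol=1e-10):
    """Solve A x = b for symmetric positive-definite A, given only v -> A v."""
    x = np.zeros_like(b)
    r = b.copy()          # residual b - A x (x starts at zero)
    p = r.copy()          # search direction
    rs = r @ r
    for _ in range(iters):
        if rs < tol:      # squared residual norm is small enough
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def influence_on_test_loss(theta, X, y, x_test, y_test, damping=0.01):
    """Return one influence value per training point: -grad(z_i) . H^{-1} grad(z_test)."""
    # Solve H s = grad(z_test) once, then reuse s for every training point.
    s_test = conjugate_gradient(
        lambda v: hvp(theta, X, v, damping),
        grad_loss(theta, x_test, y_test),
    )
    train_grads = np.array([grad_loss(theta, xi, yi) for xi, yi in zip(X, y)])
    return -train_grads @ s_test
```

Under this sign convention, a negative value suggests that upweighting the training point would lower the test loss, and a positive value suggests it would raise it. The approximation is only meaningful when `theta` is (close to) a minimizer of the damped training objective; for an arbitrary `theta` the function still runs but the values lose their interpretation.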

Authors

Pang Wei Koh, Percy Liang

Topics

Artificial Intelligence > Core AI > Interpretability

Machine Learning > Optimization & Theory > Statistical Learning

Machine Learning > Optimization & Theory > Theory

Machine Learning > Application Areas > Fairness

Machine Learning > Optimization & Theory > Information Theory

Machine Learning > Core Methods > Feature Learning

Keywords

robust statistics, model interpretability, model debugging, training data attribution, influence function, training datum, black-box model, model interpretation, gradient analysis, Hessian-vector product
