Interpreting Neural Network Judgments via Minimal, Stable, and Symbolic Corrections

Xin Zhang; Armando Solar-Lezama; Rishabh Singh

NeurIPS 2018

Abstract

We present a new algorithm to generate minimal, stable, and symbolic corrections to an input that will cause a neural network with ReLU activations to change its output. We argue that such a correction is a useful way to provide feedback to a user when the network's output is different from a desired output. Our algorithm generates such a correction by solving a series of linear constraint satisfaction problems. The technique is evaluated on three neural network models: one predicting whether an applicant will pay a mortgage, one predicting whether a first-order theorem can be proved efficiently by a solver using certain heuristics, and the final one judging whether a drawing is an accurate rendition of a canonical drawing of a cat.
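The abstract says corrections are found by solving linear constraint satisfaction problems. One reason linear constraints arise for ReLU networks is that, once the on/off pattern of the ReLU units is fixed, the network is an affine function of its input. The sketch below illustrates only that observation. It is not the paper's algorithm: the tiny network, its weights, and the helpers `affine_for_pattern` and `correction` are hypothetical. It also computes a single closed-form point correction, not a symbolic (region-valued) one, and does not solve a series of constraint problems.

```python
# Illustrative sketch only: a hand-written 2-input, 2-hidden-unit ReLU network.
# All weights and helper names are assumptions, not taken from the paper.
W1 = [[1.0, -1.0], [0.5, 1.0]]  # hidden-layer weights, one row per hidden unit
b1 = [0.0, -1.0]                # hidden-layer biases
w2 = [1.0, -2.0]                # output-layer weights
b2 = 0.5                        # output-layer bias


def hidden_pre(x):
    """Pre-activation value of each hidden unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]


def forward(x):
    """Network output; a positive score is treated as the desired judgment."""
    return sum(w * max(0.0, h) for w, h in zip(w2, hidden_pre(x))) + b2


def pattern(x):
    """Which ReLU units are active at x."""
    return [h > 0 for h in hidden_pre(x)]


def affine_for_pattern(pat):
    """With the activation pattern fixed, the network equals a . x + c."""
    n_in = len(W1[0])
    a = [sum(w2[j] * W1[j][i] for j in range(len(W1)) if pat[j])
         for i in range(n_in)]
    c = b2 + sum(w2[j] * b1[j] for j in range(len(W1)) if pat[j])
    return a, c


def correction(x, margin=1e-3):
    """Smallest L2 shift that makes the local affine score reach `margin`.

    Assumes the affine coefficient vector is nonzero. The result is valid
    for the real network only if the activation pattern is unchanged at
    the shifted input, which the caller must check.
    """
    a, c = affine_for_pattern(pattern(x))
    score = sum(ai * xi for ai, xi in zip(a, x)) + c
    if score >= margin:
        return [0.0] * len(x)
    step = (margin - score) / sum(ai * ai for ai in a)
    return [step * ai for ai in a]


x = [1.0, 2.0]
delta = correction(x)
x_new = [xi + di for xi, di in zip(x, delta)]
print("score before:", forward(x))
print("score after: ", forward(x_new))
print("pattern unchanged:", pattern(x_new) == pattern(x))
```

In this example the shifted input stays in the same activation region, so the affine reasoning carries over to the real network. When the pattern changes, the affine model no longer describes the network at the shifted input, which is one reason a general method needs explicit linear constraints on the activations.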

Authors

Xin Zhang, Armando Solar-Lezama, Rishabh Singh

Topics

Artificial Intelligence > Core AI > Interpretability; Machine Learning > Learning Types > Interpretability; Deep Learning > Optimization & Theory > Interpretability

Keywords

neural network interpretability, constraint satisfaction, ReLU activation, neural network, symbolic correction, input correction
