Regularizing Black-box Models for Improved Interpretability

Gregory Plumb; Maruan Al-Shedivat; Ángel Alexander Cabrera; Adam Perer; Eric P. Xing; Ameet Talwalkar

NeurIPS 2020

Abstract

Most of the work on interpretable machine learning has focused on designing either inherently interpretable models, which typically trade off accuracy for interpretability, or post-hoc explanation systems, whose explanation quality can be unpredictable. Our method, ExpO, is a hybridization of these approaches that regularizes a model for explanation quality at training time. Importantly, these regularizers are differentiable, model-agnostic, and require no domain knowledge to define. We demonstrate that post-hoc explanations for ExpO-regularized models have better explanation quality, as measured by the common fidelity and stability metrics. With a user study on a realistic task, we verify that improving these metrics leads to significantly more useful explanations.
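
The abstract names fidelity and stability without defining them. As a rough illustration only, the sketch below assumes one common reading: fidelity is the mean squared error between the black-box model and a local linear surrogate over a Gaussian neighborhood of a point, and stability is the mean squared change in the surrogate's weights when the point is perturbed. The function names, the Gaussian neighborhood, and the defaults for sigma and the sample counts are all assumptions made for this example; it is not the authors' implementation and does not show the ExpO regularizers themselves.

```python
import numpy as np


def local_linear_explanation(f, x, sigma=0.1, n_samples=200, rng=None):
    """Fit a linear surrogate g(x') = w @ x' + b to f on Gaussian perturbations of x.

    f must map an (n, d) array to an (n,) array of predictions.
    Names and defaults here are illustrative assumptions.
    """
    rng = np.random.default_rng(rng)
    neighbors = x + sigma * rng.standard_normal((n_samples, x.shape[0]))
    targets = f(neighbors)
    # Append a column of ones so the least-squares fit includes an intercept.
    design = np.hstack([neighbors, np.ones((n_samples, 1))])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coef[:-1], coef[-1], neighbors, targets


def neighborhood_fidelity(f, x, sigma=0.1, n_samples=200, rng=None):
    """Mean squared error between f and its local surrogate (lower is better).

    The error is measured on the same samples used to fit the surrogate.
    """
    w, b, neighbors, targets = local_linear_explanation(f, x, sigma, n_samples, rng)
    return float(np.mean((neighbors @ w + b - targets) ** 2))


def stability(f, x, sigma=0.1, n_neighbors=10, seed=0):
    """Mean squared change in surrogate weights across nearby points (lower is better)."""
    rng = np.random.default_rng(seed)
    w_x, *_ = local_linear_explanation(f, x, sigma=sigma, rng=rng)
    diffs = []
    for _ in range(n_neighbors):
        x_prime = x + sigma * rng.standard_normal(x.shape[0])
        w_p, *_ = local_linear_explanation(f, x_prime, sigma=sigma, rng=rng)
        diffs.append(np.sum((w_x - w_p) ** 2))
    return float(np.mean(diffs))


if __name__ == "__main__":
    x0 = np.array([0.3, 0.7])
    linear_model = lambda X: X @ np.array([1.0, -2.0]) + 0.5
    wavy_model = lambda X: np.sin(20.0 * X[:, 0])
    for name, model in [("linear", linear_model), ("wavy", wavy_model)]:
        print(name, neighborhood_fidelity(model, x0, rng=0), stability(model, x0))
```

Under these assumed definitions, a model that is exactly linear scores essentially zero on both metrics, while a rapidly oscillating model scores worse on both. That is the intuition behind regularizing a model so that its local explanations are easier to trust.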

Topics

Artificial Intelligence > Core AI > Interpretability; Machine Learning > Learning Types > Regularization; Machine Learning > Core Methods > Interpretability

Keywords

interpretable machine learning; post-hoc explanation; model regularization; explanation fidelity; explanation stability
