Learning to Explain: An Information-Theoretic Perspective on Model Interpretation

Jianbo Chen; Le Song; Martin Wainwright; Michael Jordan

ICML 2018

Abstract

We introduce instancewise feature selection as a methodology for model interpretation. Our method is based on learning a function to extract a subset of features that are most informative for each given example. This feature selector is trained to maximize the mutual information between selected features and the response variable, where the conditional distribution of the response variable given the input is the model to be explained. We develop an efficient variational approximation to the mutual information, and show the effectiveness of our method on a variety of synthetic and real data sets using both quantitative metrics and human evaluation.
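The instancewise selection idea described in the abstract can be illustrated on a toy problem. The sketch below is not the paper's learned selector or its variational approximation: it swaps in brute-force enumeration over all size-k subsets, which is only feasible for tiny inputs. The toy model `model_prob`, the uniform input distribution, and the scoring rule (expected log-likelihood of the model's own response when only the selected features are visible, used here as a stand-in for the mutual-information objective) are all assumptions made for illustration.

```python
from itertools import combinations, product
import numpy as np

def model_prob(x):
    """Hypothetical model to be explained: P(y=1 | x) for binary x of length 3.

    Feature 0 acts as a switch: when x0 == 0 the response depends on x1,
    otherwise it depends on x2.
    """
    x0, x1, x2 = x
    relevant = x1 if x0 == 0 else x2
    return 0.9 if relevant == 1 else 0.1

# Assumption: inputs are uniformly distributed over {0, 1}^3.
ALL_X = list(product([0, 1], repeat=3))

def prob_given_subset(x, subset):
    """P(y=1 | x_S): average the model over all inputs that agree with x on S."""
    matches = [z for z in ALL_X if all(z[i] == x[i] for i in subset)]
    return float(np.mean([model_prob(z) for z in matches]))

def score(x, subset):
    """Expected log-likelihood of the model's response given only x_S."""
    p_full = model_prob(x)
    p_sub = prob_given_subset(x, subset)
    return p_full * np.log(p_sub) + (1 - p_full) * np.log(1 - p_sub)

def explain(x, k=2):
    """Return the size-k feature subset with the highest score for this x."""
    return max(combinations(range(len(x)), k), key=lambda s: score(x, s))

print(explain((0, 1, 0)))  # (0, 1)
print(explain((1, 0, 1)))  # (0, 2)
```

The two inputs receive different explanations because the toy model consults different features depending on the switch `x0`, which is the behavior a single global feature ranking cannot express.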

Topics

Artificial Intelligence > Core AI > Interpretability
Machine Learning > Core Methods > Metric Learning
Machine Learning > Learning Types > Self-Supervised Learning
Machine Learning > Learning Types > Representation Learning

Keywords

feature selection, mutual information, variational approximation, model interpretation, information-theoretic perspective, instancewise explanation
