Learning to Explain: Generating Stable Explanations Fast

Xuelin Situ; Ingrid Zukerman; Cecile Paris; Sameen Maruf; Gholamreza Haffari

ACL 2021

Abstract

The importance of explaining the outcome of a machine learning model, especially a black-box model, is widely acknowledged. Recent approaches explain an outcome by identifying the contributions of input features to this outcome. In environments involving large black-box models or complex inputs, this leads to computationally demanding algorithms. Further, these algorithms often suffer from low stability, with explanations varying significantly across similar examples. In this paper, we propose a Learning to Explain (L2E) approach that learns the behaviour of an underlying explanation algorithm simultaneously from all training examples. Once the explanation algorithm is distilled into an explainer network, it can be used to explain new instances. Our experiments on three classification tasks, which compare our approach to six explanation algorithms, show that L2E is between 5 and 7.5 × 10^4 times faster than these algorithms, while generating more stable explanations, and having comparable faithfulness to the black-box model.
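The distillation idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy `black_box` classifier, the occlusion-based teacher `occlusion_explain`, the `LinearExplainer` student, and the squared-error training objective are all hypothetical stand-ins. The point is only the overall shape: a slow explanation algorithm labels the training examples once, and a learned explainer then produces attributions for new inputs without calling that algorithm again.

```python
import math
import random


def black_box(x):
    """Toy stand-in for a black-box classifier: returns P(class = 1)."""
    weights = [1.5, -2.0, 0.5]  # hypothetical fixed weights
    z = sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def occlusion_explain(x):
    """Slow teacher explainer: one extra black-box call per feature."""
    base = black_box(x)
    scores = []
    for i in range(len(x)):
        occluded = list(x)
        occluded[i] = 0.0  # remove feature i and measure the change
        scores.append(base - black_box(occluded))
    return scores


class LinearExplainer:
    """Student that maps an input directly to attribution scores."""

    def __init__(self, dim):
        self.W = [[0.0] * dim for _ in range(dim)]
        self.b = [0.0] * dim

    def explain(self, x):
        return [sum(w * v for w, v in zip(row, x)) + b
                for row, b in zip(self.W, self.b)]

    def fit(self, inputs, targets, epochs=200, lr=0.05):
        # Plain stochastic gradient descent on squared error.
        for _ in range(epochs):
            for x, t in zip(inputs, targets):
                pred = self.explain(x)
                for i, (p, ti) in enumerate(zip(pred, t)):
                    err = p - ti
                    for j, v in enumerate(x):
                        self.W[i][j] -= lr * err * v
                    self.b[i] -= lr * err


def mse(explainer, inputs, targets):
    """Mean squared error between student and teacher attributions."""
    total, count = 0.0, 0
    for x, t in zip(inputs, targets):
        for p, ti in zip(explainer.explain(x), t):
            total += (p - ti) ** 2
            count += 1
    return total / count


random.seed(0)
train_x = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
train_t = [occlusion_explain(x) for x in train_x]  # teacher labels, computed once

student = LinearExplainer(dim=3)
loss_before = mse(student, train_x, train_t)
student.fit(train_x, train_t)
loss_after = mse(student, train_x, train_t)

# Explaining a new instance needs no black-box calls at all.
new_x = [0.3, -0.7, 0.9]
fast_scores = student.explain(new_x)
print(loss_before, loss_after, fast_scores)
```

In this sketch the teacher needs one black-box call per feature plus one for the baseline, whereas the student needs none at explanation time. That is the kind of cost difference the abstract refers to, although the speed-ups reported there come from the authors' own experiments and not from this toy setup.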

Authors

Xuelin Situ, Ingrid Zukerman, Cecile Paris, Sameen Maruf, Gholamreza Haffari

Topics

Artificial Intelligence > Core AI > Interpretability

Machine Learning > Application Areas > Domain Adaptation

Deep Learning > Learning Types > Knowledge Distillation

Machine Learning > Core Methods > Interpretability

Keywords

knowledge distillation, explainable AI, feature attribution, model interpretability, model explanation, explanation stability

Related papers

Out-of-Scope Intent Detection with Self-Supervision and Discriminative Training 2021

A Non-Autoregressive Edit-Based Approach to Controllable Text Simplification 2021

How Did This Get Funded?! Automatically Identifying Quirky Scientific Achievements 2021

Exploring Discourse Structures for Argument Impact Classification 2021

Language Embeddings for Typology and Cross-lingual Transfer Learning 2021