SELFEXPLAIN: A Self-Explaining Architecture for Neural Text Classifiers

Dheeraj Rajagopal; Vidhisha Balachandran; Eduard H Hovy; Yulia Tsvetkov

EMNLP 2021

Abstract

We introduce SelfExplain, a novel self-explaining model that explains a text classifier’s predictions using phrase-based concepts. SelfExplain augments existing neural classifiers by adding (1) a globally interpretable layer that identifies the most influential concepts in the training set for a given sample and (2) a locally interpretable layer that quantifies the contribution of each local input concept by computing a relevance score relative to the predicted label. Experiments across five text-classification datasets show that SelfExplain facilitates interpretability without sacrificing performance. Most importantly, explanations from SelfExplain show sufficiency for model predictions and are perceived as adequate, trustworthy, and understandable by human judges compared to existing widely used baselines.
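The two kinds of explanation described in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation. It assumes that the global view ranks training-set concepts by cosine similarity to the sample, and that the local view scores a phrase by how much its predicted-label probability differs from that of the full sentence. The function names, vectors, logits, and phrases below are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_training_concepts(sample_vec, concept_bank, k=2):
    """Global view (assumed): the k training-set concepts most similar to the sample."""
    ranked = sorted(
        concept_bank,
        key=lambda name: cosine(sample_vec, concept_bank[name]),
        reverse=True,
    )
    return ranked[:k]


def local_relevance(sentence_logits, phrase_logits):
    """Local view (assumed): each phrase's shift in predicted-label probability."""
    sentence_probs = softmax(sentence_logits)
    predicted = sentence_probs.index(max(sentence_probs))
    scores = {
        phrase: softmax(logits)[predicted] - sentence_probs[predicted]
        for phrase, logits in phrase_logits.items()
    }
    return predicted, scores


# Toy, made-up numbers for illustration only.
bank = {
    "wonderful performance": [1.0, 0.0],
    "dull plot": [0.0, 1.0],
    "mixed feelings": [1.0, 1.0],
}
nearest = top_training_concepts([1.0, 0.1], bank, k=2)
predicted, scores = local_relevance(
    [0.0, 2.0],
    {"great acting": [0.0, 3.0], "the film": [1.0, 1.0]},
)
```

Under these assumptions, a positive score means the phrase on its own leans toward the predicted label more strongly than the full sentence does, and a negative score means it leans less strongly.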

Topics

Artificial Intelligence > Core AI > Interpretability; Machine Learning > Core Methods > Classification; Deep Learning > Architectures > Neural Networks; Natural Language Processing > Applications > Text Classification; Machine Learning > Core Methods > Interpretability

Keywords

text classification; relevance scoring; neural text classifier; relevance score; neural network; self-explaining model; concept attribution; phrase-based concept
