An Investigation of Language Model Interpretability via Sentence Editing

Samuel Stevens; Yu Su

EMNLP 2021

Abstract

Pre-trained language models (PLMs) like BERT are used for almost all language-related tasks, but interpreting their behavior remains a significant challenge, and many important questions are still largely unanswered. In this work, we re-purpose a sentence editing dataset, from which faithful, high-quality human rationales can be automatically extracted and compared with extracted model rationales, as a new testbed for interpretability. This enables a systematic investigation of an array of questions regarding PLMs’ interpretability, including the role of the pre-training procedure, a comparison of rationale extraction methods, and differences across layers in the PLM. The investigation generates new insights; for example, contrary to the common understanding, we find that attention weights correlate well with human rationales and work better than gradient-based saliency in extracting model rationales. Both the dataset and code will be released to facilitate future interpretability research.
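The comparison the abstract describes, human rationales derived from an edit versus rationales derived from model scores, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not specify how rationales are extracted or scored, so the token-level diff, the top-k selection, the F1 overlap, the function names, and the per-token scores are hypothetical stand-ins, not the authors' method.

```python
from difflib import SequenceMatcher


def human_rationale(before, after):
    """Indices of tokens in `before` that the edit replaced or deleted.

    Assumption: the human rationale is approximated by a token-level diff.
    Pure insertions touch no token in `before`, so they are ignored here.
    """
    matcher = SequenceMatcher(a=before, b=after, autojunk=False)
    changed = set()
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            changed.update(range(i1, i2))
    return changed


def model_rationale(scores, k):
    """Indices of the k highest-scoring tokens.

    `scores` stands in for any per-token importance score, such as
    attention weights or gradient-based saliency (hypothetical values).
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def overlap_f1(predicted, gold):
    """Token-level F1 between a predicted and a gold rationale."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    before = "This method are very effective".split()
    after = "This method is very effective".split()
    scores = [0.05, 0.10, 0.60, 0.15, 0.10]  # made-up importance scores

    gold = human_rationale(before, after)
    predicted = model_rationale(scores, k=len(gold))
    print(gold, predicted, overlap_f1(predicted, gold))
```

Because both rationales are plain sets of token indices, the same overlap function could be reused to compare scores from different extraction methods or different layers against the same human rationale.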

Topics

Artificial Intelligence > Core AI > Interpretability
Machine Learning > Learning Types > Self-Supervised Learning
Deep Learning > Architectures > Transformers
Natural Language Processing > Resources & Methods > Large Language Models
Machine Learning > Learning Types > Representation Learning
Natural Language Processing > Resources & Methods > Language Modeling
Artificial Intelligence > Core AI > Language

Keywords

rationale extraction, pre-trained language model, language model interpretability, attention weight, gradient-based saliency, sentence editing, saliency method
