Transformer visualization via dictionary learning: contextualized embedding as a linear superposition of transformer factors

Zeyu Yun; Yubei Chen; Bruno Olshausen; Yann LeCun

2021 NAACL NAACL 2021

Transformer visualization via dictionary learning: contextualized embedding as a linear superposition of transformer factors

Abstract

AbstractTransformer networks have revolutionized NLP representation learning since they were introduced. Though a great effort has been made to explain the representation in transformers, it is widely recognized that our understanding is not sufficient. One important reason is that there lack enough visualization tools for detailed analysis. In this paper, we propose to use dictionary learning to open up these ‘black boxes’ as linear superpositions of transformer factors. Through visualization, we demonstrate the hierarchical semantic structures captured by the transformer factors, e.g., word-level polysemy disambiguation, sentence-level pattern formation, and long-range dependency. While some of these patterns confirm the conventional prior linguistic knowledge, the rest are relatively unexpected, which may provide new insights. We hope this visualization tool can bring further knowledge and a better understanding of how transformer networks work. The code is available at: https://github.com/zeyuyun1/TransformerVis.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning

🧭 Keyword Pioneer — transformer visualization

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Speech & Audio

Authors

Zeyu Yun , Yubei Chen , Bruno Olshausen , Yann LeCun

Topics

Artificial Intelligence > Core AI > Interpretability Deep Learning > Architectures > Transformers

Keywords

dictionary learning semantic structure contextualized embedding transformer visualization linear superposition

Download PDF

Related papers

Knowledge Router: Learning Disentangled Representations for Knowledge Graphs 2021

Cross-Task Instance Representation Interactions and Label Dependencies for Joint Information Extraction with Graph Convolutional Networks 2021

Abstract Meaning Representation Guided Graph Encoding and Decoding for Joint Information Extraction 2021

Beyond Fair Pay: Ethical Implications of NLP Crowdsourcing 2021

Probing Word Translations in the Transformer and Trading Decoder for Encoder Layers 2021