What Do Position Embeddings Learn? An Empirical Study of Pre-Trained Language Model Positional Encoding

Yu-An Wang; Yun-Nung Chen

EMNLP 2020

Abstract

In recent years, pre-trained Transformers have dominated the majority of NLP benchmark tasks. Many variants of pre-trained Transformers continue to emerge, and most focus on designing different pre-training objectives or variants of self-attention. Embedding position information in the self-attention mechanism is also an indispensable factor in Transformers, yet it is often discussed only casually. Hence, we carry out an empirical study of the position embeddings of mainstream pre-trained Transformers, focusing mainly on two questions: 1) Do position embeddings really learn the meaning of positions? 2) How do these different learned position embeddings affect Transformers for NLP tasks? This paper provides new insight into pre-trained position embeddings through feature-level analysis and empirical experiments on most iconic NLP tasks. We believe our experimental results can guide future work in choosing a suitable positional encoding function for a specific task, given the properties of the application.

Topics

Machine Learning > Optimization & Theory > Learning Theory
Deep Learning > Architectures > Transformers
Deep Learning > Techniques > Pretraining
Natural Language Processing > Resources & Methods > Language Modeling
Deep Learning > Models > Transformers
Deep Learning > Optimization & Theory > Evaluation

Keywords

transformer architecture, self-attention mechanism, empirical study, language model, positional encoding, position embedding, pre-trained transformer, feature-level analysis
