Pitfalls in the Evaluation of Sentence Embeddings

Steffen Eger; Andreas Rücklé; Iryna Gurevych

ACL 2019

Abstract

Deep learning models continuously break new records across different NLP tasks. At the same time, their success exposes weaknesses of model evaluation. Here, we compile several key pitfalls in the evaluation of sentence embeddings, a currently very popular NLP paradigm. These pitfalls include the comparison of embeddings of different sizes, normalization of embeddings, and the low (and diverging) correlations between transfer and probing tasks. Our motivation is to challenge the current evaluation of sentence embeddings and to provide an easy-to-access reference for future research. Based on our insights, we also recommend better practices for future evaluations of sentence embeddings.

Topics

Machine Learning > Core Methods > Embedding Learning

Machine Learning > Optimization & Theory > Theory

Artificial Intelligence > Core AI > Language

Deep Learning > Optimization & Theory > Evaluation

Natural Language Processing > Understanding > Semantics

Keywords

model evaluation, evaluation methodology, sentence embedding, probing task, transfer task
