Analyzing the Surprising Variability in Word Embedding Stability Across Languages

Laura Burdick; Jonathan K. Kummerfeld; Rada Mihalcea

2021 EMNLP EMNLP 2021

Analyzing the Surprising Variability in Word Embedding Stability Across Languages

Abstract

AbstractWord embeddings are powerful representations that form the foundation of many natural language processing architectures, both in English and in other languages. To gain further insight into word embeddings, we explore their stability (e.g., overlap between the nearest neighbors of a word in different embedding spaces) in diverse languages. We discuss linguistic properties that are related to stability, drawing out insights about correlations with affixing, language gender systems, and other features. This has implications for embedding use, particularly in research that uses them to study language trends.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Interdisciplinary and Machine Learning and Natural Language Processing

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Laura Burdick , Jonathan K. Kummerfeld , Rada Mihalcea

Topics

Machine Learning > Core Methods > Embedding Learning Natural Language Processing > Resources & Methods > Multilingual NLP Natural Language Processing > Resources & Methods > Text Representation Interdisciplinary > Linguistics Deep Learning > Learning Types > Representation Learning Artificial Intelligence > Core AI > Natural Language Processing

Keywords

embedding space nearest neighbor word embedding representation stability linguistic feature cross-lingual analysis embedding stability

Download PDF

Related papers

Continual Learning in Multilingual NMT via Language-Specific Embeddings 2021

MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents 2021

Efficient Multi-Task Auxiliary Learning: Selecting Auxiliary Data by Feature Similarity 2021

Neural Machine Translation with Heterogeneous Topic Knowledge Embeddings 2021

Semantics-Preserved Data Augmentation for Aspect-Based Sentiment Analysis 2021