A Rank-Based Similarity Metric for Word Embeddings

Enrico Santus; Hongmin Wang; Emmanuele Chersoni; Yue Zhang

ACL 2018

Abstract

Word embeddings (WE) have recently established themselves as a standard way of representing word meaning in NLP. Semantic similarity between word pairs has become the most common evaluation benchmark for these representations, with vector cosine typically used as the only similarity metric. In this paper, we report experiments with a rank-based metric for WE, which performs comparably to vector cosine in similarity estimation and outperforms it in the recently introduced and challenging task of outlier detection, suggesting that rank-based measures can improve clustering quality.
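The abstract does not give the metric's formula. As a minimal sketch, assuming a formulation modeled on APSyn-style rank-based measures (rank each vector's dimensions by value, keep the top n, and score every dimension shared by both top-n lists by the inverse of its average rank), the contrast with vector cosine and an outlier-detection setup could look like the code below. The function names, the parameters `n` and `power`, the "lowest average similarity" outlier rule, and the toy vectors are all illustrative assumptions, not the paper's exact method or data.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Standard vector cosine, the baseline metric named in the abstract."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_ranks(v: Vector, n: int) -> Dict[int, int]:
    """Map the n highest-valued dimensions of v to their rank (1 = highest value)."""
    order = sorted(range(len(v)), key=lambda i: v[i], reverse=True)
    return {dim: rank for rank, dim in enumerate(order[:n], start=1)}


def rank_similarity(u: Vector, v: Vector, n: int, power: float = 1.0) -> float:
    """Hypothetical APSyn-style score: for every dimension in BOTH top-n lists,
    add 1 / (average of its two ranks) ** power. Higher means more similar."""
    ranks_u, ranks_v = top_ranks(u, n), top_ranks(v, n)
    shared = ranks_u.keys() & ranks_v.keys()
    return sum(1.0 / (((ranks_u[d] + ranks_v[d]) / 2.0) ** power) for d in shared)


def find_outlier(words: List[str], vectors: Dict[str, Vector],
                 sim: Callable[[Vector, Vector], float]) -> str:
    """Return the word whose average similarity to the other words is lowest
    (an assumed compactness-style criterion for outlier detection)."""
    def compactness(w: str) -> float:
        others = [x for x in words if x != w]
        return sum(sim(vectors[w], vectors[x]) for x in others) / len(others)
    return min(words, key=compactness)


# Hypothetical toy vectors (not from the paper): three animals and one outlier.
TOY_VECTORS: Dict[str, List[float]] = {
    "cat":  [0.9, 0.8, 0.1, 0.0, 0.2],
    "dog":  [0.8, 0.9, 0.2, 0.1, 0.0],
    "lion": [0.7, 0.6, 0.0, 0.3, 0.1],
    "car":  [0.0, 0.1, 0.9, 0.8, 0.7],
}

if __name__ == "__main__":
    words = list(TOY_VECTORS)
    print(find_outlier(words, TOY_VECTORS, cosine))  # car
    print(find_outlier(words, TOY_VECTORS,
                       lambda u, v: rank_similarity(u, v, n=3)))  # car
```

In this sketch the rank-based score uses only the ordering of each vector's strongest dimensions, discarding their magnitudes, which is the key difference from cosine. On these toy vectors both metrics flag "car" as the outlier; the sketch makes no claim about which metric wins on real embeddings.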

Keywords

outlier detection, clustering quality, semantic similarity, rank-based metric, word embedding, vector cosine
