Variance Matters: Detecting Semantic Differences without Corpus/Word Alignment

Ryo Nagata; Hiroya Takamura; Naoki Otani; Yoshifumi Kawasaki

EMNLP 2023

Abstract

In this paper, we propose methods for discovering semantic differences in words appearing in two corpora. The key idea is to measure the coverage of meanings of a word in a corpus through the norm of its mean word vector, which is equivalent to examining a kind of variance of the word vector distribution. Unlike previous methods, the proposed methods do not require alignments between words and/or corpora for comparison. All they require is computing the variance (or the norm of the mean word vector) for each word type. Nevertheless, they rival the best-performing system in SemEval-2020 Task 1. In addition, they are (i) robust to skew in corpus sizes; (ii) capable of detecting semantic differences in infrequent words; and (iii) effective in pinpointing word instances that have a meaning missing in one of the two corpora under comparison. We show these advantages for historical corpora and also for native/non-native English corpora.
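The link between the norm of the mean word vector and variance can be illustrated with a minimal sketch. It assumes each word-instance vector is scaled to unit length, in which case the mean squared distance of the vectors from their mean equals 1 minus the squared norm of the mean. The function names and toy vectors below are hypothetical, and this is not necessarily the authors' exact formulation.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mean_vector(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def mean_norm(vectors):
    """Norm of the mean of the unit-normalized vectors (lies in [0, 1])."""
    mu = mean_vector([normalize(v) for v in vectors])
    return math.sqrt(sum(x * x for x in mu))

def spread(vectors):
    """Mean squared distance of the unit-normalized vectors from their mean."""
    unit = [normalize(v) for v in vectors]
    mu = mean_vector(unit)
    total = sum(sum((x - m) ** 2 for x, m in zip(u, mu)) for u in unit)
    return total / len(unit)

# Hypothetical toy data: instance vectors of one word in two corpora.
narrow = [[1.0, 0.1], [1.0, 0.0], [0.9, -0.1]]  # similar usages
broad = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.2]]   # diverse usages

for name, vecs in [("narrow", narrow), ("broad", broad)]:
    r = mean_norm(vecs)
    # The last two printed values agree: spread == 1 - r**2 for unit vectors.
    print(name, round(r, 4), round(spread(vecs), 4), round(1 - r * r, 4))
```

Under this assumption, a word whose instances point in similar directions yields a mean norm close to 1 (low spread), while a word used in more varied senses yields a smaller mean norm. Comparing this single number per word across two corpora requires no alignment between the corpora.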

Topics

Machine Learning > Core Methods > Representation Learning
Machine Learning > Learning Types > Unsupervised Learning
Natural Language Processing > Understanding > Semantic Analysis
Mathematics & Optimization > Mathematics > Statistics

Keywords

semantic shift, word embedding, variance analysis, semantic change detection, word vector, corpus analysis, semantic difference, vector variance, corpus comparison
