Aiming beyond the Obvious: Identifying Non-Obvious Cases in Semantic Similarity Datasets

Nicole Peinelt; Maria Liakata; Dong Nguyen

2019 ACL ACL 2019

Aiming beyond the Obvious: Identifying Non-Obvious Cases in Semantic Similarity Datasets

Abstract

AbstractExisting datasets for scoring text pairs in terms of semantic similarity contain instances whose resolution differs according to the degree of difficulty. This paper proposes to distinguish obvious from non-obvious text pairs based on superficial lexical overlap and ground-truth labels. We characterise existing datasets in terms of containing difficult cases and find that recently proposed models struggle to capture the non-obvious cases of semantic similarity. We describe metrics that emphasise cases of similarity which require more complex inference and propose that these are used for evaluating systems for semantic similarity.

🌉 Interdisciplinary Bridge — Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — complex inference

🐣 Hot Topic Early Bird — evaluation metrics

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Security & Privacy, Speech & Audio

Authors

Nicole Peinelt , Maria Liakata , Dong Nguyen

Topics

Natural Language Processing > Understanding > Semantic Analysis Natural Language Processing > Resources & Methods > Natural Language Inference Natural Language Processing > Applications > Natural Language Inference Machine Learning > Learning Types > Evaluation

Keywords

natural language inference semantic similarity dataset evaluation evaluation metrics dataset difficulty text pair classification lexical overlap text pair complex inference

Download PDF

Related papers

What do phone embeddings learn about Phonology? 2019

Unsupervised Morphological Segmentation for Low-Resource Polysynthetic Languages 2019

Understanding Undesirable Word Embedding Associations 2019

Inferential Machine Comprehension: Answering Questions by Recursively Deducing the Evidence Chain from Text 2019

Domain Adaptation of Neural Machine Translation by Lexicon Induction 2019