WAX: A New Dataset for Word Association eXplanations

Chunhua Liu; Trevor Cohn; Simon De Deyne; Lea Frermann

AACL 2022

Abstract

Word associations are among the most common paradigms to study the human mental lexicon. While their structure and types of associations have been well studied, surprisingly little attention has been given to the question of why participants produce the observed associations. Answering this question would not only advance understanding of human cognition, but could also aid machines in learning and representing basic commonsense knowledge. This paper introduces a large, crowd-sourced dataset of English word associations with explanations, labeled with high-level relation types. We present an analysis of the provided explanations, and design several tasks to probe to what extent current pre-trained language models capture the underlying relations. Our experiments show that models struggle to capture the diversity of human associations, suggesting WAX is a rich benchmark for commonsense modeling and generation.

Topics

Artificial Intelligence > Core AI > Foundation Models
Natural Language Processing > Understanding > Semantic Analysis
Natural Language Processing > Generation > Language Modeling
Natural Language Processing > Resources & Methods > Lexical Semantics
Interdisciplinary > Linguistics > Computational Linguistics

Keywords

commonsense knowledge, semantic analysis, language model, pre-trained language model, word association, relation type, human mental lexicon, commonsense modeling