WAX: A New Dataset for Word Association eXplanations

Chunhua Liu; Trevor Cohn; Simon De Deyne; Lea Frermann

2022 IJCNLP IJCNLP 2022

WAX: A New Dataset for Word Association eXplanations

Abstract

AbstractWord associations are among the most common paradigms to study the human mental lexicon. While their structure and types of associations have been well studied, surprisingly little attention has been given to the question of why participants produce the observed associations. Answering this question would not only advance understanding of human cognition, but could also aid machines in learning and representing basic commonsense knowledge. This paper introduces a large, crowd-sourced data set of English word associations with explanations, labeled with high-level relation types. We present an analysis of the provided explanations, and design several tasks to probe to what extent current pre-trained language models capture the underlying relations. Our experiments show that models struggle to capture the diversity of human associations, suggesting WAX is a rich benchmark for commonsense modeling and generation.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Interdisciplinary and Natural Language Processing

🐣 Hot Topic Early Bird — cognitive modeling

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Chunhua Liu , Trevor Cohn , Simon De Deyne , Lea Frermann

Topics

Interdisciplinary > Cognitive Science > Cognitive Modeling Natural Language Processing > Resources & Methods > Language Modeling Artificial Intelligence > Core AI > Knowledge Representation

Keywords

cognitive modeling commonsense knowledge language model pre-trained language model relation classification word association mental lexicon crowdsourced dataset

Download PDF

Related papers

Chasing the Tail with Domain Generalization: A Case Study on Frequency-Enriched Datasets 2022

Double Trouble: How to not Explain a Text Classifier’s Decisions Using Counterfactuals Synthesized by Masked Language Models? 2022

Leveraging Key Information Modeling to Improve Less-Data Constrained News Headline Generation via Duality Fine-Tuning 2022

Graph-augmented Learning to Rank for Querying Large-scale Knowledge Graph 2022

Missing Modality meets Meta Sampling (M3S): An Efficient Universal Approach for Multimodal Sentiment Analysis with Missing Modality 2022