A Method for Building a Commonsense Inference Dataset based on Basic Events

Kazumasa Omura; Daisuke Kawahara; Sadao Kurohashi

EMNLP 2020

Abstract

We present a scalable, low-bias, and low-cost method for building a commonsense inference dataset that combines automatic extraction from a corpus with crowdsourcing. Each problem is a multiple-choice question that asks about the contingency between basic events. We applied the proposed method to a Japanese corpus and acquired 104k problems. While humans can solve the resulting problems with high accuracy (88.9%), the accuracy of a high-performance transfer learning model is reasonably low (76.0%). We also confirmed through dataset analysis that the resulting dataset has low bias. We released the dataset to facilitate language understanding research.
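To make the task format concrete, here is a minimal sketch of how one multiple-choice contingency problem might be represented and how accuracy figures such as those above are computed. The class name, field names, number of choices, and the English example are all assumptions made for illustration; they are not taken from the released dataset, which is in Japanese.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ContingencyProblem:
    """Hypothetical container for one multiple-choice problem."""
    context: str              # the first basic event
    choices: Sequence[str]    # candidate events that might follow it
    answer: int               # index of the correct choice


def accuracy(problems: Sequence[ContingencyProblem],
             predict: Callable[[ContingencyProblem], int]) -> float:
    """Fraction of problems for which predict() returns the gold index."""
    if not problems:
        raise ValueError("accuracy is undefined for an empty problem list")
    correct = sum(1 for p in problems if predict(p) == p.answer)
    return correct / len(problems)


# Invented illustrative problem (not drawn from the dataset).
example = ContingencyProblem(
    context="I got hungry,",
    choices=(
        "so I ate lunch.",
        "so I opened an umbrella.",
        "so I washed the car.",
        "so I renewed my passport.",
    ),
    answer=0,
)


def always_first(problem: ContingencyProblem) -> int:
    """Trivial baseline that always picks the first choice."""
    return 0
```

Any solver, human or model, can be plugged in as `predict`, so the same function yields directly comparable scores.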

Authors

Kazumasa Omura, Daisuke Kawahara, Sadao Kurohashi

Topics

Natural Language Processing > Understanding > Semantic Analysis
Natural Language Processing > Resources & Methods > Natural Language Inference
Machine Learning > Learning Paradigms > Transfer Learning
Natural Language Processing > Applications > Natural Language Understanding
Machine Learning > Application Areas > Text Classification

Keywords

transfer learning, question answering, natural language understanding, event relation, multiple choice question, dataset construction, commonsense inference
