2020
EMNLP
A Method for Building a Commonsense Inference Dataset based on Basic Events
Abstract
We present a scalable, low-bias, and low-cost method for building a commonsense inference dataset that combines automatic extraction from a corpus with crowdsourcing. Each problem is a multiple-choice question that asks about the contingency between basic events. We applied the proposed method to a Japanese corpus and acquired 104k problems. While humans can solve the resulting problems with high accuracy (88.9%), the accuracy of a high-performance transfer learning model is reasonably low (76.0%). We also confirmed through dataset analysis that the resulting dataset contains low bias. We released the dataset to facilitate language understanding research.
Authors
Topics
Natural Language Processing > Understanding > Semantic Analysis
Natural Language Processing > Resources & Methods > Natural Language Inference
Machine Learning > Learning Paradigms > Transfer Learning
Natural Language Processing > Applications > Natural Language Understanding
Machine Learning > Application Areas > Text Classification