Large-scale Cloze Test Dataset Created by Teachers

Qizhe Xie; Guokun Lai; Zihang Dai; Eduard Hovy

EMNLP 2018

Abstract

Cloze tests are widely adopted in language exams to evaluate students’ language proficiency. In this paper, we propose the first large-scale human-created cloze test dataset, CLOTH, containing questions used in middle-school and high-school language exams. With blanks carefully created by teachers and candidate choices purposely designed to be nuanced, CLOTH requires deeper language understanding and a wider attention span than previous automatically generated cloze datasets. We test the performance of specially designed baseline models, including a language model trained on the One Billion Word Corpus, and show that humans outperform them by a significant margin. We investigate the source of the performance gap, trace model deficiencies to some distinct properties of CLOTH, and identify the limited ability to comprehend long-term context as the key bottleneck.
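To make the task format concrete, the following is a minimal sketch of how a multiple-choice cloze item could be represented and answered by scoring each candidate in context. Everything here is an illustrative assumption: the `ClozeItem` class, the `_` blank marker, the toy corpus, and the bigram-count scorer are hypothetical stand-ins and are not the dataset's actual format or the baseline models evaluated in the paper.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class ClozeItem:
    """Hypothetical container for one multiple-choice cloze question."""
    passage: str        # text with a single "_" marking the blank (assumed convention)
    options: List[str]  # candidate words for the blank
    answer: int         # index of the correct option


def bigram_counts(corpus: str) -> Counter:
    """Count adjacent word pairs in a whitespace-tokenized, lowercased corpus."""
    tokens = corpus.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def score(tokens: List[str], counts: Counter) -> int:
    """Toy fluency score: sum of corpus counts over the sentence's bigrams."""
    return sum(counts[pair] for pair in zip(tokens, tokens[1:]))


def predict(item: ClozeItem, counts: Counter) -> int:
    """Fill the blank with each option and return the index of the best-scoring one."""
    scores = []
    for option in item.options:
        filled = item.passage.replace("_", option).lower().split()
        scores.append(score(filled, counts))
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data, invented for illustration only.
corpus = "she opened the door and walked into the room . he closed the door ."
item = ClozeItem(
    passage="She opened the _ and walked in",
    options=["door", "sky", "song", "idea"],
    answer=0,
)
counts = bigram_counts(corpus)
prediction = predict(item, counts)
is_correct = prediction == item.answer
```

A scorer of this kind sees only adjacent words, so it cannot use information far from the blank. The abstract identifies exactly that kind of limited long-term context comprehension as the key bottleneck for models on CLOTH.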

Topics

Natural Language Processing > Generation > Language Modeling

Natural Language Processing > Applications > Machine Reading Comprehension

Natural Language Processing > Resources & Methods > Language Modeling

Machine Learning > Learning Types > Language Modeling

Keywords

question answering, language modeling, reading comprehension, language understanding, language model, cloze test, deep language understanding
