Cloze Distillation: Improving Neural Language Models with Human Next-Word Prediction

Tiwalayo Eisape; Noga Zaslavsky; Roger Levy

CoNLL 2020

Abstract

Contemporary autoregressive language models (LMs) trained purely on corpus data have been shown to capture numerous features of human incremental processing. However, past work has also suggested dissociations between corpus probabilities and human next-word predictions. Here we evaluate several state-of-the-art language models for their match to human next-word predictions and to reading time behavior from eye movements. We then propose a novel method for distilling the linguistic information implicit in human linguistic predictions into pre-trained LMs: Cloze Distillation. We apply this method to a baseline neural LM and show potential improvement in reading time prediction and generalization to held-out human cloze data.
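The abstract does not spell out the training objective, so the following is only a minimal sketch of how human cloze responses could be distilled into a language model's next-word distribution. It assumes a loss that mixes the KL divergence from the human cloze distribution to the model with the usual negative log-likelihood of the corpus word. The function names, the mixing weight `alpha`, and the additive `smoothing` constant are all hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def cloze_distribution(responses, vocab, smoothing=1e-3):
    """Turn raw human cloze responses into a smoothed distribution over vocab.

    Additive smoothing is an assumption made here so that every vocabulary
    item gets non-zero probability.
    """
    counts = Counter(r for r in responses if r in vocab)
    total = sum(counts.values()) + smoothing * len(vocab)
    return {w: (counts[w] + smoothing) / total for w in vocab}


def softmax(logits):
    """Numerically stable softmax over a dict of word -> logit."""
    m = max(logits.values())
    exps = {w: math.exp(v - m) for w, v in logits.items()}
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}


def distillation_loss(logits, human_dist, corpus_word, alpha=0.5):
    """Hypothetical objective: alpha * KL(human || model) + (1 - alpha) * NLL.

    KL pulls the model toward the human cloze distribution; NLL keeps it
    anchored to the word that actually occurred in the corpus.
    """
    model = softmax(logits)
    kl = sum(p * math.log(p / model[w]) for w, p in human_dist.items() if p > 0)
    nll = -math.log(model[corpus_word])
    return alpha * kl + (1 - alpha) * nll


# Toy example: four participants completed the same context.
vocab = ["dog", "cat", "car"]
responses = ["dog", "dog", "cat", "dog"]
human = cloze_distribution(responses, vocab)
logits = {"dog": 2.0, "cat": 1.0, "car": 0.0}
loss = distillation_loss(logits, human, corpus_word="dog")
```

In a real setup the logits would come from a pre-trained LM and the loss would be minimized by gradient descent; the dictionary-based version here only illustrates the shape of the computation.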

Topics

Artificial Intelligence > Core AI > Human-AI Interaction

Machine Learning > Application Areas > Knowledge Distillation

Deep Learning > Models > Generative Models

Natural Language Processing > Resources & Methods > Language Modeling

Machine Learning > Learning Types > Knowledge Distillation

Deep Learning > Techniques > Knowledge Distillation

Keywords

knowledge distillation; neural language model; next-word prediction; autoregressive language model; reading time; cloze task; cloze prediction; human prediction

Related papers

Recurrent babbling: evaluating the acquisition of grammar from limited input data (2020)

Finding The Right One and Resolving it (2020)

Enriching Word Embeddings with Temporal and Spatial Information (2020)

Interpreting Attention Models with Human Visual Attention in Machine Reading Comprehension (2020)

Bridging Information-Seeking Human Gaze and Machine Reading Comprehension (2020)