Cloze-driven Pretraining of Self-attention Networks

Alexei Baevski; Sergey Edunov; Yinhan Liu; Luke Zettlemoyer; Michael Auli

2019 IJCNLP IJCNLP 2019

Cloze-driven Pretraining of Self-attention Networks

Abstract

AbstractWe present a new approach for pretraining a bi-directional transformer model that provides significant performance gains across a variety of language understanding problems. Our model solves a cloze-style word reconstruction task, where each word is ablated and must be predicted given the rest of the text. Experiments demonstrate large performance gains on GLUE and new state of the art results on NER as well as constituency parsing benchmarks, consistent with BERT. We also present a detailed analysis of a number of factors that contribute to effective pretraining, including data domain and size, model capacity, and variations on the cloze objective.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning

🐣 Hot Topic Early Bird — language understanding

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Alexei Baevski , Sergey Edunov , Yinhan Liu , Luke Zettlemoyer , Michael Auli

Topics

Machine Learning > Learning Types > Self-Supervised Learning Deep Learning > Architectures > Transformers Deep Learning > Techniques > Pretraining Deep Learning > Models > Transformers

Keywords

named entity recognition language understanding self-attention network cloze task bidirectional model bi-directional transformer

Download PDF

Related papers

Fine-grained Knowledge Fusion for Sequence Labeling Domain Adaptation 2019

Exploiting Monolingual Data at Scale for Neural Machine Translation 2019

Distributionally Robust Language Modeling 2019

Unsupervised Domain Adaptation of Contextualized Embeddings for Sequence Labeling 2019

ARAML: A Stable Adversarial Training Framework for Text Generation 2019