UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training

Hangbo Bao; Li Dong; Furu Wei; Wenhui Wang; Nan Yang; Xiaodong Liu; Yu Wang; Jianfeng Gao; Songhao Piao; Ming Zhou; Hsiao-Wuen Hon

2020 ICML ICML 2020

UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training

Abstract

We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks using a novel training procedure, referred to as a pseudo-masked language model (PMLM). Given an input text with masked tokens, we rely on conventional masks to learn inter-relations between corrupted tokens and context via autoencoding, and pseudo masks to learn intra-relations between masked spans via partially autoregressive modeling. With well-designed position embeddings and self-attention masks, the context encodings are reused to avoid redundant computation. Moreover, conventional masks used for autoencoding provide global masking information, so that all the position embeddings are accessible in partially autoregressive language modeling. In addition, the two tasks pre-train a unified language model as a bidirectional encoder and a sequence-to-sequence decoder, respectively. Our experiments show that the unified language models pre-trained using PMLM achieve new state-of-the-art results on a wide range of language understanding and generation tasks across several widely used benchmarks. The code and pre-trained models are available at https://github.com/microsoft/unilm.

🌉 Interdisciplinary Bridge — Deep Learning and Natural Language Processing

🧭 Keyword Pioneer — autoregressive modeling

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Deep Learning, Machine Learning, Natural Language Processing, Reinforcement Learning, Speech & Audio

Authors

Hangbo Bao , Li Dong , Furu Wei , Wenhui Wang , Nan Yang , Xiaodong Liu , Yu Wang , Jianfeng Gao , Songhao Piao , Ming Zhou , Hsiao-Wuen Hon

Topics

Deep Learning > Architectures > Transformers Deep Learning > Techniques > Pretraining Natural Language Processing > Generation > Language Modeling

Keywords

bidirectional encoder autoregressive modeling pseudo-masked language model unified language model

Download PDF

Related papers

Correlation Clustering with Asymmetric Classification Errors 2020

Learning Portable Representations for High-Level Planning 2020

Proving the Lottery Ticket Hypothesis: Pruning is All You Need 2020

Minimax Pareto Fairness: A Multi Objective Perspective 2020

DeepMatch: Balancing Deep Covariate Representations for Causal Inference Using Adversarial Training 2020