Text Summarization with Pretrained Encoders

Yang Liu; Mirella Lapata

2019 IJCNLP IJCNLP 2019

Text Summarization with Pretrained Encoders

Abstract

AbstractBidirectional Encoder Representations from Transformers (BERT) represents the latest incarnation of pretrained language models which have recently advanced a wide range of natural language processing tasks. In this paper, we showcase how BERT can be usefully applied in text summarization and propose a general framework for both extractive and abstractive models. We introduce a novel document-level encoder based on BERT which is able to express the semantics of a document and obtain representations for its sentences. Our extractive model is built on top of this encoder by stacking several inter-sentence Transformer layers. For abstractive summarization, we propose a new fine-tuning schedule which adopts different optimizers for the encoder and the decoder as a means of alleviating the mismatch between the two (the former is pretrained while the latter is not). We also demonstrate that a two-staged fine-tuning approach can further boost the quality of the generated summaries. Experiments on three datasets show that our model achieves state-of-the-art results across the board in both extractive and abstractive settings.

🧭 Keyword Pioneer — document-level encoder

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Yang Liu , Mirella Lapata

Topics

Natural Language Processing > Generation > Summarization Natural Language Processing > Resources & Methods > Large Language Models Natural Language Processing > Resources & Methods > Text Representation

Keywords

transformer architecture extractive summarization text summarization abstractive summarization pretrained encoder document-level encoder

Download PDF

Related papers

Fine-grained Knowledge Fusion for Sequence Labeling Domain Adaptation 2019

Exploiting Monolingual Data at Scale for Neural Machine Translation 2019

Distributionally Robust Language Modeling 2019

Unsupervised Domain Adaptation of Contextualized Embeddings for Sequence Labeling 2019

ARAML: A Stable Adversarial Training Framework for Text Generation 2019