Unsupervised Single Document Abstractive Summarization using Semantic Units

Jhen-Yi Wu; Ying-Jia Lin; Hung-Yu Kao

IJCNLP 2022

Abstract

In this work, we study the importance of content frequency in abstractive summarization, where we define content as “semantic units.” We propose a two-stage training framework that lets the model automatically learn the frequency of each semantic unit in the source text. Our model is trained in an unsupervised manner, since the frequency information can be inferred from the source text alone. During inference, our model identifies sentences with high-frequency semantic units and uses the frequency information to generate summaries from the filtered sentences. On the CNN/Daily Mail summarization task, our model outperforms other unsupervised methods under the same settings. Furthermore, we achieve competitive ROUGE scores with far fewer model parameters than several large-scale pre-trained models. Our model can be trained in low-resource language settings and can thus serve as a potential solution for real-world applications where pre-trained models are not applicable.
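The inference-time idea of keeping sentences rich in high-frequency semantic units can be illustrated with a minimal sketch. It rests on loud assumptions: the paper learns semantic-unit frequencies with a trained model, whereas here a "semantic unit" is approximated by a lowercase non-stopword token counted directly in the document, and each sentence is scored by the mean document frequency of its units. The names `STOPWORDS`, `semantic_units`, `select_sentences`, and `top_k` are hypothetical, and the abstractive generation step that follows the filtering is not shown.

```python
import re
from collections import Counter

# HYPOTHETICAL stand-in: the paper learns semantic-unit frequencies with a
# neural model. Here a "semantic unit" is approximated by a lowercase token
# that is not in this small stopword list.
STOPWORDS = {
    "the", "a", "an", "of", "to", "in", "and", "is", "was", "on",
    "for", "with", "it", "its", "at", "by", "as", "that",
}


def semantic_units(sentence: str) -> list[str]:
    """Return the approximate semantic units (content tokens) of one sentence."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return [t for t in tokens if t not in STOPWORDS]


def select_sentences(sentences: list[str], top_k: int = 2) -> list[str]:
    """Keep the top_k sentences whose units are most frequent in the document.

    Each sentence is scored by the mean document-level frequency of its units;
    the selected sentences are returned in their original document order.
    """
    # Count how often each unit occurs across the whole document.
    doc_freq = Counter(u for s in sentences for u in semantic_units(s))

    def score(s: str) -> float:
        units = semantic_units(s)
        return sum(doc_freq[u] for u in units) / len(units) if units else 0.0

    # Rank sentence indices by score (highest first), keep top_k,
    # then restore document order for the filtered output.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:top_k])
    return [sentences[i] for i in keep]
```

For example, given a four-sentence news snippet where three sentences mention a storm on the coast and one mentions an unrelated bakery, `select_sentences(doc, top_k=2)` keeps two of the storm sentences and drops the bakery sentence. Averaging rather than summing unit frequencies is a design choice made here so that long sentences are not favored merely for their length.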

Topics

Machine Learning > Learning Types > Unsupervised Learning
Natural Language Processing > Generation > Summarization
Deep Learning > Learning Types > Self-Supervised Learning

Keywords

unsupervised learning, document summarization, text summarization, low-resource setting, abstractive summarization, semantic unit, content frequency
