Unsupervised Abstractive Summarization of Bengali Text Documents

Radia Rayan Chowdhury; Mir Tafseer Nayeem; Tahsin Tasnim Mim; Md. Saifur Rahman Chowdhury; Taufiqul Jannat

EACL 2021


Abstract

Abstractive summarization systems generally rely on large collections of document-summary pairs. However, building well-performing abstractive systems remains a challenge for low-resource languages like Bengali, for which such parallel data is unavailable. To overcome this problem, we propose a graph-based unsupervised abstractive summarization system in the single-document setting for Bengali text documents, which requires only a Part-Of-Speech (POS) tagger and a pre-trained language model trained on Bengali texts. We also provide a human-annotated dataset of document-summary pairs to evaluate our abstractive model and to support the comparison of future abstractive summarization systems for the Bengali language. We conduct experiments on this dataset and compare our system with several well-established unsupervised extractive summarization systems. Our unsupervised abstractive summarization model outperforms these baselines without being exposed to any human-annotated reference summaries.
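The abstract does not describe how the graph is built or how the language model is used. As a rough illustration only, the sketch below shows word-graph sentence fusion in the style of Filippova (2010), one well-known graph-based way to generate new sentences from POS-tagged input without training data. It is not the authors' method. Assumptions: the toy English sentences with Penn-style tags stand in for POS-tagged Bengali text; nodes are merged on (lowercased word, POS); edge cost is simplified to 1/frequency; `lm_score` is a hypothetical hook where a pre-trained language model could rerank candidates; all function names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Token = Tuple[str, str]  # (word, POS tag), as produced by some POS tagger
START: Token = ("<s>", "START")
END: Token = ("</s>", "END")


def build_word_graph(sentences: List[List[Token]]) -> Dict[Token, Dict[Token, int]]:
    """Merge tokens sharing (lowercased word, POS) into one node and count each directed edge."""
    edges: Dict[Token, Dict[Token, int]] = defaultdict(lambda: defaultdict(int))
    for sent in sentences:
        path = [START] + [(w.lower(), pos) for w, pos in sent] + [END]
        for a, b in zip(path, path[1:]):
            edges[a][b] += 1
    return edges


def candidate_paths(edges, min_len: int = 4, max_len: int = 12, verb_prefix: str = "V") -> List[List[Token]]:
    """Enumerate simple START->END paths of min_len..max_len words that contain a verb.

    The verb check assumes verb tags start with `verb_prefix` (true for Penn tags)."""
    results: List[List[Token]] = []

    def dfs(node: Token, path: List[Token]) -> None:
        if node == END:
            words = path[1:-1]
            if len(words) >= min_len and any(pos.startswith(verb_prefix) for _, pos in words):
                results.append(words)
            return
        if len(path) > max_len:  # path holds START plus max_len words: cannot grow further
            return
        for nxt in edges.get(node, {}):
            if nxt not in path:  # simple paths only, so cycles cannot loop forever
                dfs(nxt, path + [nxt])

    dfs(START, [START])
    return results


def rank(edges, paths: List[List[Token]], lm_score: Callable[[List[str]], float]) -> List[List[Token]]:
    """Sort candidates by length-normalized graph cost minus a (hypothetical) LM fluency score."""
    def graph_cost(words: List[Token]) -> float:
        seq = [START] + words + [END]
        # Frequent edges are cheap: cost of an edge is 1 / (number of times it was observed).
        return sum(1.0 / edges[a][b] for a, b in zip(seq, seq[1:])) / len(words)

    return sorted(paths, key=lambda ws: graph_cost(ws) - lm_score([w for w, _ in ws]))


# Toy input: two related sentences, already POS-tagged.
sentences = [
    [("The", "DT"), ("minister", "NN"), ("announced", "VBD"), ("new", "JJ"),
     ("relief", "NN"), ("funds", "NNS"), ("today", "NN")],
    [("The", "DT"), ("minister", "NN"), ("announced", "VBD"), ("relief", "NN"),
     ("funds", "NNS"), ("for", "IN"), ("flood", "NN"), ("victims", "NNS")],
]
graph = build_word_graph(sentences)
paths = candidate_paths(graph)
# A constant lm_score means ranking uses graph cost alone; a real LM would replace this lambda.
best = rank(graph, paths, lm_score=lambda words: 0.0)[0]
print(" ".join(w for w, _ in best))
```

With the constant placeholder score, the cheapest path is the compressed sentence "the minister announced relief funds today". Plugging in a real language model score would let fluency, not only edge frequency, decide among the four candidate paths.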



Topics

Machine Learning > Learning Types > Unsupervised Learning; Natural Language Processing > Generation > Summarization; Natural Language Processing > Resources & Methods > Multilingual NLP

Keywords

unsupervised learning, low-resource language, pre-trained language model, graph-based method, abstractive summarization
