WikiSum: Coherent Summarization Dataset for Efficient Human-Evaluation

Nachshon Cohen; Oren Kalinsky; Yftah Ziser; Alessandro Moschitti

ACL 2021

Abstract

Recent work has made significant advances on summarization tasks, facilitated by summarization datasets. Several existing datasets consist of coherent-paragraph summaries. However, these datasets were curated from academic documents written for experts, which makes the essential step of assessing summarization output through human evaluation very demanding. To overcome these limitations, we present a dataset based on article summaries appearing on the WikiHow website, composed of how-to articles and coherent-paragraph summaries written in plain language. We compare our dataset's attributes to those of existing datasets, including readability and world knowledge, showing that our dataset makes human evaluation significantly easier and thus more effective. A human evaluation conducted on PubMed and the proposed dataset reinforces our findings.

Topics

Computer Science > Applications > Document Analysis
Natural Language Processing > Applications > Summarization

Keywords

natural language processing, text summarization, human evaluation

Related papers

Out-of-Scope Intent Detection with Self-Supervision and Discriminative Training (2021)

A Non-Autoregressive Edit-Based Approach to Controllable Text Simplification (2021)

How Did This Get Funded?! Automatically Identifying Quirky Scientific Achievements (2021)

Exploring Discourse Structures for Argument Impact Classification (2021)

Language Embeddings for Typology and Cross-lingual Transfer Learning (2021)