The limits of automatic summarisation according to ROUGE

Natalie Schluter

EACL 2017

Abstract

This paper discusses some central caveats of summarisation, incurred in the use of the ROUGE metric for evaluation, with respect to optimal solutions. The task is NP-hard, and we give the first proof of this. Still, as we show for three central benchmark datasets for the task, greedy algorithms empirically seem to perform optimally according to the metric. Additionally, overall quality assurance is problematic: there is no natural upper bound on the quality of summarisation systems, and even humans are excluded from performing optimal summarisation.
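To make the idea of a greedy algorithm under ROUGE concrete, the following is a minimal sketch of greedy extractive selection against a single reference summary. It is not the paper's algorithm: the abstract does not specify the ROUGE variant, the length constraint, or the greedy procedure, so the choice of ROUGE-n recall, a token budget, and the function names (`ngrams`, `rouge_n_recall`, `greedy_summary`) are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n=1):
    """Count the n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(cand_counts, ref_counts):
    """Clipped n-gram overlap divided by the number of reference n-grams."""
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    overlap = sum(min(c, cand_counts[g]) for g, c in ref_counts.items())
    return overlap / total


def greedy_summary(sentences, reference, budget, n=1):
    """Greedily pick sentences (lists of tokens) that most improve ROUGE-n
    recall against `reference`, without exceeding `budget` tokens.

    Assumption: n-grams are counted per sentence, so none span a sentence
    boundary. Returns (chosen sentence indices, final recall score).
    """
    ref_counts = ngrams(reference, n)
    chosen, cand_counts = [], Counter()
    length, score = 0, 0.0
    remaining = set(range(len(sentences)))
    while remaining:
        best_i, best_score = None, score
        for i in sorted(remaining):
            if length + len(sentences[i]) > budget:
                continue  # sentence would break the length budget
            trial = rouge_n_recall(cand_counts + ngrams(sentences[i], n), ref_counts)
            if trial > best_score:
                best_i, best_score = i, trial
        if best_i is None:
            break  # no remaining sentence fits and improves the score
        chosen.append(best_i)
        cand_counts += ngrams(sentences[best_i], n)
        length += len(sentences[best_i])
        score = best_score
        remaining.remove(best_i)
    return chosen, score


sentences = [
    ["the", "cat", "sat"],
    ["a", "dog", "barked"],
    ["the", "cat", "sat", "on", "the", "mat"],
]
reference = ["the", "cat", "sat", "on", "the", "mat"]
print(greedy_summary(sentences, reference, budget=6))
print(greedy_summary(sentences, reference, budget=3))
```

The sketch selects sentences using the reference itself, so it estimates the best score an extractive system could reach under the metric rather than acting as a deployable summariser. Because the exact problem is NP-hard, a greedy pass of this kind carries no optimality guarantee in general; the abstract's claim is only that greedy methods appear to reach the optimum empirically on the benchmarks studied.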

Authors

Natalie Schluter

Topics

Machine Learning > Optimization & Theory > Theory
Natural Language Processing > Generation > Summarization
Mathematics & Optimization > Optimization > Combinatorial Optimization
Mathematics & Optimization > Optimization > Discrete Optimization
Natural Language Processing > Applications > Summarization

Keywords

text summarization, computational complexity, greedy algorithm, evaluation benchmark, evaluation metric, NP-hard problem, automatic summarization, ROUGE metric

Related papers

Cross-Lingual Dependency Parsing with Late Decoding for Truly Low-Resource Languages 2017

Learning and Knowledge Transfer with Memory Networks for Machine Comprehension 2017

Is this a Child, a Girl or a Car? Exploring the Contribution of Distributional Similarity to Learning Referential Word Meanings 2017

Building Web-Interfaces for Vector Semantic Models with the WebVectors Toolkit 2017

Assessing Convincingness of Arguments in Online Debates with Limited Number of Features 2017