SacreROUGE: An Open-Source Library for Using and Developing Summarization Evaluation Metrics

Daniel Deutsch; Dan Roth

EMNLP 2020

Abstract

We present SacreROUGE, an open-source library for using and developing summarization evaluation metrics. SacreROUGE removes many obstacles that researchers face when using or developing metrics: (1) the library provides Python wrappers around the official implementations of existing evaluation metrics so that they share a common, easy-to-use interface; (2) it provides functionality to evaluate how well any metric implemented in the library correlates with human-annotated judgments, so no additional code needs to be written for a new evaluation metric; and (3) it includes scripts for loading datasets that contain human judgments so that they can easily be used for evaluation. This work describes the design of the library, including the core Metric interface, the command-line API for evaluating summarization models and metrics, and the scripts that load and reformat publicly available datasets. The development of SacreROUGE is ongoing and open to contributions from the community.
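The central design idea in the abstract, a single Metric interface that lets one correlation routine evaluate any metric against human judgments, can be sketched in Python as follows. This is a minimal illustration under our own assumptions: the names `Metric`, `UnigramRecall`, `score`, `score_all`, `pearson`, and `correlate` are hypothetical and are not SacreROUGE's actual API, and the toy unigram-recall metric is a stand-in, not the official ROUGE implementation.

```python
import math
from abc import ABC, abstractmethod
from collections import Counter
from typing import Dict, List


class Metric(ABC):
    """Hypothetical common interface: score a summary against its references."""

    @abstractmethod
    def score(self, summary: str, references: List[str]) -> Dict[str, float]:
        ...

    def score_all(self, summaries: List[str],
                  references_list: List[List[str]]) -> List[Dict[str, float]]:
        # Default batch behavior: score each summary independently.
        return [self.score(s, refs) for s, refs in zip(summaries, references_list)]


class UnigramRecall(Metric):
    """Toy ROUGE-1-style recall on lowercased whitespace tokens (NOT official ROUGE)."""

    def score(self, summary: str, references: List[str]) -> Dict[str, float]:
        summary_counts = Counter(summary.lower().split())
        recalls = []
        for ref in references:
            ref_counts = Counter(ref.lower().split())
            # Clipped overlap: each token counts at most as often as in the reference.
            overlap = sum((summary_counts & ref_counts).values())
            total = sum(ref_counts.values())
            recalls.append(overlap / total if total else 0.0)
        # Average the recall over all references.
        value = sum(recalls) / len(recalls) if recalls else 0.0
        return {"unigram-recall": value}


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def correlate(metric: Metric, summaries: List[str], references_list: List[List[str]],
              human_scores: List[float], key: str) -> float:
    # Works for any Metric subclass, so a new metric needs no new correlation code.
    metric_scores = [s[key] for s in metric.score_all(summaries, references_list)]
    return pearson(metric_scores, human_scores)


if __name__ == "__main__":
    # Made-up toy inputs for illustration only, not data from any real dataset.
    refs = [["the cat sat on the mat"]] * 3
    summaries = ["the cat sat on the mat", "the cat sat", "a dog ran"]
    human = [5.0, 3.0, 1.0]
    print(correlate(UnigramRecall(), summaries, refs, human, "unigram-recall"))
```

Because `correlate` depends only on the interface, adding a new metric in this sketch means implementing one `score` method; the correlation code is reused unchanged. Rank correlations such as Spearman's or Kendall's could be swapped in for Pearson in the same way.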

Keywords

natural language processing, text summarization, summarization evaluation, human judgment, evaluation metric, evaluation metrics, metric correlation, ROUGE metric, ROUGE score, open-source library, human-annotated judgment