The Feasibility of Embedding Based Automatic Evaluation for Single Document Summarization

Simeng Sun; Ani Nenkova

EMNLP 2019

Abstract

ROUGE is widely used to automatically evaluate summarization systems. However, ROUGE measures semantic overlap between a system summary and a human reference at the word-string level, which is much at odds with the contemporary treatment of semantic meaning. Here we present a suite of experiments on using distributed representations to evaluate summarizers, in both the reference-based and the reference-free setting. Our experimental results show that taking the maximum value over each dimension of the summary's ELMo word embeddings gives a good representation, one that yields high correlation with human ratings. Averaging the cosine similarities obtained from all the encoders we tested yields high correlation with manual scores in the reference-free setting. The distributed representations outperform ROUGE on recent corpora for abstractive news summarization but perform worse on test data used in past evaluations.
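The pooling and similarity steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`max_pool`, `cosine`, `embedding_score`, `averaged_score`) are hypothetical, the two-dimensional vectors are toy stand-ins for real ELMo word embeddings, and it assumes that the summary is compared against a human reference in the reference-based setting and against the source document in the reference-free setting.

```python
import math
from typing import List, Sequence

Vector = List[float]


def max_pool(word_vectors: Sequence[Sequence[float]]) -> Vector:
    """Collapse a sequence of word vectors into one vector by taking
    the maximum over each dimension."""
    if not word_vectors:
        raise ValueError("need at least one word vector")
    return [max(dim_values) for dim_values in zip(*word_vectors)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def embedding_score(summary_vectors: Sequence[Sequence[float]],
                    target_vectors: Sequence[Sequence[float]]) -> float:
    """Score a summary against a target text (assumed here to be a reference
    or the source document) by comparing max-pooled representations."""
    return cosine(max_pool(summary_vectors), max_pool(target_vectors))


def averaged_score(per_encoder_scores: Sequence[float]) -> float:
    """Average the similarity scores produced by several encoders."""
    if not per_encoder_scores:
        raise ValueError("need at least one score")
    return sum(per_encoder_scores) / len(per_encoder_scores)


# Toy word vectors, purely illustrative (real ELMo vectors are much larger).
summary = [[1.0, 0.0], [0.0, 2.0]]    # pools to [1.0, 2.0]
reference = [[2.0, 1.0], [0.5, 4.0]]  # pools to [2.0, 4.0]
score = embedding_score(summary, reference)
print(round(score, 6))
```

The pooled toy vectors are parallel, so the score is close to 1.0. With a real encoder, the word vectors for each text would replace the toy lists, and `averaged_score` would combine the scores obtained from different encoders.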

Authors

Simeng Sun, Ani Nenkova

Topics

Machine Learning > Core Methods > Embedding Learning
Natural Language Processing > Generation > Summarization
Natural Language Processing > Resources & Methods > Text Representation
Natural Language Processing > Applications > Summarization
Machine Learning > Core Methods > Evaluation
Deep Learning > Optimization & Theory > Evaluation

Keywords

summarization evaluation, embedding similarity, text embedding, distributed representation, reference-free evaluation, cosine similarity, ROUGE metric, ELMo embedding, reference-based evaluation
