DeltaScore: Fine-Grained Story Evaluation with Perturbations

Zhuohan Xie; Miao Li; Trevor Cohn; Jey Lau

EMNLP 2023

Abstract

Numerous evaluation metrics have been developed for natural language generation tasks, but their effectiveness in evaluating stories is limited as they are not specifically tailored to assess intricate aspects of storytelling, such as fluency and interestingness. In this paper, we introduce DeltaScore, a novel methodology that uses perturbation techniques for the evaluation of nuanced story aspects. We posit that the extent to which a story excels in a specific aspect (e.g., fluency) correlates with the magnitude of its susceptibility to particular perturbations (e.g., the introduction of typos). Given this, we measure the quality of an aspect by calculating the likelihood difference between pre- and post-perturbation states using pre-trained language models. We compare DeltaScore with existing metrics on storytelling datasets from two domains in five fine-grained story aspects: fluency, coherence, relatedness, logicality, and interestingness. DeltaScore demonstrates strong performance, revealing a surprising finding that one specific perturbation proves highly effective in capturing multiple aspects. Source code is available on our GitHub repository.
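The likelihood-difference idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the paper uses pre-trained language models, whereas this sketch substitutes a toy character-bigram model so that it runs without external dependencies, and the names `CharBigramLM`, `add_typos`, and `delta_score` are hypothetical rather than taken from the authors' code. The typo perturbation (swapping adjacent letters) is likewise only one plausible way to introduce typos.

```python
import math
import random
from collections import Counter


class CharBigramLM:
    """Toy character-bigram model with add-alpha smoothing.

    Assumption: this only stands in for a pre-trained language model.
    """

    def __init__(self, corpus: str, alpha: float = 1.0):
        self.alpha = alpha
        # One extra bucket covers characters never seen in the corpus.
        self.vocab_size = len(set(corpus)) + 1
        self.bigrams = Counter(zip(corpus, corpus[1:]))
        self.contexts = Counter(corpus[:-1])

    def log_likelihood(self, text: str) -> float:
        """Sum of smoothed log-probabilities of each character given the previous one."""
        total = 0.0
        for prev, cur in zip(text, text[1:]):
            num = self.bigrams[(prev, cur)] + self.alpha
            den = self.contexts[prev] + self.alpha * self.vocab_size
            total += math.log(num / den)
        return total


def add_typos(text: str, rate: float = 0.2, seed: int = 0) -> str:
    """Illustrative perturbation: swap adjacent letters with probability `rate`."""
    rng = random.Random(seed)
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        a, b = chars[i], chars[i + 1]
        if a.isalpha() and b.isalpha() and a != b and rng.random() < rate:
            chars[i], chars[i + 1] = b, a
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return "".join(chars)


def delta_score(lm: CharBigramLM, story: str, perturb) -> float:
    """Log-likelihood of the story minus log-likelihood of its perturbed version."""
    return lm.log_likelihood(story) - lm.log_likelihood(perturb(story))


if __name__ == "__main__":
    lm = CharBigramLM("the cat sat on the mat. the dog sat on the log.")
    swap_all = lambda s: add_typos(s, rate=1.0)  # rate=1.0 makes the perturbation deterministic
    print("fluent :", delta_score(lm, "the cat sat on the mat.", swap_all))
    print("garbled:", delta_score(lm, "hte act ast no hte amt.", swap_all))
```

In this toy setup the fluent sentence receives the larger score, because its likelihood falls sharply once typos are introduced, while the already garbled sentence has little likelihood left to lose. The garbled input here is simply the fluent sentence with every letter pair swapped, so perturbing it again happens to restore the original; with a real pre-trained model and realistic stories the gap would be less extreme.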

Topics

Artificial Intelligence > Core AI > Interpretability
Natural Language Processing > Generation > Text Generation
Machine Learning > Learning Types > Supervised Learning
Natural Language Processing > Applications > Text Generation
Machine Learning > Learning Types > Evaluation
Deep Learning > Learning Types > Self-Supervised Learning
Deep Learning > Learning Types > Representation Learning
Machine Learning > Core Methods > Evaluation

Keywords

natural language generation, perturbation analysis, language model, pre-trained language model, text generation evaluation, fine-grained evaluation, story evaluation, text quality assessment, perturbation technique
