2020
ACL
USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation
Abstract
The lack of meaningful automatic evaluation metrics for dialog has impeded open-domain dialog research. Standard language generation metrics have been shown to be ineffective for evaluating dialog models. To this end, this paper presents USR, an UnSupervised and Reference-free evaluation metric for dialog. USR trains unsupervised models to measure several desirable qualities of dialog. USR is shown to strongly correlate with human judgment on both Topical-Chat (turn-level: 0.42, system-level: 1.0) and PersonaChat (turn-level: 0.48, system-level: 1.0). USR additionally produces interpretable measures for several desirable properties of dialog.
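The difference between the turn-level and system-level figures reported above can be illustrated with a minimal sketch. It assumes Spearman rank correlation (the abstract does not name the coefficient) and uses hypothetical toy scores, not data from the paper. Turn-level correlation compares metric and human scores response by response; system-level correlation first averages both per dialog system and then correlates the averages.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical toy data: (system name, metric score, human rating) per response.
turns = [
    ("A", 0.2, 1), ("A", 0.4, 3), ("A", 0.3, 2),
    ("B", 0.6, 3), ("B", 0.5, 4), ("B", 0.7, 4),
    ("C", 0.9, 5), ("C", 0.8, 4), ("C", 0.7, 5),
]

# Turn level: correlate scores over individual responses.
turn_level = spearman([m for _, m, _ in turns], [h for _, _, h in turns])

# System level: average per system, then correlate the averages.
systems = sorted({s for s, _, _ in turns})
sys_metric = [mean(m for s, m, _ in turns if s == name) for name in systems]
sys_human = [mean(h for s, _, h in turns if s == name) for name in systems]
system_level = spearman(sys_metric, sys_human)

print(f"turn-level: {turn_level:.2f}, system-level: {system_level:.2f}")
```

In this toy setup the metric ranks the three systems in the same order as the human ratings, so the system-level correlation is perfect even though individual responses disagree. This is one way a moderate turn-level value can coexist with a system-level value of 1.0.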
Topics
Machine Learning > Learning Types > Unsupervised Learning
Interdisciplinary > Linguistics > Computational Linguistics
Natural Language Processing > Resources & Methods > Language Modeling
Machine Learning > Learning Paradigms > Unsupervised Learning
Artificial Intelligence > Core AI > Dialogue Systems
Natural Language Processing > Applications > Natural Language Generation