LEGOEval: An Open-Source Toolkit for Dialogue System Evaluation via Crowdsourcing

Yu Li; Josh Arnold; Feifan Yan; Weiyan Shi; Zhou Yu

ACL 2021

Abstract

We present LEGOEval, an open-source toolkit that enables researchers to easily evaluate dialogue systems in a few lines of code using the online crowdsourcing platform Amazon Mechanical Turk. Compared to existing toolkits, LEGOEval features a flexible task design by providing a Python API that maps to commonly used React.js interface components. Researchers can easily personalize their evaluation procedures with our built-in pages, as if playing with LEGO blocks. Thus, LEGOEval provides a fast, consistent method for reproducing human evaluation results. Besides the flexible task design, LEGOEval also offers an easy API for reviewing collected data.
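
The abstract's central idea, composing an evaluation task from reusable pages that map to front-end components, can be illustrated with a small sketch. The classes and helpers below (`Component`, `Page`, `Task`, `instructions_page`, `chat_page`, `survey_page`), the component names (`Text`, `ChatBox`, `LikertScale`), and the endpoint URL are hypothetical and are not LEGOEval's actual API. The sketch only assumes that a task serializes to a JSON tree that a React.js front end could render.

```python
# Hypothetical sketch of block-style task composition; NOT LEGOEval's real API.
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Component:
    """One interface element; `type` names an (assumed) React component."""
    type: str
    props: dict = field(default_factory=dict)


@dataclass
class Page:
    """A page is an ordered list of components shown to a crowd worker."""
    name: str
    components: List[Component] = field(default_factory=list)


@dataclass
class Task:
    """An evaluation task is an ordered sequence of pages, stacked like blocks."""
    title: str
    pages: List[Page] = field(default_factory=list)

    def add(self, page: Page) -> "Task":
        self.pages.append(page)
        return self  # returning self allows chained .add(...) calls

    def to_json(self) -> str:
        # asdict recurses into nested dataclasses, giving a plain dict tree.
        return json.dumps(asdict(self), indent=2)


# Hypothetical "built-in pages", each a small factory returning a Page.
def instructions_page(text: str) -> Page:
    return Page("instructions", [Component("Text", {"content": text})])


def chat_page(bot_url: str) -> Page:
    return Page("chat", [Component("ChatBox", {"endpoint": bot_url})])


def survey_page(questions: List[str]) -> Page:
    comps = [Component("LikertScale", {"question": q, "points": 5}) for q in questions]
    return Page("survey", comps)


task = (
    Task("Evaluate a chatbot")
    .add(instructions_page("Chat with the bot, then rate it."))
    .add(chat_page("http://localhost:8000/chat"))  # placeholder endpoint
    .add(survey_page(["How engaging was the bot?", "How coherent was the bot?"]))
)

print([p.name for p in task.pages])
```

Building each page with a small factory function keeps the task definition declarative: swapping the survey or reordering pages changes only the chained `add` calls, which is the kind of block-style composition the abstract describes.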

Keywords

natural language processing, human evaluation, dialogue system, open-source toolkit, evaluation toolkit