TuringAdvice: A Generative and Dynamic Evaluation of Language Use

Rowan Zellers; Ari Holtzman; Elizabeth Clark; Lianhui Qin; Ali Farhadi; Yejin Choi

NAACL 2021

Abstract

We propose TuringAdvice, a new challenge task and dataset for language understanding models. Given a written situation that a real person is currently facing, a model must generate helpful advice in natural language. Our evaluation framework tests a fundamental aspect of human language understanding: our ability to use language to resolve open-ended situations by communicating with each other. Empirical results show that today’s models struggle at TuringAdvice, even multibillion-parameter models finetuned on 600k in-domain training examples. The best model, T5, writes advice that is at least as helpful as human-written advice in only 14% of cases; a much larger non-finetunable GPT3 model does even worse at 4%. This low performance reveals language understanding errors that are hard to spot outside of a generative setting, showing much room for progress.

Topics

Artificial Intelligence > Learning Paradigms > Few-Shot Learning

Natural Language Processing > Generation > Text Generation

Natural Language Processing > Applications > Question Answering

Keywords

few-shot learning, natural language generation, generative evaluation, language understanding, helpful advice, open-ended situation
