Robustness Gym: Unifying the NLP Evaluation Landscape

Karan Goel; Nazneen Fatema Rajani; Jesse Vig; Zachary Taschdjian; Mohit Bansal; Christopher Ré

NAACL 2021

Abstract

Despite impressive performance on standard benchmarks, natural language processing (NLP) models are often brittle when deployed in real-world systems. In this work, we identify challenges with evaluating NLP systems and propose a solution in the form of Robustness Gym (RG), a simple and extensible evaluation toolkit that unifies four standard evaluation paradigms: subpopulations, transformations, evaluation sets, and adversarial attacks. By providing a common platform for evaluation, RG enables practitioners to compare results from disparate evaluation paradigms with a single click, and to easily develop and share novel evaluation methods using a built-in set of abstractions. RG is under active development, and we welcome feedback and contributions from the community.
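To make the four paradigms concrete, the following is a minimal sketch in plain Python. It does not use the Robustness Gym API; the toy keyword model, the example data, and every function name here (`subpopulation`, `transformation`, `adversarial`, `build_report`) are hypothetical and chosen only for illustration. The sketch shows how the four kinds of evaluation can each be reduced to "produce a list of examples, then score the model on it", which is what allows their results to be reported side by side.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, int]      # (text, label); label 1 = positive, 0 = negative
Model = Callable[[str], int]


def toy_model(text: str) -> int:
    """Hypothetical keyword classifier standing in for a real NLP model."""
    return 1 if "good" in text.lower().split() else 0


def accuracy(model: Model, data: List[Example]) -> float:
    """Fraction of examples the model labels correctly."""
    if not data:
        return float("nan")
    return sum(model(t) == y for t, y in data) / len(data)


def subpopulation(data: List[Example], keep: Callable[[str], bool]) -> List[Example]:
    """Subpopulation: keep only the examples that satisfy a predicate."""
    return [(t, y) for t, y in data if keep(t)]


def transformation(data: List[Example], fn: Callable[[str], str]) -> List[Example]:
    """Transformation: rewrite every input, keep its label."""
    return [(fn(t), y) for t, y in data]


def adversarial(data: List[Example], model: Model,
                candidates: List[Callable[[str], str]]) -> List[Example]:
    """Adversarial attack: per example, use the first candidate perturbation
    that makes the model wrong; fall back to the original text if none does."""
    out = []
    for t, y in data:
        attacked = next((c(t) for c in candidates if model(c(t)) != y), t)
        out.append((attacked, y))
    return out


def add_typo(t: str) -> str:
    return t.replace("good", "goood")


def add_suffix(t: str) -> str:
    return t + " !!!"


# Illustrative data only, not taken from the paper.
ORIGINAL = [("a good movie", 1), ("not a good plot", 0),
            ("bad acting", 0), ("good fun", 1)]
EVAL_SET = [("surprisingly enjoyable", 1), ("dull and slow", 0)]  # hand-written set


def build_report(model: Model) -> Dict[str, float]:
    """Score one model under all four paradigms in a single table."""
    return {
        "original": accuracy(model, ORIGINAL),
        "subpopulation (contains 'not')": accuracy(
            model, subpopulation(ORIGINAL, lambda t: "not" in t.split())),
        "transformation (suffix)": accuracy(
            model, transformation(ORIGINAL, add_suffix)),
        "evaluation set": accuracy(model, EVAL_SET),
        "adversarial attack": accuracy(
            model, adversarial(ORIGINAL, model, [add_typo, add_suffix])),
    }


if __name__ == "__main__":
    for name, score in build_report(toy_model).items():
        print(f"{name:32s} {score:.2f}")
```

In this toy setting the model scores 0.75 on the original data but 0.00 on the negation subpopulation and 0.25 under the simple attack, the kind of gap between benchmark accuracy and robustness that the abstract describes.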

Topics

Artificial Intelligence > Core AI > Interpretability
Machine Learning > Application Areas > Domain Generalization
Machine Learning > Core Methods > Evaluation

Keywords

model robustness, adversarial attack, NLP evaluation, evaluation toolkit
