Reliability Testing for Natural Language Processing Systems

Samson Tan; Shafiq Joty; Kathy Baxter; Araz Taeihagh; Gregory A. Bennett; Min-Yen Kan

ACL 2021

Abstract

Questions of fairness, robustness, and transparency are paramount to address before deploying NLP systems. Central to these concerns is the question of reliability: Can NLP systems reliably treat different demographics fairly and function correctly in diverse and noisy environments? To address this, we argue for the need for reliability testing and contextualize it among existing work on improving accountability. We show how adversarial attacks can be reframed for this goal, via a framework for developing reliability tests. We argue that reliability testing, with an emphasis on interdisciplinary collaboration, will enable rigorous and targeted testing, and aid in the enactment and enforcement of industry standards.

Topics

Artificial Intelligence > Core AI > AI Safety; Artificial Intelligence > Core AI > Responsible AI; Machine Learning > Application Areas > Fairness; Natural Language Processing > Applications > Text Classification; Artificial Intelligence > Core AI > Fairness; Machine Learning > Learning Types > Robustness; Artificial Intelligence > Core AI > Robustness

Keywords

natural language processing; AI safety; adversarial attack; reliability testing; demographic fairness; fairness evaluation; robustness testing
