TestAug: A Framework for Augmenting Capability-based NLP Tests

Guanqun Yang; Mirazul Haque; Qiaochu Song; Wei Yang; Xueqing Liu

COLING 2022

Abstract

The recently proposed capability-based NLP testing allows model developers to test the functional capabilities of NLP models, revealing functional failures in models with good held-out evaluation scores. However, existing work on capability-based testing requires the developer to compose each individual test template from scratch. Such an approach requires extensive manual effort and scales poorly. In this paper, we investigate a different approach that requires the developer to annotate only a few test templates, while leveraging the GPT-3 engine to generate the majority of test cases. While our approach reduces manual effort by design, it guarantees the correctness of the generated suites with a validity checker. Moreover, our experimental results show that the test suites generated by GPT-3 are more diverse than the manually created ones; they also detect more errors than their manually created counterparts. Our test suites can be downloaded at https://anonymous-researcher-nlp.github.io/testaug/.
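The workflow the abstract describes (a few hand-written templates, a language model that proposes further test cases, and a validity checker that filters them) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the names `expand_template`, `augment`, `fake_generator`, and `contains_negation` are hypothetical, the generator is a hard-coded stand-in for a GPT-3 call, and the keyword-based check is a toy substitute for the paper's validity checker.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List, Tuple

TestCase = Tuple[str, str]  # (input text, expected label)


def expand_template(template: str, slots: Dict[str, List[str]], label: str) -> List[TestCase]:
    """Fill every combination of slot values into a hand-written template."""
    names = list(slots)
    cases = []
    for values in product(*(slots[name] for name in names)):
        cases.append((template.format(**dict(zip(names, values))), label))
    return cases


def augment(
    seeds: List[TestCase],
    generate: Callable[[List[TestCase]], Iterable[TestCase]],
    is_valid: Callable[[str, str], bool],
) -> List[TestCase]:
    """Add generated candidates that are new and pass the validity check."""
    suite = list(seeds)
    seen = {text for text, _ in suite}
    for text, label in generate(seeds):
        if text not in seen and is_valid(text, label):
            suite.append((text, label))
            seen.add(text)
    return suite


def fake_generator(seeds: List[TestCase]) -> List[TestCase]:
    """Hypothetical stand-in for a language-model call; returns fixed candidates."""
    return [
        ("The crew was not friendly.", "negative"),
        ("The food was not good.", "negative"),     # duplicates a seed
        ("The seat was comfortable.", "negative"),  # lacks the tested capability
    ]


def contains_negation(text: str, label: str) -> bool:
    """Toy validity check (an assumption): the case must contain the word 'not'."""
    return " not " in f" {text.lower()} "


# A single seed template for a "negation" capability in sentiment analysis.
seeds = expand_template(
    "The {noun} was not {adj}.",
    {"noun": ["food", "service"], "adj": ["good"]},
    "negative",
)
suite = augment(seeds, fake_generator, contains_negation)
```

In this sketch the two seed cases are kept, one generated case is added, and the duplicate and the case without negation are dropped. A real setup would replace `fake_generator` with a prompt built from the seed cases and `contains_negation` with a stronger check.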

Topics

Machine Learning > Learning Types > Weakly Supervised Learning
Machine Learning > Learning Types > Zero-Shot Learning
Natural Language Processing > Generation > Text Generation
Natural Language Processing > Applications > Text Classification
Computer Science > Applications > Software Engineering
Artificial Intelligence > Core AI > Large Language Models
Machine Learning > Learning Types > Evaluation

Keywords

natural language processing; prompt engineering; language model evaluation; test case generation; capability-based testing; large language model; test augmentation; NLP testing
