Linguistically-Informed Transformations (LIT): A Method for Automatically Generating Contrast Sets

Chuanrong Li; Lin Shengshuo; Zeyu Liu; Xinyi Wu; Xuhui Zhou; Shane Steinert-Threlkeld

EMNLP 2020

Abstract

Although large-scale pretrained language models, such as BERT and RoBERTa, have achieved superhuman performance on in-distribution test sets, their performance suffers on out-of-distribution test sets (e.g., on contrast sets). Building contrast sets often requires human-expert annotation, which is expensive and hard to scale. In this work, we propose a Linguistically-Informed Transformation (LIT) method to automatically generate contrast sets, which enables practitioners to explore linguistic phenomena of interest as well as compose different phenomena. Experiments with our method on SNLI and MNLI show that current pretrained language models, although claimed to contain sufficient linguistic knowledge, struggle on our automatically generated contrast sets. Furthermore, we improve models' performance on the contrast sets by applying LIT to augment the training data, without affecting performance on the original data.

Topics

Artificial Intelligence > Core AI > Interpretability
Natural Language Processing > Resources & Methods > Large Language Models
Machine Learning > Learning Types > Evaluation
Machine Learning > Learning Types > Data Augmentation
Machine Learning > Learning Types > Distribution Shift

Keywords

data augmentation, out-of-distribution generalization, pretrained language model, contrast set, linguistic transformation
