On the Limits of Minimal Pairs in Contrastive Evaluation

Jannis Vamvas; Rico Sennrich

EMNLP 2021

Abstract

Minimal sentence pairs are frequently used to analyze the behavior of language models. It is often assumed that model behavior on contrastive pairs is predictive of model behavior at large. We argue that two conditions are necessary for this assumption to hold: First, a tested hypothesis should be well-motivated, since experiments show that contrastive evaluation can lead to false positives. Second, test data should be chosen so as to minimize distributional discrepancy between evaluation time and deployment time. For a good approximation of deployment-time decoding, we recommend that minimal pairs be created based on machine-generated text, as opposed to human-written references. We present a contrastive evaluation suite for English–German MT that implements this recommendation.
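As a rough illustration of the evaluation setup the abstract refers to, the sketch below scores each minimal pair with a model and counts how often the correct variant receives the higher score. It is not the authors' evaluation suite: `contrastive_accuracy`, `toy_score`, and `EXAMPLE_PAIRS` are hypothetical names, and `toy_score` is a deliberately crude stand-in for a real model's log-probability, so the example runs without any MT system.

```python
from typing import Callable, Iterable, Tuple

# One minimal pair: (source, correct translation, contrastive variant
# that differs from the correct translation by a single injected error).
MinimalPair = Tuple[str, str, str]


def contrastive_accuracy(
    pairs: Iterable[MinimalPair],
    score: Callable[[str, str], float],
) -> float:
    """Fraction of pairs where the correct variant is scored strictly higher."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no minimal pairs given")
    wins = sum(1 for src, good, bad in pairs if score(src, good) > score(src, bad))
    return wins / len(pairs)


def toy_score(source: str, translation: str) -> float:
    """Hypothetical stand-in for a model score (assumption, not a real model).

    Penalizes the difference in whitespace-token counts between source and
    translation. A real setup would return the model's log-probability of
    `translation` given `source`.
    """
    return -abs(len(source.split()) - len(translation.split()))


# Illustrative English-German pairs with a deleted or inserted negation.
EXAMPLE_PAIRS = [
    ("I do not sleep", "Ich schlafe nicht", "Ich schlafe"),
    ("She sleeps", "Sie schläft", "Sie schläft nicht"),
    ("I don't know", "Ich weiß es nicht", "Ich weiß es"),
]

if __name__ == "__main__":
    accuracy = contrastive_accuracy(EXAMPLE_PAIRS, toy_score)
    print(f"contrastive accuracy: {accuracy:.2f}")
```

In this sketch the correct translations are hand-written. Following the abstract's recommendation, they would instead be taken from machine-generated output, so that the scored text is closer to what the model produces at deployment time.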

Topics

Machine Learning > Optimization & Theory > Theory

Natural Language Processing > Applications > Machine Translation

Natural Language Processing > Resources & Methods > Large Language Models

Machine Learning > Application Areas > Model Compression

Artificial Intelligence > Core AI > Natural Language Processing

Deep Learning > Optimization & Theory > Evaluation

Keywords

machine translation, language model, distributional discrepancy, minimal pair, contrastive evaluation, deployment time, deployment-time decoding

Related papers

Continual Learning in Multilingual NMT via Language-Specific Embeddings 2021

MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents 2021

Efficient Multi-Task Auxiliary Learning: Selecting Auxiliary Data by Feature Similarity 2021

Neural Machine Translation with Heterogeneous Topic Knowledge Embeddings 2021

Semantics-Preserved Data Augmentation for Aspect-Based Sentiment Analysis 2021