EMNLP 2024
Predicting generalization performance with correctness discriminators
Abstract
The ability to predict an NLP model’s accuracy on unseen, potentially out-of-distribution data is a prerequisite for trustworthiness. We present a novel model that establishes upper and lower bounds on the accuracy, without requiring gold labels for the unseen data. We achieve this by training a *discriminator* which predicts whether the output of a given sequence-to-sequence model is correct or not. We show across a variety of tagging, parsing, and semantic parsing tasks that the gold accuracy is reliably between the predicted upper and lower bounds, and that these bounds are remarkably close together.
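The abstract does not say how the discriminator's judgments are turned into bounds. As a rough illustration only, the sketch below assumes an ensemble of discriminators whose votes are aggregated: an output counts toward the lower bound only if every discriminator judges it correct, and toward the upper bound if at least one does. The function name `accuracy_bounds`, the ensemble setup, and the voting rule are assumptions made for this example, not details taken from the paper.

```python
from typing import Sequence, Tuple


def accuracy_bounds(votes: Sequence[Sequence[bool]]) -> Tuple[float, float]:
    """Estimate (lower, upper) accuracy bounds from discriminator votes.

    votes[i][k] is True if hypothetical discriminator k judges the
    model's output on unlabeled example i to be correct.
    Assumed aggregation rule (not from the paper):
      lower bound = fraction of examples that ALL discriminators accept
      upper bound = fraction of examples that ANY discriminator accepts
    """
    if not votes:
        raise ValueError("need at least one example")
    n = len(votes)
    lower = sum(all(v) for v in votes) / n
    upper = sum(any(v) for v in votes) / n
    return lower, upper


# Toy votes from three hypothetical discriminators on four unlabeled examples.
votes = [
    [True, True, True],     # unanimous "correct"
    [True, False, True],    # disagreement: counts only toward the upper bound
    [False, False, False],  # unanimous "incorrect"
    [True, True, True],     # unanimous "correct"
]
lower, upper = accuracy_bounds(votes)
print(lower, upper)  # prints: 0.5 0.75
```

No gold labels appear anywhere in this computation; only the discriminators' predictions on the unlabeled data are used, which matches the setting the abstract describes.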
Authors
Topics
Machine Learning > Core Methods > Classification
Machine Learning > Core Methods > Embedding Learning
Machine Learning > Optimization & Theory > Theory
Artificial Intelligence > Core AI > Large Language Models
Machine Learning > Optimization & Theory > Evaluation
Machine Learning > Learning Types > Evaluation
Natural Language Processing > Applications > Natural Language Understanding
Machine Learning > Learning Types > Generalization