Beyond the Tip of the Iceberg: Assessing Coherence of Text Classifiers

Shane Storks; Joyce Chai

EMNLP 2021

Abstract

As large-scale, pre-trained language models achieve human-level and superhuman accuracy on existing language understanding tasks, statistical bias in benchmark data and probing studies have recently called into question their true capabilities. For a more informative evaluation than accuracy on text classification tasks can offer, we propose evaluating systems through a novel measure of prediction coherence. We apply our framework to two existing language understanding benchmarks with different properties to demonstrate its versatility. Our experimental results show that this evaluation framework, although simple in concept and implementation, is a quick, effective, and versatile measure to provide insight into the coherence of machines’ predictions.

Topics

Artificial Intelligence > Core AI > Interpretability

Machine Learning > Application Areas > Fairness

Natural Language Processing > Understanding > Semantic Analysis

Natural Language Processing > Applications > Text Classification

Machine Learning > Learning Types > Evaluation

Keywords

benchmark evaluation, text classification, model evaluation, semantic analysis, language understanding, model interpretability, prediction coherence
