Understanding Deep Learning Performance through an Examination of Test Set Difficulty: A Psychometric Case Study

John P. Lalor; Hao Wu; Tsendsuren Munkhdalai; Hong Yu

EMNLP 2018

Abstract

Interpreting the performance of deep learning models beyond test set accuracy is challenging. Characteristics of individual data points are often not considered during evaluation, and each data point is treated equally. In this work, we examine the impact of a test set question’s difficulty to determine whether there is a relationship between difficulty and performance. We model difficulty using well-studied psychometric methods applied to human response patterns. Experiments on Natural Language Inference (NLI) and Sentiment Analysis (SA) show that the likelihood of answering a question correctly is affected by the question’s difficulty. In addition, as deep neural networks (DNNs) are trained on larger datasets, easy questions start to have a higher probability of being answered correctly than harder questions.
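As a rough illustration of how a psychometric model can link a question's difficulty to the chance of answering it correctly, the sketch below uses the one-parameter logistic (Rasch) model from Item Response Theory. The abstract does not state which model the authors fit, so the choice of model, the function name `p_correct`, and the difficulty values are assumptions made for illustration, not the paper's method or data.

```python
import math


def p_correct(ability: float, difficulty: float) -> float:
    """Rasch (1PL) item response function.

    Returns the modeled probability that a respondent with the given
    latent ability answers an item of the given difficulty correctly.
    Both values live on the same latent scale.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical item difficulties (illustrative values only).
ITEM_DIFFICULTY = {"easy": -1.5, "medium": 0.0, "hard": 1.5}


def response_profile(ability: float) -> dict:
    """Probability of a correct answer on each hypothetical item."""
    return {name: p_correct(ability, b) for name, b in ITEM_DIFFICULTY.items()}


if __name__ == "__main__":
    # Compare a weaker and a stronger respondent on the same items.
    for ability in (0.0, 2.0):
        profile = response_profile(ability)
        summary = ", ".join(f"{name}={p:.2f}" for name, p in profile.items())
        print(f"ability={ability:+.1f}: {summary}")
```

Under this model, the probability of a correct answer falls as difficulty rises for any fixed ability, and raising ability lifts the probability on every item. It is one simple way to express the kind of difficulty and performance relationship the abstract describes.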

Topics

Machine Learning > Optimization & Theory > Theory

Natural Language Processing > Understanding > Sentiment Analysis

Natural Language Processing > Resources & Methods > Natural Language Inference

Natural Language Processing > Applications > Sentiment Analysis

Natural Language Processing > Applications > Natural Language Inference

Deep Learning > Optimization & Theory > Evaluation

Keywords

sentiment analysis, natural language inference, deep learning, psychometric analysis, test set difficulty, psychometric method, deep learning evaluation
