BERT-IRT: Accelerating Item Piloting with BERT Embeddings and Explainable IRT Models

Kevin P. Yancey; Andrew Runge; Geoffrey Laflair; Phoebe Mulcaire

NAACL 2024

Abstract

Estimating item parameters (e.g., the difficulty of a question) is an important part of modern high-stakes tests. Conventional methods require lengthy pilots to collect response data from a representative population of test-takers. The need for these pilots limits item bank size and how often item banks can be refreshed, which weakens test security, raises the cost of supporting the test, and takes up test-takers' valuable time. Our paper presents a novel explanatory item response theory (IRT) model, BERT-IRT, that has been used on the Duolingo English Test (DET), a high-stakes test of English, to reduce the length of pilots by a factor of 10. Our evaluation shows how the model uses BERT embeddings and engineered NLP features to accelerate item piloting without sacrificing criterion validity or reliability.

Topics

Machine Learning > Core Methods > Representation Learning; Machine Learning > Optimization & Theory > Statistical Learning; Deep Learning > Architectures > Transformers

Keywords

parameter estimation; item response theory model; reliability; bidirectional encoder representations from transformers; natural language processing feature
