2025 COLING COLING 2025

TEEMIL : Towards Educational MCQ Difficulty Estimation in Indic Languages

Abstract

AbstractDifficulty estimation of multiple-choice questions (MCQs) is crucial for creating effective educational assessments, yet remains underexplored in Indic languages like Hindi and Kannada due to the lack of comprehensive datasets. This paper addresses this gap by introducing two datasets, TEEMIL-H and TEEMIL-K, containing 4689 and 4215 MCQs, respectively, with manually annotated difficulty labels. We benchmark these datasets using state-of-the-art multilingual models and conduct ablation studies to analyze the effect of context, the impact of options, and the presence of the None of the Above (NOTA) option on difficulty estimation. Our findings establish baselines for difficulty estimation in Hindi and Kannada, offering valuable insights into improving model performance and guiding future research in MCQ difficulty estimation .

🧭 Keyword Pioneer — mcq difficulty
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Security & Privacy, Speech & Audio