TEEMIL : Towards Educational MCQ Difficulty Estimation in Indic Languages

Manikandan Ravikiran; Siddharth Vohra; Rajat Verma; Rohit Saluja; Arnav Bhavsar

COLING 2025

Abstract

Difficulty estimation of multiple-choice questions (MCQs) is crucial for creating effective educational assessments, yet it remains underexplored in Indic languages such as Hindi and Kannada due to the lack of comprehensive datasets. This paper addresses this gap by introducing two datasets, TEEMIL-H and TEEMIL-K, containing 4689 and 4215 MCQs, respectively, with manually annotated difficulty labels. We benchmark these datasets using state-of-the-art multilingual models and conduct ablation studies to analyze the effect of context, the impact of options, and the presence of the None of the Above (NOTA) option on difficulty estimation. Our findings establish baselines for difficulty estimation in Hindi and Kannada, offering valuable insights into improving model performance and guiding future research in MCQ difficulty estimation.
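The three ablation axes named in the abstract (context, options, and the NOTA option) can be pictured as switches on how each MCQ is turned into model input. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the `MCQ` record, the `build_input` helper, the `[SEP]` separator, and the English NOTA string are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical NOTA marker; the datasets may phrase it differently.
NOTA = "None of the Above"


@dataclass
class MCQ:
    """Hypothetical record for one question; field names are assumptions."""
    question: str
    options: List[str]
    context: Optional[str] = None


def build_input(mcq: MCQ, use_context: bool = True, use_options: bool = True,
                keep_nota: bool = True, sep: str = " [SEP] ") -> str:
    """Join the selected parts of an MCQ into one string for a text classifier."""
    parts = []
    # Context ablation: drop the supporting passage when use_context is False.
    if use_context and mcq.context:
        parts.append(mcq.context)
    parts.append(mcq.question)
    # Options ablation: drop all answer options when use_options is False.
    if use_options:
        for option in mcq.options:
            is_nota = option.strip().lower() == NOTA.lower()
            # NOTA ablation: drop only the NOTA option when keep_nota is False.
            if keep_nota or not is_nota:
                parts.append(option)
    return sep.join(parts)


example = MCQ(
    question="What is 2 + 2?",
    options=["3", "4", "None of the Above"],
    context="Basic arithmetic.",
)
question_only = build_input(example, use_context=False, use_options=False)
without_nota = build_input(example, keep_nota=False)
```

Each variant string would then be passed to the same multilingual classifier, so that any change in difficulty-prediction quality can be attributed to the part that was removed.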

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Core Methods > Regression

Keywords

multilingual model, educational assessment, difficulty estimation, indic language, mcq difficulty
