End-to-End Automated Item Generation and Scoring for Adaptive English Writing Assessment with Large Language Models

Kamel Nebhi; Amrita Panesar; Hans Bantilan

2025 ACL ACL 2025

End-to-End Automated Item Generation and Scoring for Adaptive English Writing Assessment with Large Language Models

Abstract

AbstractAutomated item generation (AIG) is a key enabler for scaling language proficiency assessments. We present an end-to-end methodology for automated generation, annotation, and integration of adaptive writing items for the EF Standard English Test (EFSET), leveraging recent advances in large language models (LLMs). Our pipeline uses few-shot prompting with state-of-the-art LLMs to generate diverse, proficiency-aligned prompts, rigorously validated by expert reviewers. For robust scoring, we construct a synthetic response dataset via majority-vote LLM annotation and fine-tune a LLaMA 3.1 (8B) model. For each writing item, a range of proficiency-aligned synthetic responses, designed to emulate authentic student work, are produced for model training and evaluation. These results demonstrate substantial gains in scalability and validity, offering a replicable framework for next-generation adaptive language testing.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Interdisciplinary and Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — adaptive assessment

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Speech & Audio