Using LLMs to simulate students’ responses to exam questions

Luca Benedetto; Giovanni Aradelli; Antonia Donvito; Alberto Lucchetti; Andrea Cappelli; Paula Buttery

EMNLP 2024

Abstract

Previous research leveraged Large Language Models (LLMs) in numerous ways in the educational domain. Here, we show that they can be used to answer exam questions simulating students of different skill levels and share a prompt, engineered for GPT-3.5, that enables the simulation of varying student skill levels on questions from different educational domains. We evaluate the proposed prompt on three publicly available datasets (one from science exams and two from English reading comprehension exams) and three LLMs (two versions of GPT-3.5 and one of GPT-4), and show that it is robust to different educational domains and capable of generalising to data unseen during the prompt engineering phase. We also show that, being engineered for a specific version of GPT-3.5, the prompt does not generalise well to different LLMs, stressing the need for prompt engineering for each model in practical applications. Lastly, we find that there is not a direct correlation between the quality of the rationales obtained with chain-of-thought prompting and the accuracy in the student simulation task.
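To make the student simulation task concrete, the following is a minimal Python sketch of how a skill-conditioned prompt could be built and how the simulated accuracy per skill level could be checked. Everything in it is an assumption for illustration: the template wording, the 1-to-5 skill scale, and the names `build_prompt`, `accuracy_by_level`, `is_monotonic`, and `answer_fn` are hypothetical and are not the prompt or the evaluation code shared in the paper. The `answer_fn` callable stands in for whatever LLM call is used and is expected to return the index of the chosen option.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical template, NOT the prompt engineered in the paper.
TEMPLATE = (
    "You will answer a multiple-choice exam question as a student of "
    "skill level {level} on a scale from 1 (weakest) to {max_level} (strongest).\n"
    "Question: {question}\n"
    "Options:\n{options}\n"
    "Reply with the index of the option this student would choose."
)


def build_prompt(question: str, options: Sequence[str], level: int,
                 max_level: int = 5) -> str:
    """Fill the (assumed) template for one question and one skill level."""
    if not 1 <= level <= max_level:
        raise ValueError(f"level must be in 1..{max_level}, got {level}")
    formatted = "\n".join(f"{i}. {opt}" for i, opt in enumerate(options))
    return TEMPLATE.format(level=level, max_level=max_level,
                           question=question, options=formatted)


def accuracy_by_level(questions: List[dict],
                      answer_fn: Callable[[str], int],
                      levels: Sequence[int]) -> Dict[int, float]:
    """Fraction of questions answered correctly at each simulated level.

    Each question is a dict with keys "question", "options", "answer",
    where "answer" is the index of the correct option.
    """
    max_level = max(levels)
    result: Dict[int, float] = {}
    for level in levels:
        correct = sum(
            answer_fn(build_prompt(q["question"], q["options"], level, max_level))
            == q["answer"]
            for q in questions
        )
        result[level] = correct / len(questions)
    return result


def is_monotonic(acc: Dict[int, float]) -> bool:
    """True if accuracy never decreases as the simulated level increases."""
    values = [acc[level] for level in sorted(acc)]
    return all(a <= b for a, b in zip(values, values[1:]))
```

In this sketch, a simulation would count as plausible when `is_monotonic(accuracy_by_level(...))` holds, that is, when stronger simulated students are at least as accurate as weaker ones. This criterion is one reasonable reading of "simulating students of different skill levels" and may differ from the metrics used in the paper.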

Topics

Artificial Intelligence > Core AI > Human-AI Interaction
Natural Language Processing > Applications > Question Answering
Artificial Intelligence > Core AI > Large Language Models
Interdisciplinary > Education
Machine Learning > Learning Types > Prompt Engineering

Keywords

prompt engineering, chain-of-thought prompting, educational assessment, student simulation, large language model, exam question
