MathEDU: Feedback Generation on Problem-Solving Processes for Mathematical Learning Support

Wei-Ling Hsu; Yu-Chien Tang; An-Zi Yen

2026 EACL EACL 2026

MathEDU: Feedback Generation on Problem-Solving Processes for Mathematical Learning Support

Abstract

AbstractThe increasing reliance on Large Language Models (LLMs) across various domains extends to education, where students progressively use generative AI as a tool for learning. While prior work has examined LLMs’ mathematical ability, their reliability in grading authentic student problem-solving processes and delivering effective feedback remains underexplored. This study introduces MathEDU, a dataset consisting of student problem-solving processes in mathematics and corresponding teacher-written feedback. We systematically evaluate the reliability of various models across three hierarchical tasks: answer correctness classification, error identification, and feedback generation. Experimental results show that fine-tuning strategies effectively improve performance in classifying correctness and locating erroneous steps. However, the generated feedback across models shows a considerable gap from teacher-written feedback. Critically, the generated feedback is often verbose and fails to provide targeted explanations for the student’s underlying misconceptions. This emphasizes the urgent need for trustworthy and pedagogy-aware AI feedback in education.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Natural Language Processing

🐝 Cross-Pollinator — Artificial Intelligence, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Speech & Audio

Authors

Wei-Ling Hsu , Yu-Chien Tang , An-Zi Yen

Topics

Artificial Intelligence > Core AI > Human-AI Interaction Natural Language Processing > Generation > Text Generation Natural Language Processing > Resources & Methods > Large Language Models

Keywords

mathematical problem solving error identification educational feedback

Download PDF

Related papers

Investigating Gender Stereotypes in Large Language Models via Social Determinants of Health 2026

A Benchmark for Audio Reasoning Capabilities of Multimodal Large Language Models 2026

InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection 2026

Generative Personality Simulation via Theory-Informed Structured Interview 2026

Word Surprisal Correlates with Sentential Contradiction in LLMs 2026