2025 ACL ACL 2025

BJTU at BEA 2025 Shared Task: Task-Aware Prompt Tuning and Data Augmentation for Evaluating AI Math Tutors

Abstract

AbstractWe present a prompt-based evaluation framework for assessing AI-generated math tutoring responses across four pedagogical dimensions: mistake identification, mistake location, guidance quality, and actionability. Our approach leverages task-aware prompt tuning on a large language model, supplemented by data augmentation techniques including dialogue shuffling and class-balanced downsampling. In experiments on the BEA 2025 Shared Task benchmark, our system achieved first place in mistake identification and strong top-five rankings in the other tracks. These results demonstrate the effectiveness of structured prompting and targeted augmentation for enhancing LLMs’ ability to provide pedagogically meaningful feedback.

🌉 Interdisciplinary Bridge — Machine Learning and Natural Language Processing
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Speech & Audio
🧭 Keyword Pioneer — math tutoring