VEHME: A Vision-Language Model For Evaluating Handwritten Mathematics Expressions

Thu Phuong Nguyen; Duc M. Nguyen; Hyotaek Jeon; Hyunwook Lee; Hyunmin Song; Sungahn Ko; Taehwan Kim

EMNLP 2025

Abstract

Automatically assessing handwritten mathematical solutions is an important problem in educational technology with practical applications, but it remains a significant challenge due to the diverse formats, unstructured layouts, and symbolic complexity of student work. To address this challenge, we introduce VEHME, a Vision-Language Model for Evaluating Handwritten Mathematics Expressions, designed to assess open-form handwritten math responses with high accuracy and interpretable reasoning traces. VEHME uses a two-phase training pipeline: (i) supervised fine-tuning on structured reasoning data, and (ii) reinforcement learning that aligns model outputs with multi-dimensional grading objectives, including correctness, reasoning depth, and error localization. To enhance spatial understanding, we propose an Expression-Aware Visual Prompting Module, trained on our synthesized dataset of multi-line math expressions, to robustly guide attention in visually heterogeneous inputs. Evaluated on the AIHub and FERMAT datasets, VEHME achieves state-of-the-art performance among open-source models and approaches the accuracy of proprietary systems, demonstrating its potential as a scalable and accessible tool for automated math assessment. Our training and experiment code is publicly available at our GitHub repository.
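
The abstract names the reinforcement-learning objectives (correctness, reasoning depth, error localization) but does not say how each is scored or how they are combined. As a minimal sketch, assuming each objective reduces to a score in [0, 1] and the scores are merged as a normalized weighted sum, a composite grading reward might look like the following. The class `GradingOutcome`, its fields, the function `composite_reward`, the weights, and the `max_steps` cap are all hypothetical illustrations, not VEHME's actual reward design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GradingOutcome:
    """Hypothetical per-response signals extracted from a model's grading output."""
    predicted_correct: bool  # model's verdict on the student solution
    gold_correct: bool  # reference verdict
    reasoning_steps: int  # number of steps in the reasoning trace (assumed depth proxy)
    predicted_error_line: Optional[int]  # line the model flags as erroneous, or None
    gold_error_line: Optional[int]  # annotated erroneous line, or None if the work is correct


def composite_reward(
    o: GradingOutcome,
    w_correct: float = 1.0,  # hypothetical weights, not from the paper
    w_depth: float = 0.2,
    w_loc: float = 0.5,
    max_steps: int = 5,
) -> float:
    """Normalized weighted sum of three sub-rewards, each in [0, 1]."""
    if max_steps <= 0:
        raise ValueError("max_steps must be positive")
    # Correctness: does the model's verdict match the reference verdict?
    r_correct = 1.0 if o.predicted_correct == o.gold_correct else 0.0
    # Reasoning depth: reward longer traces, capped so length alone cannot dominate.
    r_depth = min(max(o.reasoning_steps, 0), max_steps) / max_steps
    # Error localization: exact match on the flagged line; None == None counts as
    # agreeing that there is no error.
    r_loc = 1.0 if o.predicted_error_line == o.gold_error_line else 0.0
    total_weight = w_correct + w_depth + w_loc
    return (w_correct * r_correct + w_depth * r_depth + w_loc * r_loc) / total_weight


# Usage: correct verdict, 3 reasoning steps, correctly localized error on line 2.
example = GradingOutcome(True, True, 3, 2, 2)
print(round(composite_reward(example), 4))  # 0.9529
```

Normalizing by the total weight keeps the reward in [0, 1] regardless of the chosen weights, and capping the depth term is one simple way to keep a policy from inflating its reward by producing longer traces without better grading.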

Topics

Artificial Intelligence > Core AI > Foundation Models; Artificial Intelligence > Core AI > Multimodal Learning; Reinforcement Learning > Methods > Deep RL; Deep Learning > Learning Types > Multi-Modal Learning; Deep Learning > Models > Vision-Language Models

Keywords

reinforcement learning; vision-language model; handwritten mathematics assessment; reinforcement learning alignment; visual prompting module; multi-dimensional grading; handwritten recognition; mathematical expression evaluation
