Towards Human-aligned Evaluation for Linear Programming Word Problems

Linzi Xing; Xinglu Wang; Yuxi Feng; Zhenan Fan; Jing Xiong; Zhijiang Guo; Xiaojin Fu; Rindra Ramamonjison; Mahdi Mostajabdaveh; Xiongwei Han; Zirui Zhou; Yong Zhang

COLING 2024

Abstract

Math Word Problem (MWP) solving is a crucial NLP task that aims to produce solutions for given mathematical descriptions. A notable sub-category of MWP is the Linear Programming Word Problem (LPWP), which is highly relevant to real-world decision-making and operations research. While the recent rise of generative large language models (LLMs) has brought more advanced solutions to LPWPs, existing evaluation methodologies for this task still diverge from human judgment and struggle to recognize mathematically equivalent answers. In this paper, we introduce a novel evaluation metric rooted in graph edit distance, which offers benefits such as permutation invariance and more accurate identification of program equivalence. Human evaluations empirically validate the superior efficacy of our proposed metric, particularly when assessing LLM-based solutions for LPWP.
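To make the idea of permutation invariance concrete, the following is a minimal sketch and not the metric proposed in the paper. It assumes a hypothetical representation in which each constraint is a tuple of (coefficients by variable name, sense, right-hand side), and it replaces graph edit distance with a simpler brute-force matching over constraints. The names `constraint_cost`, `deletion_cost`, and `lp_distance` are illustrative only.

```python
from itertools import permutations

# Hypothetical representation: a constraint is (coefficients, sense, rhs),
# where coefficients maps a variable name to its coefficient.

def constraint_cost(a, b):
    """Count the entries that must be edited to turn constraint a into b."""
    coef_a, sense_a, rhs_a = a
    coef_b, sense_b, rhs_b = b
    names = set(coef_a) | set(coef_b)
    cost = sum(1 for v in names if coef_a.get(v, 0) != coef_b.get(v, 0))
    cost += int(sense_a != sense_b) + int(rhs_a != rhs_b)
    return cost

def deletion_cost(c):
    """Cost of dropping (or inserting) a whole constraint."""
    coef, _, _ = c
    # Every nonzero coefficient, plus the sense and the right-hand side.
    return sum(1 for x in coef.values() if x != 0) + 2

def lp_distance(p, q):
    """Minimum total edit cost over all pairings of constraints in p and q.

    Brute force over permutations, so only suitable for tiny programs.
    """
    n = max(len(p), len(q))
    p = list(p) + [None] * (n - len(p))  # pad the shorter list
    q = list(q) + [None] * (n - len(q))
    best = None
    for perm in permutations(range(n)):
        total = 0
        for i, j in enumerate(perm):
            a, b = p[i], q[j]
            if a is None:
                total += deletion_cost(b)
            elif b is None:
                total += deletion_cost(a)
            else:
                total += constraint_cost(a, b)
        best = total if best is None else min(best, total)
    return best

reference = [({"x": 1, "y": 1}, "<=", 10), ({"x": 2, "y": -1}, ">=", 0)]
reordered = list(reversed(reference))                                   # same program, different order
perturbed = [({"x": 1, "y": 1}, "<=", 12), ({"x": 2, "y": -1}, ">=", 0)]  # one changed rhs

print(lp_distance(reference, reordered))  # 0: constraint order does not matter
print(lp_distance(reference, perturbed))  # 1: a single entry differs
```

This sketch is invariant only to the order of constraints. It does not recognize other forms of mathematical equivalence, such as a constraint multiplied by a positive constant, which a full equivalence-aware metric would also need to handle.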

Authors

Linzi Xing, Xinglu Wang, Yuxi Feng, Zhenan Fan, Jing Xiong, Zhijiang Guo, Xiaojin Fu, Rindra Ramamonjison, Mahdi Mostajabdaveh, Xiongwei Han, Zirui Zhou, Yong Zhang

Topics

Machine Learning > Optimization & Theory > Optimization
Machine Learning > Application Areas > Efficient Computing
Natural Language Processing > Applications > Question Answering

Keywords

evaluation metric; graph edit distance; math word problem; linear programming word problem; program equivalence
