We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

Runqi Qiao; Qiuna Tan; Guanting Dong; Minhui Wu; Chong Sun; Xiaoshuai Song; Jiapeng Wang; Zhuoma GongQue; Shanglin Lei; Yifan Zhang; Zhe Wei; Miaoxuan Zhang; Runfeng Qiao; Xiao Zong; Yida Xu; Peiqing Yang; Zhimin Bao; Muxi Diao; Chen Li; Honggang Zhang

ACL 2025

Abstract

Visual mathematical reasoning, a fundamental visual reasoning ability, has received widespread attention from the Large Multimodal Model (LMM) community. Existing benchmarks focus mainly on end-to-end performance but neglect the underlying principles of knowledge acquisition and generalization. Instead, we introduce WE-MATH, the first benchmark specifically designed to explore these problem-solving principles. We meticulously collect 6.5K visual math problems and decompose them into 10.9K step-level questions for evaluation, spanning 5 layers of knowledge granularity and 67 hierarchical knowledge concepts. Specifically, we decompose composite problems into sub-problems according to the knowledge concepts they require and introduce a novel four-dimensional metric to hierarchically assess inherent issues in LMMs' reasoning processes. With WE-MATH, we conduct a thorough evaluation of existing LMMs on visual mathematical reasoning and provide comprehensive analysis and insights for future development. We anticipate that WE-MATH will open new pathways for advancing visual mathematical reasoning in LMMs. Data and code are available at https://github.com/We-Math/We-Math.
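The abstract does not spell out the four-dimensional metric. As we understand the We-Math paper, it labels each composite problem by comparing the answers to its decomposed sub-problems with the answer to the composite problem, using four categories: Insufficient Knowledge (IK), Inadequate Generalization (IG), Complete Mastery (CM), and Rote Memorization (RM). The Python sketch below illustrates that idea under stated assumptions: `ProblemResult`, `classify`, and `category_rates` are hypothetical names, a problem's sub-problems count as mastered only if every one is answered correctly, and the output is a plain proportion per category rather than the paper's actual scoring formula.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

# Category labels (assumed): IK = Insufficient Knowledge, IG = Inadequate
# Generalization, CM = Complete Mastery, RM = Rote Memorization.
CATEGORIES = ("IK", "IG", "CM", "RM")


@dataclass
class ProblemResult:
    """Hypothetical record: correctness of one composite problem and its sub-problems."""
    sub_correct: List[bool]   # one flag per knowledge-concept sub-problem
    composite_correct: bool   # whether the original composite problem was solved


def classify(result: ProblemResult) -> str:
    """Assign one composite problem to a diagnostic category (assumed rules)."""
    subs_ok = all(result.sub_correct)
    if subs_ok and result.composite_correct:
        return "CM"  # every sub-problem and the composite problem are correct
    if subs_ok:
        return "IG"  # sub-problems correct, but combining them fails
    if result.composite_correct:
        return "RM"  # composite correct despite a failed sub-problem
    return "IK"      # a sub-problem fails and so does the composite problem


def category_rates(results: List[ProblemResult]) -> Dict[str, float]:
    """Fraction of composite problems falling into each category."""
    if not results:
        raise ValueError("need at least one result")
    counts = Counter(classify(r) for r in results)
    return {c: counts[c] / len(results) for c in CATEGORIES}


if __name__ == "__main__":
    demo = [
        ProblemResult([True, True], True),    # CM
        ProblemResult([True, True], False),   # IG
        ProblemResult([False, True], True),   # RM
        ProblemResult([False, True], False),  # IK
    ]
    print(category_rates(demo))
```

Counting categories this way separates a model that fails on individual knowledge concepts (IK) from one that knows each concept but cannot combine them (IG), a distinction that a single end-to-end accuracy score hides.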

Topics

Artificial Intelligence > Core AI > Interpretability; Artificial Intelligence > Core AI > Multimodal Learning; Artificial Intelligence > Core AI > Large Language Models; Computer Vision > Core AI > Multimodal Learning; Machine Learning > Learning Types > Evaluation; Computer Vision > Analysis > Visual Question Answering; Deep Learning > Models > Vision-Language Models

Keywords

benchmark evaluation, mathematical reasoning, multimodal learning, visual reasoning, large multimodal model, knowledge concept, knowledge reasoning, visual mathematical reasoning, knowledge granularity