FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging

Zichen Tang; Haihong E; Jiacheng Liu; Zhongjun Yang; Rongjin Li; Zihua Rong; Haoyang He; Zhuodi Hao; Xinyang Hu; Kun Ji; Ziyan Ma; Mengyuan Ji; Jun Zhang; Chenghao Ma; Qianhe Zheng; Yang Liu; Yiling Huang; Xinyi Hu; Qing Huang; Zijian Xie; Shiyao Peng

ICCV 2025

Abstract

We present FinMMR, a novel bilingual multimodal benchmark tailored to evaluate the reasoning capabilities of multimodal large language models (MLLMs) in financial numerical reasoning tasks. Compared to existing benchmarks, our work introduces three significant advancements. (1) Multimodality: We meticulously transform existing financial reasoning benchmarks, and construct novel questions from the latest Chinese financial research reports. FinMMR comprises 4.3K questions and 8.7K images spanning 14 categories, including tables, bar charts, and ownership structure charts. (2) Comprehensiveness: FinMMR encompasses 14 financial subdomains, including corporate finance, banking, and industry analysis, significantly exceeding existing benchmarks in financial domain knowledge breadth. (3) Challenge: Models are required to perform multi-step precise numerical reasoning by integrating financial knowledge with the understanding of complex financial images and text. The best-performing MLLM achieves only 51.4% accuracy on Hard problems. We believe that FinMMR will drive advancements in enhancing the reasoning capabilities of MLLMs in real-world scenarios.

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Machine Learning > Application Areas > Domain Adaptation
Natural Language Processing > Resources & Methods > Large Language Models
Artificial Intelligence > Core AI > Large Language Models
Deep Learning > Models > Large Language Models

Keywords

benchmark evaluation; visual reasoning; financial analysis; vision-language model; multimodal large language model; financial benchmark; numerical reasoning; bilingual evaluation
