PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition

George Tsoukalas; Jasper Lee; John Jennings; Jimmy Xin; Michelle Ding; Michael Jennings; Amitayush Thakur; Swarat Chaudhuri

NeurIPS 2024

Abstract

We present PutnamBench, a new multi-language benchmark for evaluating the ability of neural theorem-provers to solve competition mathematics problems. PutnamBench consists of 1692 hand-constructed formalizations of 640 theorems sourced from the William Lowell Putnam Mathematical Competition, the premier undergraduate-level mathematics competition in North America. All the problems have formalizations in Lean 4 and Isabelle; a substantial subset also has Coq formalizations. PutnamBench requires significant problem-solving ability and proficiency in a broad range of topics taught in undergraduate mathematics courses. We use PutnamBench to evaluate several established neural and symbolic theorem-provers. These approaches can only solve a handful of the PutnamBench problems, establishing the benchmark as a difficult open challenge for research on neural theorem-proving. PutnamBench is available at https://github.com/trishullab/PutnamBench.
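The counts in the abstract imply a rough size for the Coq subset. The sketch below is a minimal back-of-the-envelope calculation, not a figure quoted from the abstract: it assumes each of the 640 theorems has exactly one Lean 4 formalization and exactly one Isabelle formalization, so that all remaining formalizations are in Coq. The function name `implied_coq_count` is hypothetical and used only for illustration.

```python
# Counts stated in the abstract.
TOTAL_FORMALIZATIONS = 1692
NUM_THEOREMS = 640


def implied_coq_count(total: int, theorems: int, full_coverage_languages: int = 2) -> int:
    """Estimate the number of Coq formalizations.

    Assumption (not stated in the abstract): every theorem has exactly one
    formalization in each fully covered language (Lean 4 and Isabelle), and
    every other formalization is in Coq.
    """
    return total - full_coverage_languages * theorems


coq = implied_coq_count(TOTAL_FORMALIZATIONS, NUM_THEOREMS)
coq_share = coq / NUM_THEOREMS  # fraction of theorems with a Coq version

print(f"Implied Coq formalizations: {coq}")
print(f"Share of theorems covered in Coq: {coq_share:.1%}")
```

Under that assumption, the "substantial subset" with Coq formalizations would cover a bit under two thirds of the theorems.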

Topics

Artificial Intelligence > Core AI > Planning; Mathematics & Optimization > Mathematics > Discrete Mathematics

Keywords

automated reasoning; formal mathematics; neural theorem proving; proof verification; lean formalization
