LLMs Know More About Numbers than They Can Say

Fengting Yuchi; Li Du; Jason Eisner

2026 EACL EACL 2026

LLMs Know More About Numbers than They Can Say

Abstract

AbstractAlthough state-of-the-art LLMs can solve math problems, we find that they make errors on numerical comparisons with mixed notation: "Which is larger, 5.7 × 102 or 580?”This raises a fundamental question: Do LLMs even know how big these numbers are?We probe the hidden states of several smaller open-source LLMs. A single linear projection of an appropriate hidden layer encodes the *log-magnitudes* of both kinds of numerals, allowing us to recover the numbers with relative error of about 2.3% (on restricted synthetic text) or 19.06% (on scientific papers).Furthermore, the hidden state after reading a *pair* of numerals encodes their *ranking*, with a linear classifier achieving over 90% accuracy.Yet surprisingly, when explicitly asked to rank the same pairs of numerals, these LLMs achieve only 50-70% accuracy, with worse performance for models whose probes are less effective.Finally, we show that incorporating the classifier probe’s log-loss as an auxiliary objective during finetuning brings an additional 3.22% improvement in verbalized accuracy over base models, demonstrating that improving models’ internal magnitude representations can enhance their numerical reasoning capabilities.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Fengting Yuchi , Li Du , Jason Eisner

Topics

Artificial Intelligence > Core AI > Interpretability Machine Learning > Core Methods > Representation Learning Machine Learning > Optimization & Theory > Optimization

Keywords

representation learning linear classifier hidden state numerical reasoning

Download PDF

Related papers

Investigating Gender Stereotypes in Large Language Models via Social Determinants of Health 2026

A Benchmark for Audio Reasoning Capabilities of Multimodal Large Language Models 2026

InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection 2026

Generative Personality Simulation via Theory-Informed Structured Interview 2026

Word Surprisal Correlates with Sentential Contradiction in LLMs 2026