Read Your Own Mind: Reasoning Helps Surface Self-Confidence Signals in LLMs

Jakub Podolak; Rajeev Verma

2025 EMNLP EMNLP 2025

Read Your Own Mind: Reasoning Helps Surface Self-Confidence Signals in LLMs

Abstract

AbstractWe study the source of uncertainty in DeepSeek R1-32B by analyzing its self-reported verbal confidence on question answering (QA) tasks. In the default answer-then-confidence setting, the model is regularly over-confident, whereas semantic entropy - obtained by sampling many responses - remains reliable. We hypothesize that this is because of semantic entropy’s larger test-time compute, which lets us explore the model’s predictive distribution. We show that granting DeepSeek the budget to explore its distribution by forcing a long chain-of-thought before the final answer greatly improves its verbal score effectiveness, even on simple fact-retrieval questions that normally require no reasoning. Our analysis concludes that reliable uncertainty estimation requires explicit exploration of the generative space, and self-reported confidence is trustworthy only after such exploration.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning

🧭 Keyword Pioneer — verbal confidence

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Jakub Podolak , Rajeev Verma

Topics

Artificial Intelligence > Core AI > Interpretability Machine Learning > Optimization & Theory > Stochastic Processes Artificial Intelligence > Core AI > Large Language Models Artificial Intelligence > Core AI > Reasoning Machine Learning > Learning Types > Uncertainty Quantification

Keywords

question answering chain-of-thought reasoning confidence calibration uncertainty estimation semantic entropy verbal confidence reasoning budget

Download PDF

Related papers

Bit-Flip Error Resilience in LLMs: A Comprehensive Analysis and Defense Framework 2025

VoiceCraft-X: Unifying Multilingual, Voice-Cloning Speech Synthesis and Speech Editing 2025

Model-based Large Language Model Customization as Service 2025

ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration 2025

SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from Design 2025