Prototype Entropy Alignment: Reinforcing Structured Uncertainty in LLM Reasoning

Zhengyuan Pan; Yanhao Chen; Zhongquan Jian; Wanru Zhao; Haonan Ma; Meihong Wang; Qingqiang Wu

2026 AAAI AAAI 2026

Prototype Entropy Alignment: Reinforcing Structured Uncertainty in LLM Reasoning

Abstract

Abstract Recent research reveals that a minority of high-entropy tokens significantly influence the reasoning quality of large language models (LLMs). Inspired by this, we propose Prototype Entropy Alignment (PEA), a reinforcement learning framework that models effective reasoning not as a single path but as a collection of learnable "entropy signatures." PEA identifies these signatures by clustering expert trajectories' uncertainty patterns into a diverse and continuously updated set of prototypes. The model is then rewarded for aligning its own reasoning process with these evolving targets, creating a self-improvement loop. Instead of replacing traditional outcome-based rewards, PEA provides a complementary, process-oriented signal. Our experiments show that this synergy is crucial: PEA substantially boosts performance on creative and general reasoning tasks and, when combined with outcome rewards, achieves SOTA results on structured tasks such as mathematics. By rewarding alignment with diverse and evolving reasoning structures, PEA offers a robust, verifier-free pathway to enhance reasoning's adaptability.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Zhengyuan Pan , Yanhao Chen , Zhongquan Jian , Wanru Zhao , Haonan Ma , Meihong Wang , Qingqiang Wu

Topics

Artificial Intelligence > Core AI > Causal Inference Artificial Intelligence > Core AI > Foundation Models Machine Learning > Learning Types > Reinforcement Learning

Keywords

reinforcement learning large language model

Download PDF

Related papers

Hi-EF: Benchmarking Emotion Forecasting in Human-interaction 2026

MosaicDoc: A Large-Scale Bilingual Benchmark for Visually Rich Document Understanding 2026

Sparse3DPR: Training-Free 3D Hierarchical Scene Parsing and Task-Adaptive Subgraph Reasoning from Sparse RGB Views 2026

LayerEdit: Disentangled Multi-Object Editing via Conflict-Aware Multi-Layer Learning 2026

HDGS: Hierarchical Dynamic Gaussian Splatting for Urban Driving Scenes 2026