CP-Router: An Uncertainty-Aware Router Between LLM and LRM
Abstract
Recent advances in large reasoning models (LRMs) have significantly enhanced long-chain reasoning capabilities over standard large language models (LLMs). However, LRMs often produce unnecessarily lengthy outputs even for simple queries, which wastes tokens and can even reduce accuracy relative to LLMs. To address this, we propose CP-Router, a training-free, model-agnostic routing framework that dynamically selects between an LLM and an LRM, demonstrated with multiple-choice question answering (MCQA) prompts. The routing decision is guided by prediction uncertainty estimates derived via Conformal Prediction (CP), which provides rigorous coverage guarantees. To improve uncertainty differentiation across inputs, we introduce Full and Binary Entropy (FBE), a novel entropy-based criterion that adaptively selects the appropriate CP threshold. Experiments across MCQA and QA benchmarks, covering mathematics, logical reasoning, and Chinese chemistry, demonstrate that CP-Router reduces token usage while maintaining or even improving accuracy compared to using the LRM alone. We further demonstrate the generality and robustness of CP-Router by extending it to diverse model pairings beyond the LLM–LRM setting.
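To make the routing idea concrete, the following is a minimal sketch of uncertainty-based routing using standard split conformal prediction. It is an illustration under stated assumptions, not the paper's implementation: the nonconformity score (one minus the LLM's probability for an option), the rule "route to the LLM only when the prediction set is a singleton", the toy calibration scores, and all function names are hypothetical. The FBE threshold selection is not modeled here; the miscoverage level `alpha` is simply fixed by hand.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Split-CP threshold: the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: include every option.
        return float("inf")
    return sorted(cal_scores)[k - 1]

def prediction_set(option_probs, qhat):
    """Keep each option whose nonconformity score (1 - p) does not exceed the threshold."""
    return [opt for opt, p in option_probs.items() if 1.0 - p <= qhat]

def route(option_probs, qhat):
    """Assumed rule: a singleton set signals low uncertainty, so the LLM answers;
    any other set size sends the query to the LRM."""
    return "LLM" if len(prediction_set(option_probs, qhat)) == 1 else "LRM"

# Toy calibration scores: 1 - (LLM probability assigned to the true option).
cal_scores = [0.05, 0.10, 0.20, 0.30, 0.50, 0.60, 0.65, 0.70, 0.80]
qhat = conformal_threshold(cal_scores, alpha=0.25)

# Hypothetical LLM probabilities over the four options of two MCQA prompts.
easy = {"A": 0.90, "B": 0.05, "C": 0.03, "D": 0.02}
hard = {"A": 0.45, "B": 0.40, "C": 0.10, "D": 0.05}

print(route(easy, qhat), prediction_set(easy, qhat))
print(route(hard, qhat), prediction_set(hard, qhat))
```

In this toy setup, a prompt on which the LLM concentrates its probability on one option yields a singleton prediction set and stays with the LLM, while a prompt with two plausible options yields a larger set and is escalated to the LRM. Lowering `alpha` raises the threshold, enlarges the sets, and therefore routes more queries to the LRM.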