Unleashing the Reasoning Potential of LLMs by Critique Fine-Tuning on One Problem

Yubo Wang; Ping Nie; Kai Zou; Lijun Wu; Wenhu Chen

EMNLP 2025

Abstract

Critique Fine-Tuning (CFT) has recently emerged as a promising paradigm for unlocking the reasoning capabilities of large language models (LLMs). In this work, we introduce one-shot CFT, a highly compute-efficient approach that leverages critique data generated from a single math problem. Remarkably, this method yields significant gains in reasoning accuracy, surpassing one-shot RLVR (Reinforcement Learning with Verifiable Reward) while requiring 15 to 20 times less compute. Given one math problem, we first prompt a set of diverse small models to produce candidate solutions, then use frontier models such as GPT-4.1 to generate high-quality critiques of these responses. We fine-tune Qwen and Llama family models ranging from 1.5B to 14B parameters with CFT. With just 5 GPU hours, our models achieve up to a 16 percent absolute improvement in average accuracy across six mathematical reasoning benchmarks (for example, Qwen2.5-Math-7B improves from 26 percent to 42 percent). Furthermore, ablation studies reveal the robustness of one-shot CFT across different prompt problems. Our findings suggest an extremely compute-efficient approach to unleash the reasoning potential of LLMs.
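The data-construction step described in the abstract (one problem, several candidate solutions, one critique per candidate) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `CritiqueExample`, `build_one_shot_cft_dataset`, the prompt template, and the stub solver and critic functions are all hypothetical names introduced here. In practice the solvers would be calls to diverse small models and the critic a call to a frontier model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CritiqueExample:
    prompt: str  # problem plus one candidate solution, shown to the model
    target: str  # critique the model is fine-tuned to produce


def build_one_shot_cft_dataset(
    problem: str,
    solvers: Dict[str, Callable[[str], str]],
    critic: Callable[[str, str], str],
) -> List[CritiqueExample]:
    """Build critique training examples from a single problem.

    Each solver contributes one candidate solution; the critic writes a
    critique of that candidate. The prompt layout is an assumption.
    """
    examples = []
    for solve in solvers.values():
        candidate = solve(problem)
        critique = critic(problem, candidate)
        prompt = (
            f"Problem:\n{problem}\n\n"
            f"Candidate solution:\n{candidate}\n\n"
            "Critique this solution."
        )
        examples.append(CritiqueExample(prompt=prompt, target=critique))
    return examples


# Stub solvers and critic standing in for real model calls (hypothetical).
def stub_critic(problem: str, candidate: str) -> str:
    verdict = "correct" if candidate.strip() == "x = 3" else "incorrect"
    return f"Conclusion: {verdict}"


solvers = {
    "solver_a": lambda problem: "x = 3",
    "solver_b": lambda problem: "x = 4",
}
dataset = build_one_shot_cft_dataset("Solve 2x + 1 = 7.", solvers, stub_critic)
```

The resulting prompt and target pairs would then be passed to an ordinary supervised fine-tuning loop, with the loss applied to the critique text.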

Topics

Artificial Intelligence > Core AI > Foundation Models; Machine Learning > Learning Types > Self-Supervised Learning

Keywords

reinforcement learning; mathematical reasoning; large language model; verifiable reward; critique fine-tuning
