ReAlign: Structured Revision for Small Language Model Alignment

Ruijun Chen; Jiajian Guo; Hongzhan Chen; Fanqi Wan; Qifan Wang; Xiaojun Quan

2025 EMNLP EMNLP 2025

ReAlign: Structured Revision for Small Language Model Alignment

Abstract

AbstractAligning small language models with human preferences is challenging, as weak policies struggle to generate informative on-policy samples and suffer from unstable gradients when trained on off-policy signals from stronger models. In this work, we propose ReAlign, a training framework that combines the stability of on-policy learning with the guidance of reviser-assisted supervision. In the ReAlign, we first train a lightweight reviser to improve policy-generated responses using preference-based supervision, conditioned on both the prompt and the initial output. And then, the policy is optimized using a combination of standard on-policy preference pairs and reviser-enhanced pairs constructed as a structured revision task, where the latter provide richer, more learnable feedback. Experimental results on AlpacaEval-2 and Arena-Hard demonstrate that ReAlign significantly boosts alignment performance for weak policies, outperforming strong preference optimization baselines.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Reinforcement Learning

🧭 Keyword Pioneer — structured revision

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Ruijun Chen , Jiajian Guo , Hongzhan Chen , Fanqi Wan , Qifan Wang , Xiaojun Quan

Topics

Artificial Intelligence > Core AI > Foundation Models Reinforcement Learning > Methods > Policy Learning

Keywords

reinforcement learning preference optimization policy learning language model alignment structured revision

Download PDF

Related papers

Bit-Flip Error Resilience in LLMs: A Comprehensive Analysis and Defense Framework 2025

VoiceCraft-X: Unifying Multilingual, Voice-Cloning Speech Synthesis and Speech Editing 2025

Model-based Large Language Model Customization as Service 2025

ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration 2025

SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from Design 2025