OpenRLHF: A Ray-based Easy-to-use, Scalable and High-performance RLHF Framework

Jian Hu; Xibin Wu; Wei Shen; Jason Klein Liu; Weixun Wang; Songlin Jiang; Haoran Wang; Hao Chen; Bin Chen; Wenkai Fang; Xianyu; Yu Cao; Haotian Xu; Yiming Liu

EMNLP 2025

Abstract

Large Language Models (LLMs) fine-tuned via Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning with Verifiable Rewards (RLVR) significantly improve the alignment of human-AI values and further raise the upper bound of AI capabilities, particularly in reasoning-intensive, long-context Chain-of-Thought (long-CoT) tasks. However, existing RLHF (or RLVR) frameworks commonly face challenges such as inference bottlenecks and complexity barriers, restricting their accessibility for newcomers. To bridge this gap, we introduce OpenRLHF, a user-friendly, scalable, and easy-to-learn open-source RLHF framework built upon Ray, vLLM, DeepSpeed, and HuggingFace Transformers, featuring a simplified design, clear code structure, and comprehensive documentation to facilitate entry for researchers and practitioners. Experimental results show that OpenRLHF achieves superior training efficiency with speedups ranging from 1.22× to 1.68× across different model sizes compared to state-of-the-art frameworks, while requiring significantly fewer lines of code for implementation. OpenRLHF is publicly available at https://github.com/OpenRLHF/OpenRLHF, and has already been adopted by leading institutions to accelerate RLHF research and learning.

Topics

Machine Learning > Optimization & Theory > Distributed Learning
Machine Learning > Application Areas > Efficient Computing
Natural Language Processing > Resources & Methods > Large Language Models
Machine Learning > Learning Types > Reinforcement Learning
Artificial Intelligence > Core AI > Large Language Models
Deep Learning > Models > Large Language Models
Deep Learning > Learning Types > Reinforcement Learning
Deep Learning > Learning Types > Transfer Learning
Deep Learning > Optimization & Theory > Efficient Computing
Machine Learning > Learning Types > Reinforcement Learning from Human Feedback

Keywords

reinforcement learning, language model alignment, reinforcement learning from human feedback, parameter efficient, model alignment, distributed training, human feedback, model fine-tuning, large language model, reinforcement learning with verifiable reward, ray distributed
