trlX: A Framework for Large Scale Reinforcement Learning from Human Feedback

Alexander Havrilla; Maksym Zhuravinskyi; Duy Phung; Aman Tiwari; Jonathan Tow; Stella Biderman; Quentin Anthony; Louis Castricato

EMNLP 2023

Abstract

Reinforcement learning from human feedback (RLHF) utilizes human feedback to better align large language models with human preferences via online optimization against a learned reward model. Current RLHF paradigms rely on Proximal Policy Optimization (PPO), which quickly becomes a challenge to implement and scale up to large architectures. To address this difficulty, we present the trlX library as a feature-complete open-source framework for RLHF fine-tuning of models up to and exceeding 70 billion parameters. To do so, we implement support for multiple types of distributed training, including distributed data parallel, model sharded, as well as tensor, sequential, and pipeline parallelism. Additionally, we implement compute- and memory-saving features, giving trlX the flexibility to support users with a wide range of compute resources. This includes offline RL methods like Implicit Language Q Learning (ILQL) as a compute-efficient alternative to PPO. We find offline fine-tuning offers competitive performance relative to online algorithms while being easier to implement, train, and scale. To evaluate our framework, we train RLHF models on two separate well-known tasks using publicly available human preference data. Models trained with trlX achieve preference win-rates over baselines at rates comparable to the original works.
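For readers unfamiliar with the PPO objective the abstract refers to, the following is a minimal sketch of the standard clipped surrogate loss, written in plain Python over per-token log-probabilities. It is not taken from trlX: the function name `ppo_clipped_loss`, its arguments, and the default `clip_range` of 0.2 are illustrative assumptions, and a real RLHF trainer would add a value loss, a KL penalty against a reference model, and batched tensor operations.

```python
import math


def ppo_clipped_loss(new_logprobs, old_logprobs, advantages, clip_range=0.2):
    """Mean PPO clipped-surrogate loss over a batch of sampled tokens.

    All names and defaults here are hypothetical, chosen for illustration.
    """
    losses = []
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        # Probability ratio between the updated policy and the sampling policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - clip_range, 1 + clip_range].
        clipped = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
        # Take the pessimistic (smaller) objective, then negate to get a loss.
        losses.append(-min(ratio * adv, clipped * adv))
    return sum(losses) / len(losses)
```

With a positive advantage, the clip caps how much credit the policy gets for raising a token's probability; with a negative advantage, the unclipped term dominates, so moving toward a bad token is still penalized in full.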

Topics

Artificial Intelligence > Core AI > Foundation Models
Natural Language Processing > Resources & Methods > Large Language Models
Reinforcement Learning > Methods > Deep RL
Reinforcement Learning > Methods > Offline RL
Machine Learning > Learning Types > Reinforcement Learning
Deep Learning > Learning Types > Reinforcement Learning

Keywords

offline reinforcement learning; reinforcement learning from human feedback; distributed training; proximal policy optimization; large language model; implicit language q learning
