GDPO: Learning to Directly Align Language Models with Diversity Using GFlowNets

Oh Joon Kwon; Daiki E. Matsunaga; Kee-eung Kim

EMNLP 2024

Abstract

A critical component of the current generation of language models is preference alignment, which aims to precisely control the model's behavior to meet human needs and values. The most notable among such methods are Reinforcement Learning from Human Feedback (RLHF) and its offline variant, Direct Preference Optimization (DPO), both of which seek to maximize a reward model based on human preferences. In particular, DPO derives reward signals directly from the offline preference data, but in doing so it overfits the reward signals and generates suboptimal responses that may reflect human biases in the dataset. In this work, we propose a practical application of a diversity-seeking RL algorithm, called GFlowNet-DPO (GDPO), in an offline preference alignment setting to mitigate these issues. Empirical results on dialog generation and summarization tasks show that GDPO generates far more diverse responses than the baseline methods, while those responses remain relatively aligned with human values.

Topics

Artificial Intelligence > Core AI > Foundation Models
Artificial Intelligence > Core AI > Responsible AI
Reinforcement Learning > Methods > Deep RL
Artificial Intelligence > Core AI > Large Language Models
Deep Learning > Learning Types > Reinforcement Learning
Machine Learning > Learning Types > Multi-Objective Optimization
Deep Learning > Learning Types > Generative Models
Deep Learning > Learning Types > Reinforcement Learning from Human Feedback

Keywords

reinforcement learning; offline reinforcement learning; direct preference optimization; preference alignment; language model; reward model; generative flow network; large language model; diversity seeking
