Geometric-Averaged Preference Optimization for Soft Preference Labels

Hiroki Furuta; Kuang-Huei Lee; Shixiang Shane Gu; Yutaka Matsuo; Aleksandra Faust; Heiga Zen; Izzeddin Gur

NeurIPS 2024

Abstract

Many algorithms for aligning LLMs with human preferences assume that human preferences are binary and deterministic. However, human preferences can vary across individuals, and therefore should be represented distributionally. In this work, we introduce distributional soft preference labels and improve Direct Preference Optimization (DPO) with a weighted geometric average of the LLM output likelihood in the loss function. This approach adjusts the scale of the learning loss based on the soft labels such that the loss would approach zero when the responses are closer to equally preferred. This simple modification can be easily applied to any DPO-based method and mitigates the over-optimization and objective mismatch from which prior works suffer. Our experiments simulate the soft preference labels with AI feedback from LLMs and demonstrate that geometric averaging consistently improves performance on standard benchmarks for alignment research. In particular, we observe more preferable responses than with binary labels, and significant improvements where modestly-confident labels are in the majority.
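One way to read the weighted geometric average described in the abstract, offered as a minimal sketch rather than the authors' implementation: if each response's likelihood is replaced by a geometric mean of the two responses' likelihoods weighted by the soft label p, then the difference of log-likelihoods becomes (2p - 1) times the original difference, so the DPO margin is simply rescaled by (2p - 1). The Python below assumes this form. The function names, the argument layout (summed sequence log-probabilities under the policy and a reference model), and the default beta are illustrative assumptions, not taken from the paper.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_margin(logp_1: float, logp_2: float,
               ref_logp_1: float, ref_logp_2: float) -> float:
    """Difference of policy-vs-reference log-ratios for responses 1 and 2."""
    return (logp_1 - ref_logp_1) - (logp_2 - ref_logp_2)


def geometric_averaged_dpo_loss(logp_1: float, logp_2: float,
                                ref_logp_1: float, ref_logp_2: float,
                                soft_label: float, beta: float = 0.1) -> float:
    """DPO-style loss with the margin scaled by (2p - 1).

    soft_label is the assumed probability p in [0, 1] that response 1 is
    preferred over response 2. Weighting the likelihoods geometrically,
    pi(y1)^p * pi(y2)^(1-p) versus pi(y1)^(1-p) * pi(y2)^p, gives a
    log-difference of (2p - 1) * (log pi(y1) - log pi(y2)).
    """
    if not 0.0 <= soft_label <= 1.0:
        raise ValueError("soft_label must lie in [0, 1]")
    scale = 2.0 * soft_label - 1.0
    margin = dpo_margin(logp_1, logp_2, ref_logp_1, ref_logp_2)
    return -log_sigmoid(beta * scale * margin)


# Hypothetical log-probabilities for one preference pair.
binary = geometric_averaged_dpo_loss(-1.0, -2.0, -1.5, -1.5, soft_label=1.0, beta=1.0)
modest = geometric_averaged_dpo_loss(-1.0, -2.0, -1.5, -1.5, soft_label=0.75, beta=1.0)
tied = geometric_averaged_dpo_loss(-1.0, -2.0, -1.5, -1.5, soft_label=0.5, beta=1.0)
```

With soft_label = 1.0 the sketch reduces to the standard binary DPO loss. At soft_label = 0.5 the margin is multiplied by zero, so the value no longer depends on the model's log-probabilities and the pair contributes no gradient.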

Topics

Machine Learning > Core Methods > Representation Learning
Machine Learning > Optimization & Theory > Loss Functions
Natural Language Processing > Resources & Methods > Large Language Models
Machine Learning > Learning Types > Reinforcement Learning
Artificial Intelligence > Core AI > Large Language Models
Deep Learning > Learning Types > Representation Learning
Machine Learning > Learning Types > Preference Learning
Deep Learning > Learning Types > Reinforcement Learning from Human Feedback
Deep Learning > Learning Types > Optimization

Keywords

preference learning; direct preference optimization; language model alignment; reinforcement learning from human feedback; model alignment; loss function; reward model; soft label; geometric average; preference distribution; large language model alignment; large language model; geometric averaging
