Decoding-Time Language Model Alignment with Multiple Objectives

Ruizhe Shi; Yifang Chen; Yushi Hu; Alisa Liu; Hannaneh Hajishirzi; Noah A. Smith; Simon S. Du

NeurIPS 2024

Abstract

Aligning language models (LMs) to human preferences has emerged as a critical pursuit, enabling these models to better serve diverse user needs. Existing methods primarily focus on optimizing LMs for a single reward function, limiting their adaptability to varied objectives. Here, we propose multi-objective decoding (MOD), a decoding-time algorithm that outputs the next token from a linear combination of predictions of all base models, for any given weighting over different objectives. We exploit a common form among a family of $f$-divergence regularized alignment approaches (such as PPO, DPO, and their variants) to identify a closed-form solution by Legendre transform, and derive an efficient decoding strategy. Theoretically, we show why existing approaches can be sub-optimal even in natural settings and obtain optimality guarantees for our method. Empirical results demonstrate the effectiveness of the algorithm. For example, compared to a parameter-merging baseline, MOD achieves 12.8% overall reward improvement when equally optimizing towards three objectives. Moreover, we experiment with MOD on combining three fully-finetuned LMs of different model sizes, each aimed at different objectives such as safety, coding, and general user preference. Unlike traditional methods that require careful curation of a mixture of datasets to achieve comprehensive improvement, we can quickly experiment with preference weightings using MOD to find the best combination of models. Our best combination reduces toxicity on Toxigen to nearly 0% and achieves 7.9--33.3% improvement across three other metrics (i.e., Codex@1, GSM-COT, BBH-COT).
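The decoding rule summarized in the abstract can be illustrated with a minimal sketch. It assumes that each base model's "prediction" is its vector of next-token log-probabilities and that the combination is a weighted sum followed by renormalization; this is one plausible reading, since the abstract does not spell out the formula. The function names, the toy three-token vocabulary, and the logits are hypothetical and are not taken from the paper.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def combine_next_token(per_model_logits, weights):
    """Return a next-token distribution from a weighted sum of log-probabilities.

    ASSUMPTION: "linear combination of predictions" is read here as a weighted
    sum of each base model's log-probabilities, renormalized afterwards.
    """
    if len(per_model_logits) != len(weights):
        raise ValueError("need exactly one weight per base model")
    log_probs = [log_softmax(logits) for logits in per_model_logits]
    vocab_size = len(log_probs[0])
    mixed = [
        sum(w * lp[token] for w, lp in zip(weights, log_probs))
        for token in range(vocab_size)
    ]
    return [math.exp(x) for x in log_softmax(mixed)]


def greedy_token(probs):
    """Index of the most probable token (the first one wins on ties)."""
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical toy setup: two base models over a three-token vocabulary.
model_a_logits = [2.0, 0.0, 0.0]  # model tuned for objective A prefers token 0
model_b_logits = [0.0, 2.0, 0.0]  # model tuned for objective B prefers token 1

mostly_a = combine_next_token([model_a_logits, model_b_logits], [0.9, 0.1])
mostly_b = combine_next_token([model_a_logits, model_b_logits], [0.1, 0.9])
```

Changing the weight vector shifts the chosen token from model A's preference to model B's without retraining either model, which is the kind of quick experimentation with preference weightings that the abstract describes.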

Authors

Ruizhe Shi, Yifang Chen, Yushi Hu, Alisa Liu, Hannaneh Hajishirzi, Noah A. Smith, Simon S. Du

Topics

Artificial Intelligence > Core AI > Foundation Models
Artificial Intelligence > Core AI > Interpretability
Artificial Intelligence > Core AI > Large Language Models
Machine Learning > Learning Types > Multi-Objective Optimization
Deep Learning > Learning Types > Reinforcement Learning from Human Feedback

Keywords

preference learning; model merging; language model alignment; reward function; multi-objective optimization; decoding algorithm; multi-objective decoding
