Conditional Language Policy: A General Framework For Steerable Multi-Objective Finetuning

Kaiwen Wang; Rahul Kidambi; Ryan Sullivan; Alekh Agarwal; Christoph Dann; Andrea Michi; Marco Gelmi; Yunxuan Li; Raghav Gupta; Kumar Avinava Dubey; Alexandre Rame; Johan Ferret; Geoffrey Cideron; Le Hou; Hongkun Yu; Amr Ahmed; Aranyak Mehta; Leonard Hussenot; Olivier Bachem; Edouard Leurent

EMNLP 2024

Abstract

Reward-based finetuning is crucial for aligning language policies with intended behaviors (*e.g.*, creativity and safety). A key challenge is to develop steerable language models that trade off multiple (conflicting) objectives in a flexible and efficient manner. This paper presents Conditional Language Policy (CLP), a general framework for finetuning language models on multiple objectives. Building on techniques from multi-task training and parameter-efficient finetuning, CLP learns steerable models that effectively trade off conflicting objectives at *inference time*. Notably, this does not require training or maintaining multiple models to achieve different trade-offs between the objectives. Through extensive experiments and ablations on two summarization datasets, we show that CLP learns steerable language models that outperform and Pareto-dominate the existing approaches for multi-objective

Topics

Machine Learning > Optimization & Theory > Optimization

Natural Language Processing > Resources & Methods > Large Language Models

Keywords

parameter-efficient finetuning; inference time; multi-task training; multi-objective finetuning; reward-based finetuning; steerable language model
