QUARK: Controllable Text Generation with Reinforced Unlearning

Ximing Lu; Sean Welleck; Jack Hessel; Liwei Jiang; Lianhui Qin; Peter West; Prithviraj Ammanabrolu; Yejin Choi

NeurIPS 2022

Abstract

Large-scale language models often learn behaviors that are misaligned with user expectations. Generated text may contain offensive or toxic language, contain significant repetition, or be of a different sentiment than desired by the user. We consider the task of unlearning these misalignments by fine-tuning the language model on signals of what not to do. We introduce Quantized Reward Konditioning (Quark), an algorithm for optimizing a reward function that quantifies an (un)wanted property, while not straying too far from the original model. Quark alternates between (i) collecting samples with the current language model, (ii) sorting them into quantiles based on reward, with each quantile identified by a reward token prepended to the language model’s input, and (iii) using a standard language modeling loss on samples from each quantile conditioned on its reward token, while remaining nearby the original language model via a KL-divergence penalty. By conditioning on a high-reward token at generation time, the model generates text that exhibits less of the unwanted property. For unlearning toxicity, negative sentiment, and repetition, our experiments show that Quark outperforms both strong baselines and state-of-the-art reinforcement learning methods like PPO, while relying only on standard language modeling primitives.
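Step (ii) of the loop described above, sorting samples into reward quantiles and tagging each with a reward token, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `assign_reward_tokens`, the token format `<R0>`, `<R1>`, ..., and the equal-size rank-based binning are assumptions made for illustration only.

```python
from typing import List, Tuple


def assign_reward_tokens(
    samples: List[str], rewards: List[float], num_quantiles: int
) -> List[Tuple[str, str]]:
    """Pair each sample with a hypothetical reward token for its quantile.

    Samples are ranked by reward and split into `num_quantiles` bins of
    (nearly) equal size. Bin 0 holds the lowest rewards and bin
    `num_quantiles - 1` the highest. The "<R{k}>" token naming is an
    assumption, not taken from the paper.
    """
    if len(samples) != len(rewards):
        raise ValueError("samples and rewards must have the same length")
    if not samples or num_quantiles < 1:
        raise ValueError("need at least one sample and one quantile")

    n = len(samples)
    # Indices of the samples ordered from lowest to highest reward.
    order = sorted(range(n), key=lambda i: rewards[i])

    tagged: List[Tuple[str, str]] = [("", "")] * n
    for rank, idx in enumerate(order):
        # Integer arithmetic maps rank in [0, n) to a bin in [0, num_quantiles).
        quantile = rank * num_quantiles // n
        tagged[idx] = (f"<R{quantile}>", samples[idx])
    return tagged


if __name__ == "__main__":
    texts = ["a", "b", "c", "d"]
    scores = [0.1, 0.9, 0.4, 0.7]
    for token, text in assign_reward_tokens(texts, scores, num_quantiles=2):
        print(token, text)
```

In a full training loop, each tagged pair would become a conditioning prefix and a target for the language modeling loss in step (iii), and generation would condition on the token of the highest quantile.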

Topics

Artificial Intelligence > Core AI > Foundation Models
Artificial Intelligence > Core AI > Responsible AI
Artificial Intelligence > Core AI > Natural Language Generation
Natural Language Processing > Generation > Text Generation
Reinforcement Learning > Methods > Deep RL
Deep Learning > Learning Types > Reinforcement Learning

Keywords

reinforcement learning, KL divergence, language model alignment, language model, reward conditioning, controllable text generation, text unlearning, reinforced unlearning, KL-divergence penalty
