Dynamic Multi-Reward Weighting for Multi-Style Controllable Generation

Karin de Langis; Ryan Koo; Dongyeop Kang

EMNLP 2024

Abstract

Textual style expresses a diverse set of information, including interpersonal dynamics (e.g., formality) and the author’s emotions or attitudes (e.g., disgust). An open question is how language models can be explicitly controlled so that they weave together target styles when generating text: for example, to produce text that is both negative and non-toxic. One approach to such controlled generation is multi-objective reinforcement learning (RL), but how best to combine multiple objectives in a reward function remains unresolved. In this paper, we investigate various multi-style reward formulations, including calibrated outputs from discriminators and dynamic weighting by discriminator gradient magnitudes. We find that our proposed dynamic weighting outperforms static weighting approaches with respect to style control while maintaining linguistic quality, and we explore its effectiveness in 2- and 3-style control.
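To make the contrast between static and dynamic weighting concrete, here is a minimal toy sketch. It is not the authors' implementation: the function names are hypothetical, each discriminator is assumed to be a binary classifier returning the probability of the target style, and the "gradient magnitude" is taken to be the gradient of the cross-entropy loss with respect to the discriminator's logit, which for a sigmoid output equals 1 - p. The paper may compute and normalize gradients differently.

```python
def grad_magnitudes(probs):
    # Assumption: for loss = -log(sigmoid(z)), |d loss / d z| = 1 - p,
    # where p = sigmoid(z) is the probability of the target style.
    return [1.0 - p for p in probs]


def dynamic_weights(probs, eps=1e-8):
    # Normalize gradient magnitudes so the weights sum to 1.
    mags = grad_magnitudes(probs)
    total = sum(mags)
    if total < eps:
        # Every style is already saturated: fall back to uniform weights.
        return [1.0 / len(probs)] * len(probs)
    return [m / total for m in mags]


def dynamic_reward(probs):
    # Weighted sum of style scores; styles the sample is far from weigh more.
    weights = dynamic_weights(probs)
    return sum(w * p for w, p in zip(weights, probs))


def static_reward(probs):
    # Baseline: fixed, equal weights for every style.
    return sum(probs) / len(probs)


if __name__ == "__main__":
    # Hypothetical scores: strongly "negative" (0.9), weakly "non-toxic" (0.3).
    scores = [0.9, 0.3]
    print("weights:", dynamic_weights(scores))
    print("dynamic:", dynamic_reward(scores))
    print("static: ", static_reward(scores))
```

In this toy case the dynamic reward comes out lower than the plain average, because the style the sample is furthest from dominates the weighting. That illustrates the general intuition of steering the policy toward whichever style currently needs the most improvement.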

Topics

Machine Learning > Optimization & Theory > Loss Functions
Natural Language Processing > Generation > Text Generation
Reinforcement Learning > Methods > Policy Learning
Deep Learning > Learning Types > Reinforcement Learning
Machine Learning > Learning Types > Multi-Objective Optimization

Keywords

reinforcement learning, text generation, reward function, multi-objective optimization, controllable generation, multi-objective reinforcement learning, style control, controlled text generation, style-controllable generation, dynamic reward weighting, multi-style generation, discriminator calibration
