Hierarchical Reinforcement Learning for Open-Domain Dialog

Abdelrhman Saleh; Natasha Jaques; Asma Ghandeharioun; Judy Shen; Rosalind Picard

AAAI 2020

Abstract

Open-domain dialog generation is a challenging problem; maximum likelihood training can lead to repetitive outputs, models have difficulty tracking long-term conversational goals, and training on standard movie or online datasets may lead to the generation of inappropriate, biased, or offensive text. Reinforcement Learning (RL) is a powerful framework that could potentially address these issues, for example by allowing a dialog model to optimize for reducing toxicity and repetitiveness. However, previous approaches which apply RL to open-domain dialog generation do so at the word level, making it difficult for the model to learn proper credit assignment for long-term conversational rewards. In this paper, we propose a novel approach to hierarchical reinforcement learning (HRL), VHRL, which uses policy gradients to tune the utterance-level embedding of a variational sequence model. This hierarchical approach provides greater flexibility for learning long-term, conversational rewards. We use self-play and RL to optimize for a set of human-centered conversation metrics, and show that our approach provides significant improvements – in terms of both human evaluation and automatic metrics – over state-of-the-art dialog models, including Transformers.
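The idea of applying policy gradients at the utterance level rather than the word level can be illustrated with a toy sketch. This is not the authors' VHRL implementation: it assumes a diagonal Gaussian policy over a small utterance embedding with a learnable mean, and a hypothetical reward that simply prefers embeddings near a fixed target. The names `reward`, `train_utterance_policy`, and `target` are invented for illustration only.

```python
import random


def reward(z, target):
    # Hypothetical stand-in for a conversation-level reward:
    # higher when the sampled utterance embedding is near the target.
    return -sum((zi - ti) ** 2 for zi, ti in zip(z, target))


def train_utterance_policy(target, steps=5000, lr=0.005, sigma=0.5, seed=0):
    """REINFORCE on the mean of a Gaussian over utterance embeddings."""
    rng = random.Random(seed)
    mu = [0.0] * len(target)  # learnable mean of the utterance-level policy
    baseline = 0.0            # running average of rewards, reduces variance
    for _ in range(steps):
        # Sample one embedding per "utterance" (one action per turn,
        # instead of one action per word).
        z = [rng.gauss(m, sigma) for m in mu]
        r = reward(z, target)
        advantage = r - baseline
        baseline += 0.05 * (r - baseline)
        # Gradient of log N(z; mu, sigma^2) with respect to mu
        # is (z - mu) / sigma^2.
        mu = [m + lr * advantage * (zi - m) / sigma ** 2
              for m, zi in zip(mu, z)]
    return mu


if __name__ == "__main__":
    print(train_utterance_policy([1.0, -0.5]))
```

Because the reward is attached to a single utterance-level action, the credit assignment in this sketch involves one decision per turn; a word-level policy would have to spread the same reward across every token in the utterance.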

Topics

Machine Learning > Learning Types > Self-Supervised Learning
Natural Language Processing > Generation > Dialogue Systems
Reinforcement Learning > Methods > Deep RL
Natural Language Processing > Applications > Dialogue Systems
Deep Learning > Learning Types > Reinforcement Learning
Deep Learning > Learning Types > Sequence Modeling

Keywords

policy gradient; hierarchical reinforcement learning; credit assignment; dialog generation; utterance-level embedding; open-domain dialog; variational sequence model; conversational goal; conversational reward
