KRLS: Improving End-to-End Response Generation in Task Oriented Dialog with Reinforced Keywords Learning

Xiao Yu; Qingyang Wu; Kun Qian; Zhou Yu

2023 EMNLP EMNLP 2023

KRLS: Improving End-to-End Response Generation in Task Oriented Dialog with Reinforced Keywords Learning

Abstract

AbstractIn task-oriented dialogs (TOD), reinforcement learning (RL) algorithms train a model to directly optimize response for task-related metrics. However, RL often needs to perform exploration, which can be time-consuming due to the slow auto-regressive sequence generation process. We investigate an approach to create a more efficient RL-based algorithm to improve TOD performance in an offline setting. First, we use a faster generation procedure that samples from independent next-word distributions after training the language model (LM) with supervised learning. We then introduce a fine-grained reward function to help the model focus on learning key information in a dialog, by measuring the importance and semantic closeness of each generated token. Experiments on the MultiWoZ dataset show our new training algorithm, Keywords Reinforcement Learning with Next-word Sampling (KRLS), achieves state-of-the-art performance on the end-to-end response generation task, with a 15% training time reduction compared to a standard RL algorithm using auto-regressive generation.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning and Natural Language Processing and Reinforcement Learning

🧭 Keyword Pioneer — next-word sampling

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Xiao Yu , Qingyang Wu , Kun Qian , Zhou Yu

Topics

Artificial Intelligence > Core AI > Human-AI Interaction Machine Learning > Optimization & Theory > Loss Functions Reinforcement Learning > Methods > Policy Learning Machine Learning > Learning Types > Reinforcement Learning Natural Language Processing > Applications > Dialogue Systems Deep Learning > Learning Types > Reinforcement Learning

Keywords

reinforcement learning response generation reward function task-oriented dialog next-word sampling multiwoz dataset keyword learning

Download PDF

Related papers

Exploring Linguistic Probes for Morphological Generalization 2023

NameGuess: Column Name Expansion for Tabular Data 2023

Vision-Enhanced Semantic Entity Recognition in Document Images via Visually-Asymmetric Consistency Learning 2023

Improving Conversational Recommendation Systems via Bias Analysis and Language-Model-Enhanced Data Augmentation 2023

On the Calibration of Large Language Models and Alignment 2023