Imperfect also Deserves Reward: Multi-Level and Sequential Reward Modeling for Better Dialog Management

Zhengxu Hou; Bang Liu; Ruihui Zhao; Zijing Ou; Yafei Liu; Xi Chen; Yefeng Zheng

NAACL 2021

Abstract

For task-oriented dialog systems, training a Reinforcement Learning (RL) based Dialog Management module suffers from low sample efficiency and slow convergence speed due to the sparse rewards in RL. To solve this problem, many strategies have been proposed to give proper rewards when training RL, but their rewards lack interpretability and cannot accurately estimate the distribution of state-action pairs in real dialogs. In this paper, we propose a multi-level reward modeling approach that factorizes a reward into a three-level hierarchy: domain, act, and slot. Based on inverse adversarial reinforcement learning, our designed reward model can provide more accurate and explainable reward signals for state-action pairs. Extensive evaluations show that our approach can be applied to a wide range of reinforcement learning-based dialog systems and significantly improves both the performance and the speed of convergence.
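The three-level factorization described in the abstract can be illustrated with a minimal sketch. This is not the paper's reward model, which is learned with inverse adversarial reinforcement learning; here an exact match against a reference action stands in for the learned scores. The names `DialogAction` and `multi_level_reward`, the weights, and the gating rule (a lower level only earns reward when the levels above it are correct) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DialogAction:
    """Hypothetical container for one system action."""
    domain: str
    act: str
    slot: str


def multi_level_reward(predicted, reference, weights=(1.0, 1.0, 1.0)):
    """Toy sequential reward over the domain, act, and slot levels.

    Assumption: exact match replaces a learned reward model, and each
    level is gated by the correctness of the level above it.
    """
    w_domain, w_act, w_slot = weights
    domain_ok = predicted.domain == reference.domain
    act_ok = domain_ok and predicted.act == reference.act
    slot_ok = act_ok and predicted.slot == reference.slot
    # Booleans act as 0/1, so partially correct actions earn partial reward.
    return w_domain * domain_ok + w_act * act_ok + w_slot * slot_ok


reference = DialogAction("hotel", "inform", "price")
partly_right = DialogAction("hotel", "inform", "area")
print(multi_level_reward(partly_right, reference))
```

Under these assumptions, an action with the right domain and act but the wrong slot still earns a nonzero reward, which gives the policy a denser signal than a single success-or-failure reward at the end of the dialog.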

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning
Reinforcement Learning > Methods > Deep RL

Keywords

reinforcement learning; reward modeling; sparse reward; dialog management; dialog system; inverse adversarial learning
