2025 COLING COLING 2025

An Efficient Dialogue Policy Agent with Model-Based Causal Reinforcement Learning

Abstract

AbstractDialogue policy trains an agent to select dialogue actions frequently implemented via deep reinforcement learning (DRL). The model-based reinforcement methods built a world model to generate simulated data to alleviate the sample inefficiency. However, traditional world model methods merely consider one-step dialogues, leading to an inaccurate environmental simulation. Furthermore, different users may have different intention preferences, while most existing studies lack consideration of the intention-preferences causal relationship. This paper proposes a novel framework for dialogue policy learning named MCA, implemented through model-based reinforcement learning with automatically constructed causal chains. The MCA model utilizes an autoregressive Transformer to model dialogue trajectories, enabling a more accurate simulation of the environment. Additionally, it constructs a causal chains module that outputs latent preference distributions for intention-action pairs, thereby elucidating the relationship between user intentions and agent actions. The experimental results show that MCA can achieve state-of-the-art performances on three dialogue datasets over the compared dialogue agents, highlighting its effectiveness and robustness.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Reinforcement Learning
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio