Constrained Reinforcement Learning via Policy Splitting

Haoxian Chen; Henry Lam; Fengpei Li; Amirhossein Meisami

ACML 2020

Abstract

We develop a model-free reinforcement learning approach to solve constrained Markov decision processes, where the objective and budget constraints are in the form of infinite-horizon discounted expectations, and the rewards and costs are learned sequentially from data. We propose a two-stage procedure: we first search over deterministic policies, then aggregate them through a mixture parameter search, generating policies with simultaneous guarantees on near-optimality and feasibility. We also illustrate our approach numerically by applying it to an online advertising problem.
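
The aggregation stage can be illustrated with a minimal sketch. It is not the paper's algorithm. It assumes two deterministic policies A and B have already been found, that their discounted expected rewards and costs have been estimated, and that the mixture is drawn once at the start of an episode, so the mixed reward and cost are linear in the mixture weight p. The function names and the numbers are hypothetical.

```python
def mixture_weight(cost_a: float, cost_b: float, budget: float) -> float:
    """Largest weight p in [0, 1] on policy A such that
    p * cost_a + (1 - p) * cost_b <= budget.

    Assumption: policy B is feasible on its own (cost_b <= budget).
    """
    if cost_b > budget:
        raise ValueError("policy B must satisfy the budget on its own")
    if cost_a <= budget:
        # Policy A is already feasible, so no mixing is needed.
        return 1.0
    # Solve p * cost_a + (1 - p) * cost_b = budget for p.
    return (budget - cost_b) / (cost_a - cost_b)


def mixed_value(p: float, value_a: float, value_b: float) -> float:
    """Expected value when A is played with probability p, else B."""
    return p * value_a + (1.0 - p) * value_b


# Hypothetical estimates: A earns more but overspends, B is cheap.
reward_a, cost_a = 10.0, 8.0
reward_b, cost_b = 4.0, 2.0
budget = 5.0

p = mixture_weight(cost_a, cost_b, budget)
print(p, mixed_value(p, reward_a, reward_b), mixed_value(p, cost_a, cost_b))
```

With these inputs the mixture spends exactly the budget while earning more than the feasible policy B alone. In practice the rewards and costs are estimates learned from data, so a search over p would need to account for estimation error.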

Topics

Reinforcement Learning > Methods > Policy Learning

Keywords

policy optimization; constrained Markov decision process; model-free reinforcement learning; feasibility constraint
