Convergent Policy Optimization for Safe Reinforcement Learning

Ming Yu; Zhuoran Yang; Mladen Kolar; Zhaoran Wang

NeurIPS 2019

Abstract

We study the safe reinforcement learning problem with nonlinear function approximation, where policy optimization is formulated as a constrained optimization problem with both the objective and the constraint being nonconvex functions. For such a problem, we construct a sequence of surrogate convex constrained optimization problems by replacing the nonconvex functions locally with convex quadratic functions obtained from policy gradient estimators. We prove that the solutions to these surrogate problems converge to a stationary point of the original nonconvex problem. Furthermore, to extend our theoretical results, we apply our algorithm to examples of optimal control and multi-agent reinforcement learning with safety constraints.
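The surrogate construction described in the abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation: it uses exact gradients of two hypothetical 2-D functions instead of policy gradient estimators, and the names `TAU`, `ETA`, `solve_surrogate`, and `run` are assumptions made for illustration. At each iterate, the objective and the constraint are replaced by convex quadratics that share the curvature `TAU`, so the surrogate problem reduces to projecting the unconstrained minimizer onto a ball and has a closed-form solution. The fallback used when the surrogate is infeasible (stepping to the minimizer of the constraint surrogate) is likewise an assumed simplification.

```python
import math

TAU = 3.0  # curvature of both quadratic surrogates (assumed; chosen to dominate the true curvature)
ETA = 0.5  # averaging step size (assumed constant for this deterministic toy)

def f(x):
    """Toy nonconvex objective, a hypothetical stand-in for the policy objective."""
    return (x[0] - 0.5) ** 2 + x[1] ** 2 + 0.3 * math.sin(3.0 * x[0])

def grad_f(x):
    return (2.0 * (x[0] - 0.5) + 0.9 * math.cos(3.0 * x[0]), 2.0 * x[1])

def g(x):
    """Toy nonconvex constraint g(x) <= 0: stay outside the unit disk."""
    return 1.0 - x[0] ** 2 - x[1] ** 2

def grad_g(x):
    return (-2.0 * x[0], -2.0 * x[1])

def solve_surrogate(x):
    """Minimize a.d + TAU*|d|^2 subject to c + b.d + TAU*|d|^2 <= 0 over the step d."""
    a, b, c = grad_f(x), grad_g(x), g(x)
    d0 = (-a[0] / (2 * TAU), -a[1] / (2 * TAU))  # unconstrained minimizer of the objective surrogate
    m = (-b[0] / (2 * TAU), -b[1] / (2 * TAU))   # center of the ball defined by the constraint surrogate
    r2 = (b[0] ** 2 + b[1] ** 2) / (4 * TAU ** 2) - c / TAU  # squared radius of that ball
    if r2 <= 0.0:
        # Surrogate infeasible: fall back to the minimizer of the constraint surrogate.
        return m
    diff = (d0[0] - m[0], d0[1] - m[1])
    dist = math.hypot(diff[0], diff[1])
    r = math.sqrt(r2)
    if dist <= r:
        return d0  # the unconstrained minimizer already satisfies the surrogate constraint
    scale = r / dist  # otherwise project d0 onto the ball
    return (m[0] + scale * diff[0], m[1] + scale * diff[1])

def run(x, iters=200):
    """Repeatedly solve the surrogate and move a fraction ETA toward its solution."""
    for _ in range(iters):
        d = solve_surrogate(x)
        x = (x[0] + ETA * d[0], x[1] + ETA * d[1])
    return x

x_start = (2.0, 1.0)
x_final = run(x_start)
```

On this toy problem both surrogates upper-bound the functions they replace, so starting from a feasible point each averaged step stays feasible and does not increase the objective. That property depends on the assumed choice of `TAU` and on using exact gradients; it should not be read as a statement about the stochastic setting analyzed in the paper.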

Authors

Ming Yu, Zhuoran Yang, Mladen Kolar, Zhaoran Wang

Topics

Artificial Intelligence > Core AI > Multi-Agent Systems
Machine Learning > Optimization & Theory > Optimization
Reinforcement Learning > Methods > Deep RL
Reinforcement Learning > Methods > Policy Learning
Artificial Intelligence > Core AI > Safety

Keywords

nonconvex optimization, policy optimization, constrained optimization, optimal control, safe reinforcement learning, safety constraint, nonlinear function approximation, multi-agent system
