Constrained Cross-Entropy Method for Safe Reinforcement Learning

Min Wen; Ufuk Topcu

NeurIPS 2018

Abstract

We study a safe reinforcement learning problem in which the constraints are defined as the expected cost over finite-length trajectories. We propose a constrained cross-entropy-based method to solve this problem. The method explicitly tracks its performance with respect to constraint satisfaction and is therefore well-suited for safety-critical applications. We show that, almost surely, the asymptotic behavior of the proposed algorithm can be described by that of an ordinary differential equation. We then give sufficient conditions on the properties of this differential equation that guarantee the convergence of the proposed algorithm. Finally, we show with simulation experiments that the proposed algorithm can effectively learn feasible policies without assuming that the initial policies are feasible, even with non-Markovian objective and constraint functions.
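To make the idea of a constrained cross-entropy method concrete, here is a minimal sketch under stated assumptions. It is not the authors' exact algorithm. It assumes a diagonal Gaussian sampling distribution over policy parameters and an elite-selection rule that prefers feasible samples: when enough sampled parameters satisfy the constraint, the elite set is the feasible samples with the best objective; otherwise it is the samples with the lowest constraint cost, which pushes the distribution toward feasibility. The names `constrained_cem`, `objective`, `cost`, `threshold`, `smoothing`, and the hyperparameter values are hypothetical. In an actual RL setting, `objective` and `cost` would be Monte Carlo estimates of expected return and expected trajectory cost from rollouts.

```python
import numpy as np


def constrained_cem(objective, cost, threshold, dim, n_samples=100,
                    elite_frac=0.1, n_iters=50, smoothing=0.7, seed=0):
    """Hypothetical sketch: maximize objective(theta) s.t. cost(theta) <= threshold.

    objective/cost stand in for estimated expected return and expected
    trajectory cost of the policy with parameters theta (an assumption).
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    std = np.ones(dim)
    n_elite = max(1, int(elite_frac * n_samples))

    for _ in range(n_iters):
        # Sample candidate parameters from the current Gaussian distribution.
        thetas = mean + std * rng.standard_normal((n_samples, dim))
        f = np.array([objective(t) for t in thetas])
        g = np.array([cost(t) for t in thetas])
        feasible = g <= threshold

        if feasible.sum() >= n_elite:
            # Enough feasible samples: keep the feasible ones with the highest objective.
            idx = np.flatnonzero(feasible)
            elite = thetas[idx[np.argsort(-f[idx])[:n_elite]]]
        else:
            # Too few feasible samples: keep those with the lowest constraint cost.
            elite = thetas[np.argsort(g)[:n_elite]]

        # Smoothed refit of the sampling distribution to the elite set;
        # the small constant keeps the standard deviation from collapsing to zero.
        mean = smoothing * elite.mean(axis=0) + (1 - smoothing) * mean
        std = smoothing * elite.std(axis=0) + (1 - smoothing) * std + 1e-3

    return mean


# Toy problem: maximize -(x - 3)^2 subject to x <= 1 (constrained optimum at x = 1).
x = constrained_cem(lambda t: -(t[0] - 3.0) ** 2, lambda t: t[0],
                    threshold=1.0, dim=1)
print(x)
```

The feasibility-first ranking is one simple way to make the method "explicitly track" constraint satisfaction: the distribution only optimizes the objective once it reliably generates feasible samples.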

Topics

Artificial Intelligence > Core AI > AI Safety
Reinforcement Learning > Methods > Policy Learning
Artificial Intelligence > Core AI > Safety

Keywords

policy optimization, constraint satisfaction, safe reinforcement learning, differential equation, cross-entropy method, differential equation analysis
