Projected Natural Actor-Critic

Philip S. Thomas; William C Dabney; Stephen Giguere; Sridhar Mahadevan

NeurIPS (NIPS) 2013

Abstract

Natural actor-critics are a popular class of policy search algorithms for finding locally optimal policies for Markov decision processes. In this paper we address a drawback of natural actor-critics that limits their real-world applicability: their lack of safety guarantees. We present a principled algorithm for performing natural gradient descent over a constrained domain. In the context of reinforcement learning, this allows for natural actor-critic algorithms that are guaranteed to remain within a known safe region of policy space. While deriving our class of constrained natural actor-critic algorithms, which we call Projected Natural Actor-Critics (PNACs), we also elucidate the relationship between natural gradient descent and mirror descent.
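The idea of natural gradient descent over a constrained domain can be illustrated with a small sketch. It is not the paper's algorithm: it assumes the safe region is a single half-space {y : a·y ≤ b}, that a positive-definite metric matrix G is available, and that the projection is taken under the norm induced by G rather than the Euclidean norm. The function names and the numbers in the example are hypothetical.

```python
import numpy as np

def project_halfspace(x, G, a, b):
    """Project x onto {y : a @ y <= b} under the norm induced by G.

    Assumption: G is symmetric positive definite. The closed form below
    minimizes (x - y)^T G (x - y) subject to a @ y <= b.
    """
    violation = a @ x - b
    if violation <= 0:
        return x.copy()  # already inside the safe region
    Ginv_a = np.linalg.solve(G, a)
    return x - (violation / (a @ Ginv_a)) * Ginv_a

def projected_natural_step(theta, grad, G, a, b, step_size):
    """One natural gradient descent step followed by a G-weighted projection."""
    natural_grad = np.linalg.solve(G, grad)  # G^{-1} grad, without forming the inverse
    candidate = theta - step_size * natural_grad
    return project_halfspace(candidate, G, a, b)

# Hypothetical example: 2-D parameters, safe region theta[0] + theta[1] <= 1.
G = np.diag([1.0, 4.0])
a = np.array([1.0, 1.0])
b = 1.0
theta = np.zeros(2)
grad = np.array([-4.0, -4.0])

theta_next = projected_natural_step(theta, grad, G, a, b, step_size=1.0)
print(theta_next, a @ theta_next <= b + 1e-9)
```

In this example the unconstrained step lands at (4, 1), outside the safe region, and the G-weighted projection returns a point on the boundary at roughly (0.8, 0.2). A Euclidean projection of the same point would give (2, -1) instead, which shows that the choice of metric in the projection changes where the iterate ends up.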

Topics

Artificial Intelligence > Core AI > Agent Systems
Artificial Intelligence > Core AI > Reinforcement Learning
Machine Learning > Optimization & Theory > Optimization
Machine Learning > Learning Types > Reinforcement Learning
Reinforcement Learning > Methods > Deep RL
Reinforcement Learning > Methods > Policy Learning

Keywords

reinforcement learning, constrained optimization, Markov decision process, natural gradient, natural actor-critic, policy search, natural gradient descent, mirror descent, safe reinforcement learning
