Stochastic Variance-Reduced Policy Gradient

Matteo Papini; Damiano Binaghi; Giuseppe Canonaco; Matteo Pirotta; Marcello Restelli

2018 ICML ICML 2018

Stochastic Variance-Reduced Policy Gradient

Abstract

In this paper, we propose a novel reinforcement-learning algorithm consisting in a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs). Stochastic variance-reduced gradient (SVRG) methods have proven to be very successful in supervised learning. However, their adaptation to policy gradient is not straightforward and needs to account for I) a non-concave objective function; II) approximations in the full gradient computation; and III) a non-stationary sampling process. The result is SVRPG, a stochastic variance-reduced policy gradient algorithm that leverages on importance weights to preserve the unbiasedness of the gradient estimate. Under standard assumptions on the MDP, we provide convergence guarantees for SVRPG with a convergence rate that is linear under increasing batch sizes. Finally, we suggest practical variants of SVRPG, and we empirically evaluate them on continuous MDPs.

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning

🧭 Keyword Pioneer — gradient estimate

🐣 Hot Topic Early Bird — markov decision process

🐝 Cross-Pollinator — Artificial Intelligence, Computer Vision, Data Science & Analytics, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio

Authors

Matteo Papini , Damiano Binaghi , Giuseppe Canonaco , Matteo Pirotta , Marcello Restelli

Topics

Machine Learning > Optimization & Theory > Optimization Machine Learning > Optimization & Theory > Stochastic Processes Reinforcement Learning > Methods > Policy Learning

Keywords

stochastic optimization reinforcement learning policy gradient markov decision process importance sampling variance reduction gradient estimate

Download PDF

Related papers

Rectify Heterogeneous Models with Semantic Mapping 2018

Bayesian Optimization of Combinatorial Structures 2018

The Well-Tempered Lasso 2018

Approximation Algorithms for Cascading Prediction Models 2018

Classification from Pairwise Similarity and Unlabeled Data 2018