2021
AISTATS
AISTATS 2021
On the Linear Convergence of Policy Gradient Methods for Finite MDPs
Abstract
We revisit the finite time analysis of policy gradient methods in the one of the simplest settings: finite state and action MDPs with a policy class consisting of all stochastic policies and with exact gradient evaluations. There has been some recent work viewing this setting as an instance of smooth non-linear optimization problems, to show sub-linear convergence rates with small step-sizes. Here, we take a completely different perspective based on illuminating connections with policy iteration, to show how many variants of policy gradient algorithms succeed with large step-sizes and attain a linear rate of convergence.
🌉
Interdisciplinary Bridge
— Machine Learning and Reinforcement Learning
🐝
Cross-Pollinator
— Artificial Intelligence, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Reinforcement Learning, Robotics