2015
ACML
Regularized Policy Gradients: Direct Variance Reduction in Policy Gradient Estimation
Abstract
Policy gradient algorithms, which update the policy parameters along the steepest-ascent direction of the expected return, are widely used in reinforcement learning problems with continuous action spaces. However, the large variance of policy gradient estimates often makes policy updates unstable. In this paper, we propose to suppress the variance of gradient estimation by directly employing the variance of policy gradients as a regularizer. Through experiments, we demonstrate that the proposed variance-regularization technique, combined with parameter-based exploration and baseline subtraction, provides more reliable policy updates than its non-regularized counterparts.
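To make the quantities named in the abstract concrete, the following is a minimal sketch and not the paper's implementation. It assumes a parameter-based exploration setup in which policy parameters are drawn from a Gaussian with mean `eta` and a fixed standard deviation `tau`, and it uses a toy quadratic return. It computes per-sample score-function gradients with and without a baseline, together with the empirical variance of those gradients, which is the kind of quantity a variance regularizer would penalize. All names here (`pgpe_gradient_samples`, `gradient_and_variance`, `lam`, the toy return) are hypothetical and chosen for illustration only.

```python
import numpy as np


def pgpe_gradient_samples(theta, returns, eta, tau, baseline=0.0):
    """Per-sample gradient estimates w.r.t. the Gaussian mean eta.

    Assumes theta_n ~ N(eta, tau^2 I), so the score function is
    (theta_n - eta) / tau^2. Each row is (R_n - baseline) * score_n.
    """
    score = (theta - eta) / tau**2              # shape (N, d)
    return (returns - baseline)[:, None] * score


def gradient_and_variance(samples):
    """Mean gradient and the trace of the empirical covariance of the samples."""
    g_hat = samples.mean(axis=0)
    var = ((samples - g_hat) ** 2).sum(axis=1).mean()
    return g_hat, var


rng = np.random.default_rng(0)
eta = np.array([1.0, -2.0])                     # assumed hyper-parameter mean
tau = 0.5                                       # assumed fixed exploration scale
theta = eta + tau * rng.standard_normal((2000, 2))

# Toy return with a large constant offset (purely illustrative).
returns = 100.0 - (theta ** 2).sum(axis=1)

g_plain, var_plain = gradient_and_variance(
    pgpe_gradient_samples(theta, returns, eta, tau)
)
g_base, var_base = gradient_and_variance(
    pgpe_gradient_samples(theta, returns, eta, tau, baseline=returns.mean())
)

# One possible variance-penalized score, with an assumed weight lam:
# the average return minus lam times the gradient variance.
lam = 1e-3
penalized_plain = returns.mean() - lam * var_plain
penalized_base = returns.mean() - lam * var_base

print("gradient variance without baseline:", var_plain)
print("gradient variance with mean baseline:", var_base)
```

In this toy setting the mean-return baseline removes the constant offset from the returns, so the empirical gradient variance drops sharply. A variance regularizer would go further by also steering the parameters toward regions where this variance is small; the exact form of the penalty and its gradient are not given in the abstract and are not reproduced here.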