On Proximal Policy Optimization’s Heavy-tailed Gradients

Saurabh Garg; Joshua Zhanson; Emilio Parisotto; Adarsh Prasad; Zico Kolter; Zachary Lipton; Sivaraman Balakrishnan; Ruslan Salakhutdinov; Pradeep Ravikumar

2021 ICML ICML 2021

On Proximal Policy Optimization’s Heavy-tailed Gradients

Abstract

Modern policy gradient algorithms such as Proximal Policy Optimization (PPO) rely on an arsenal of heuristics, including loss clipping and gradient clipping, to ensure successful learning. These heuristics are reminiscent of techniques from robust statistics, commonly used for estimation in outlier-rich ("heavy-tailed") regimes. In this paper, we present a detailed empirical study to characterize the heavy-tailed nature of the gradients of the PPO surrogate reward function. We demonstrate that the gradients, especially for the actor network, exhibit pronounced heavy-tailedness and that it increases as the agent’s policy diverges from the behavioral policy (i.e., as the agent goes further off policy). Further examination implicates the likelihood ratios and advantages in the surrogate reward as the main sources of the observed heavy-tailedness. We then highlight issues arising due to the heavy-tailed nature of the gradients. In this light, we study the effects of the standard PPO clipping heuristics, demonstrating that these tricks primarily serve to offset heavy-tailedness in gradients. Thus motivated, we propose incorporating GMOM, a high-dimensional robust estimator, into PPO as a substitute for three clipping tricks. Despite requiring less hyperparameter tuning, our method matches the performance of PPO (with all heuristics enabled) on a battery of MuJoCo continuous control tasks.

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning

🧭 Keyword Pioneer — heavy-tailed gradient

🐝 Cross-Pollinator — Artificial Intelligence, Computer Vision, Data Science & Analytics, Deep Learning, Interdisciplinary, Machine Learning, Mathematics & Optimization, Reinforcement Learning, Robotics

🐣 Hot Topic Early Bird — robust statistics

Authors

Saurabh Garg , Joshua Zhanson , Emilio Parisotto , Adarsh Prasad , Zico Kolter , Zachary Lipton , Sivaraman Balakrishnan , Ruslan Salakhutdinov , Pradeep Ravikumar

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization Machine Learning > Optimization & Theory > Optimization Reinforcement Learning > Methods > Deep RL Reinforcement Learning > Methods > Policy Learning Machine Learning > Optimization & Theory > Stochastic Methods

Keywords

reinforcement learning policy gradient robust statistics robust estimation heavy-tailed distribution gradient clipping heavy-tailed gradient proximal policy optimization surrogate reward function

Download PDF

Related papers

GRAND: Graph Neural Diffusion 2021

Almost Optimal Anytime Algorithm for Batched Multi-Armed Bandits 2021

Straight to the Gradient: Learning to Use Novel Tokens for Neural Text Generation 2021

Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution 2021

Dataset Dynamics via Gradient Flows in Probability Space 2021