Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control

Nate Rahn; Pierluca D'Oro; Harley Wiltzer; Pierre-Luc Bacon; Marc Bellemare

NeurIPS 2023

Abstract

Deep reinforcement learning agents for continuous control are known to exhibit significant instability in their performance over time. In this work, we provide a fresh perspective on these behaviors by studying the return landscape: the mapping between a policy and a return. We find that popular algorithms traverse noisy neighborhoods of this landscape, in which a single update to the policy parameters leads to a wide range of returns. By taking a distributional view of these returns, we map the landscape, characterizing failure-prone regions of policy space and revealing a hidden dimension of policy quality. We show that the landscape exhibits surprising structure by finding simple paths in parameter space which improve the stability of a policy. To conclude, we develop a distribution-aware procedure which finds such paths, navigating away from noisy neighborhoods in order to improve the robustness of a policy. Taken together, our results provide new insight into the optimization, evaluation, and design of agents.
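The idea of a noisy neighborhood, where a single update from the same parameters can lead to a wide range of returns, can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not the paper's experimental setup: the one-dimensional linear system, the function names `rollout_return` and `post_update_returns`, the Gaussian stand-in for a stochastic policy update, and the choice of summary statistics.

```python
import numpy as np


def rollout_return(theta, horizon=50):
    """Return of the linear policy a = -theta * s on a toy 1-D system (hypothetical)."""
    s, total = 1.0, 0.0
    for _ in range(horizon):
        a = -theta * s
        s = s + 0.1 * a        # linear dynamics: s <- (1 - 0.1 * theta) * s
        total += -(s ** 2)     # reward penalizes distance from the origin
        if abs(s) > 10.0:      # failure: the state diverges and the episode ends
            total -= 100.0
            break
    return total


def post_update_returns(theta, step_size, noise_scale, n_updates, rng):
    """Returns reached by independent single noisy updates from the same theta."""
    # Stand-in for a stochastic gradient: a fixed direction plus Gaussian noise.
    directions = 1.0 + noise_scale * rng.standard_normal(n_updates)
    candidates = theta + step_size * directions
    return np.array([rollout_return(t) for t in candidates])


def summarize(returns):
    """Mean, standard deviation, and 5% quantile of a sample of returns."""
    return returns.mean(), returns.std(), np.quantile(returns, 0.05)


rng = np.random.default_rng(0)
# theta = 10 sits well inside the stable region (0 < theta < 20) of this toy system;
# theta = 19.5 sits next to the boundary, where some updates cause divergence.
for theta in (10.0, 19.5):
    mean, std, q05 = summarize(post_update_returns(theta, 0.5, 2.0, 200, rng))
    print(f"theta={theta}: mean={mean:.2f}, std={std:.2f}, 5% quantile={q05:.2f}")
```

In this toy setting, both starting points have a similar return before the update is considered as a distribution, yet the spread and lower tail of the post-update returns differ sharply. This is the sense in which looking at the full distribution, rather than a single return, can expose a dimension of policy quality that the mean alone hides.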

Authors

Nate Rahn, Pierluca D'Oro, Harley Wiltzer, Pierre-Luc Bacon, Marc Bellemare

Topics

Reinforcement Learning > Methods > Deep RL

Reinforcement Learning > Applications > Robotics

Machine Learning > Learning Types > Reinforcement Learning

Deep Learning > Optimization & Theory > Optimization

Deep Learning > Learning Types > Reinforcement Learning

Keywords

reinforcement learning, policy optimization, policy gradient, continuous control, policy robustness, return landscape, distribution-aware optimization
