Policy Optimization as Wasserstein Gradient Flows

Ruiyi Zhang; Changyou Chen; Chunyuan Li; Lawrence Carin

2018 ICML ICML 2018

Policy Optimization as Wasserstein Gradient Flows

Abstract

Policy optimization is a core component of reinforcement learning (RL), and most existing RL methods directly optimize parameters of a policy based on maximizing the expected total reward, or its surrogate. Though often achieving encouraging empirical success, its correspondence to policy-distribution optimization has been unclear mathematically. We place policy optimization into the space of probability measures, and interpret it as Wasserstein gradient flows. On the probability-measure space, under specified circumstances, policy optimization becomes convex in terms of distribution optimization. To make optimization feasible, we develop efficient algorithms by numerically solving the corresponding discrete gradient flows. Our technique is applicable to several RL settings, and is related to many state-of-the-art policy-optimization algorithms. Specifically, we define gradient flows on both the parameter-distribution space and policy-distribution space, leading to what we term indirect-policy and direct-policy learning frameworks, respectively. Extensive experiments verify the effectiveness of our framework, often obtaining better performance compared to related algorithms.

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning

🧭 Keyword Pioneer — distribution optimization

🐣 Hot Topic Early Bird — policy optimization

🐝 Cross-Pollinator — Artificial Intelligence, Data Science & Analytics, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics

Authors

Ruiyi Zhang , Changyou Chen , Chunyuan Li , Lawrence Carin

Topics

Machine Learning > Optimization & Theory > Optimization Reinforcement Learning > Methods > Policy Learning Machine Learning > Learning Types > Reinforcement Learning Machine Learning > Core Methods > Optimization

Keywords

reinforcement learning policy optimization policy gradient probability distribution gradient descent probability measure distribution optimization wasserstein gradient flow

Download PDF

Related papers

Rectify Heterogeneous Models with Semantic Mapping 2018

Bayesian Optimization of Combinatorial Structures 2018

The Well-Tempered Lasso 2018

Approximation Algorithms for Cascading Prediction Models 2018

Classification from Pairwise Similarity and Unlabeled Data 2018