Distributed Offline Policy Optimization Over Batch Data

Han Shen; Songtao Lu; Xiaodong Cui; Tianyi Chen

AISTATS 2023

Abstract

Federated learning (FL) has received increasing interest in recent years. However, most existing work focuses on supervised learning, and federated learning for sequential decision making has not been fully explored. Part of the reason is that learning a policy for sequential decision making typically requires repeated interaction with the environment, which is costly in many FL applications. To overcome this issue, this work proposes a federated offline policy optimization method, abbreviated as FedOPO, that allows clients to jointly learn the optimal policy without interacting with environments during training. Despite the nonconcave-convex-strongly-concave nature of the resulting max-min-max problem, we establish both the local and global convergence of our FedOPO algorithm. Experiments on the OpenAI Gym demonstrate that our algorithm is able to find a near-optimal policy while enjoying various merits brought by FL, including training speedup and improved asymptotic performance.

Authors

Han Shen, Songtao Lu, Xiaodong Cui, Tianyi Chen

Topics

Artificial Intelligence > Learning Paradigms > Federated Learning
Reinforcement Learning > Methods > Offline RL
Machine Learning > Learning Types > Federated Learning

Keywords

federated learning, offline reinforcement learning, policy optimization, sequential decision making, convergence guarantee, offline policy optimization, max-min optimization

Related papers

Safe Sequential Testing and Effect Estimation in Stratified Count Data 2023

Who Should Predict? Exact Algorithms For Learning to Defer to Humans 2023

An Online and Unified Algorithm for Projection Matrix Vector Multiplication with Application to Empirical Risk Minimization 2023

Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient Methods 2023

The Ordered Matrix Dirichlet for State-Space Models 2023