Offline Reinforcement Learning with Realizability and Single-policy Concentrability

Wenhao Zhan; Baihe Huang; Audrey Huang; Nan Jiang; Jason Lee

2022 COLT COLT 2022

Offline Reinforcement Learning with Realizability and Single-policy Concentrability

Abstract

Sample-efficiency guarantees for offline reinforcement learning (RL) often rely on strong assumptions on both the function classes (e.g., Bellman-completeness) and the data coverage (e.g., all-policy concentrability). Despite the recent efforts on relaxing these assumptions, existing works are only able to relax one of the two factors, leaving the strong assumption on the other factor intact. As an important open problem, can we achieve sample-efficient offline RL with weak assumptions on both factors? In this paper we answer the question in the positive. We analyze a simple algorithm based on the primal-dual formulation of MDPs, where the dual variables (discounted occupancy) are modeled using a density-ratio function against offline data. With proper regularization, the algorithm enjoys polynomial sample complexity, under only realizability and single-policy concentrability. We also provide alternative analyses based on different assumptions to shed light on the nature of primal-dual algorithms for offline RL.

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Wenhao Zhan , Baihe Huang , Audrey Huang , Nan Jiang , Jason Lee

Topics

Machine Learning > Learning Types > Weakly Supervised Learning Machine Learning > Optimization & Theory > Learning Theory Reinforcement Learning > Methods > Offline RL

Keywords

offline reinforcement learning sample complexity primal-dual algorithm density ratio

Download PDF

Related papers

Non-Convex Optimization with Certificates and Fast Rates Through Kernel Sums of Squares 2022

Analysis of Langevin Monte Carlo from Poincare to Log-Sobolev 2022

Mirror Descent Strikes Again: Optimal Stochastic Convex Optimization under Infinite Noise Variance 2022

Tight query complexity bounds for learning graph partitions 2022

Pushing the Efficiency-Regret Pareto Frontier for Online Learning of Portfolios and Quantum States 2022