Finite-Sample Regret Bound for Distributionally Robust Offline Tabular Reinforcement Learning

Zhengqing Zhou; Zhengyuan Zhou; Qinxun Bai; Linhai Qiu; Jose Blanchet; Peter Glynn

2021 AISTATS AISTATS 2021

Finite-Sample Regret Bound for Distributionally Robust Offline Tabular Reinforcement Learning

Abstract

While reinforcement learning has witnessed tremendous success recently in a wide range of domains, robustness–or the lack thereof–remains an important issue that remains inadequately addressed. In this paper, we provide a distributionally robust formulation of offline learning policy in tabular RL that aims to learn a policy from historical data (collected by some other behavior policy) that is robust to the future environment arising as a perturbation of the training environment. We first develop a novel policy evaluation scheme that accurately estimates the robust value (i.e. how robust it is in a perturbed environment) of any given policy and establish its finite-sample estimation error. Building on this, we then develop a novel and minimax-optimal distributionally robust learning algorithm that achieves $O_P\left(1/\sqrt{n}\right)$ regret, meaning that with high probability, the policy learned from using $n$ training data points will be $O\left(1/\sqrt{n}\right)$ close to the optimal distributionally robust policy. Finally, our simulation results demonstrate the superiority of our distributionally robust approach compared to non-robust RL algorithms.

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning

🧭 Keyword Pioneer — robust value estimation

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy

Authors

Zhengqing Zhou , Zhengyuan Zhou , Qinxun Bai , Linhai Qiu , Jose Blanchet , Peter Glynn

Topics

Machine Learning > Optimization & Theory > Optimization Machine Learning > Optimization & Theory > Theory Reinforcement Learning > Methods > Offline RL Machine Learning > Learning Types > Reinforcement Learning

Keywords

offline reinforcement learning distributionally robust optimization regret bound tabular reinforcement learning robust policy robust value estimation

Download PDF

Related papers

Linear Regression Games: Convergence Guarantees to Approximate Out-of-Distribution Solutions 2021

Semi-Supervised Learning with Meta-Gradient 2021

Accelerating Metropolis-Hastings with Lightweight Inference Compilation 2021

When MAML Can Adapt Fast and How to Assist When It Cannot 2021

On the convergence of the Metropolis algorithm with fixed-order updates for multivariate binary probability distributions 2021