Provably efficient representation selection in Low-rank Markov Decision Processes: from online to offline RL

W. Zhang; J. He; D. Zhou; Q. Gu; A. Zhang

UAI 2023

Abstract

The success of deep reinforcement learning (DRL) lies in its ability to learn a representation that is well-suited for the exploration and exploitation task. To understand how the choice of representation can improve the efficiency of reinforcement learning (RL), we study representation selection for a class of low-rank Markov Decision Processes (MDPs) where the transition kernel can be represented in a bilinear form. We propose an efficient algorithm, called ReLEX, for representation learning in both online and offline RL. Specifically, we show that the online version of ReLEX, called ReLEX-UCB, always performs no worse than the state-of-the-art algorithm without representation selection, and achieves a strictly better constant regret if the representation function class has a "coverage" property over the entire state-action space. For the offline counterpart, ReLEX-LCB, we show that the algorithm can find the optimal policy if the representation class can cover the state-action space and achieves gap-dependent sample complexity. This is the first result with constant sample complexity for representation learning in offline RL.
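To make the "bilinear form" concrete, here is a minimal sketch of a tabular transition kernel that factors as P(s' | s, a) = phi(s, a)^T M psi(s'). The abstract does not give the parameterization, so the feature maps `phi` and `psi`, the core matrix `M`, the dimensions, and the choice to make every factor row-stochastic (a convenient way to guarantee valid probabilities) are all assumptions for illustration, not the authors' construction.

```python
import random


def random_distribution(n, rng):
    """Return a random probability vector of length n."""
    weights = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


rng = random.Random(0)
n_states, n_actions = 6, 3   # hypothetical problem size
d, d_prime = 2, 2            # hypothetical feature dimensions

# phi: one row per (state, action) pair, each a distribution over d latent factors.
phi = [random_distribution(d, rng) for _ in range(n_states * n_actions)]
# M: core matrix mapping the d latent factors to d_prime factors (row-stochastic).
M = [random_distribution(d_prime, rng) for _ in range(d)]
# psi: one row per latent factor, each a distribution over next states.
psi = [random_distribution(n_states, rng) for _ in range(d_prime)]

# Transition matrix with one row per (state, action) pair and one column per next state.
# As a product through d and d_prime dimensional factors, its rank is at most min(d, d_prime).
P = matmul(matmul(phi, M), psi)

for row in P:
    assert abs(sum(row) - 1.0) < 1e-9   # each row is a valid distribution
    assert all(p >= 0.0 for p in row)
```

In this picture, representation selection amounts to choosing among several candidate `phi` maps that could all explain the same `P`.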

Authors

W. Zhang, J. He, D. Zhou, Q. Gu, A. Zhang

Topics

Reinforcement Learning > Methods > Deep RL

Keywords

offline reinforcement learning, sample complexity, online reinforcement learning, regret bound, low-rank Markov decision process, representation selection
