Online Sparse Reinforcement Learning

Botao Hao; Tor Lattimore; Csaba Szepesvári; Mengdi Wang

2021 AISTATS AISTATS 2021

Online Sparse Reinforcement Learning

Abstract

We investigate the hardness of online reinforcement learning in sparse linear Markov decision process (MDP), with a special focus on the high-dimensional regime where the ambient dimension is larger than the number of episodes. Our contribution is two-fold. First, we provide a lower bound showing that linear regret is generally unavoidable, even if there exists a policy that collects well-conditioned data. Second, we show that if the learner has oracle access to a policy that collects well-conditioned data, then a variant of Lasso fitted Q-iteration enjoys a regret of $O(N^{2/3})$ where $N$ is the number of episodes.

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning

🧭 Keyword Pioneer — lasso fitted q-iteration

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Botao Hao , Tor Lattimore , Csaba Szepesvári , Mengdi Wang

Topics

Machine Learning > Core Methods > Regression Machine Learning > Optimization & Theory > Learning Theory Reinforcement Learning > Methods > Deep RL Reinforcement Learning > Methods > Offline RL Machine Learning > Learning Types > Reinforcement Learning

Keywords

online learning regret bound linear mdp linear markov decision process sparse reinforcement learning lasso fitted q-iteration

Download PDF

Related papers

Linear Regression Games: Convergence Guarantees to Approximate Out-of-Distribution Solutions 2021

Semi-Supervised Learning with Meta-Gradient 2021

Accelerating Metropolis-Hastings with Lightweight Inference Compilation 2021

When MAML Can Adapt Fast and How to Assist When It Cannot 2021

On the convergence of the Metropolis algorithm with fixed-order updates for multivariate binary probability distributions 2021