Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs

Davide Maran; Alberto Maria Metelli; Matteo Papini; Marcello Restelli

COLT 2024

Abstract

We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators. Given access to a generative model, we achieve rate-optimal sample complexity by performing a simple, \emph{perturbed} version of least-squares value iteration with orthogonal trigonometric polynomials as features. Key to our solution is a novel projection technique based on ideas from harmonic analysis. Our $\widetilde{O}(\varepsilon^{-2-d/(\nu+1)})$ sample complexity, where $d$ is the dimension of the state-action space and $\nu$ the order of smoothness, recovers the state-of-the-art result of discretization approaches for the special case of Lipschitz MDPs ($\nu=0$). At the same time, for $\nu\to\infty$, it recovers and greatly generalizes the $O(\varepsilon^{-2})$ rate of low-rank MDPs, which are more amenable to regression approaches. In this sense, our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
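To make the two limiting regimes in the abstract concrete, the following minimal sketch evaluates the exponent $p = 2 + d/(\nu+1)$ of the stated $\widetilde{O}(\varepsilon^{-p})$ bound. The function name `sample_complexity_exponent` and the example dimension `d=4` are illustrative assumptions, not taken from the paper; constants and logarithmic factors are ignored.

```python
def sample_complexity_exponent(d: int, nu: float) -> float:
    """Return p such that the abstract's bound reads O~(eps^(-p)).

    Hypothetical helper: p = 2 + d / (nu + 1), where d is the dimension of
    the state-action space and nu the order of smoothness.
    """
    if d < 0 or nu < 0:
        raise ValueError("d and nu must be non-negative")
    return 2.0 + d / (nu + 1.0)


if __name__ == "__main__":
    # Illustrative dimension only; any d >= 0 works.
    for nu in (0, 1, 3, 99):
        print(f"nu={nu}: exponent {sample_complexity_exponent(d=4, nu=nu)}")
```

With $\nu=0$ the exponent is $2+d$, the Lipschitz case mentioned in the abstract; as $\nu$ grows, the exponent approaches 2, matching the $O(\varepsilon^{-2})$ regime.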

Topics

Machine Learning > Optimization & Theory > Learning Theory

Reinforcement Learning > Methods > Deep RL

Keywords

reinforcement learning, function approximation, sample complexity, Markov decision process, value iteration, continuous-space MDP
