Kernel-Based Reinforcement Learning: A Finite-Time Analysis

Omar Darwiche Domingues; Pierre Menard; Matteo Pirotta; Emilie Kaufmann; Michal Valko

2021 ICML ICML 2021

Kernel-Based Reinforcement Learning: A Finite-Time Analysis

Abstract

We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning problems whose state-action space is endowed with a metric. We introduce Kernel-UCBVI, a model-based optimistic algorithm that leverages the smoothness of the MDP and a non-parametric kernel estimator of the rewards and transitions to efficiently balance exploration and exploitation. For problems with $K$ episodes and horizon $H$, we provide a regret bound of $\widetilde{O}\left( H^3 K^{\frac{2d}{2d+1}}\right)$, where $d$ is the covering dimension of the joint state-action space. This is the first regret bound for kernel-based RL using smoothing kernels, which requires very weak assumptions on the MDP and applies to a wide range of tasks. We empirically validate our approach in continuous MDPs with sparse rewards.

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning

🧭 Keyword Pioneer — continuous mdp

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Reinforcement Learning, Robotics, Security & Privacy

Authors

Omar Darwiche Domingues , Pierre Menard , Matteo Pirotta , Emilie Kaufmann , Michal Valko

Topics

Machine Learning > Core Methods > Regression Machine Learning > Optimization & Theory > Theory Reinforcement Learning > Methods > Deep RL Reinforcement Learning > Methods > Policy Learning Machine Learning > Learning Types > Reinforcement Learning Machine Learning > Optimization & Theory > Online Algorithms Machine Learning > Core Methods > Kernel Methods

Keywords

exploration-exploitation tradeoff kernel-based reinforcement learning finite-time analysis regret bound continuous mdp non-parametric estimation kernel methods kernel ucbvi

Download PDF

Related papers

GRAND: Graph Neural Diffusion 2021

Almost Optimal Anytime Algorithm for Batched Multi-Armed Bandits 2021

Straight to the Gradient: Learning to Use Novel Tokens for Neural Text Generation 2021

Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution 2021

Dataset Dynamics via Gradient Flows in Probability Space 2021