Reinforcement Learning in Finite MDPs: PAC Analysis

Alexander L. Strehl; Lihong Li; Michael L. Littman

JMLR 2009

Abstract

We study the problem of learning near-optimal behavior in finite Markov Decision Processes (MDPs) with a polynomial number of samples. These "PAC-MDP" algorithms include the well-known E3 and R-MAX algorithms as well as the more recent Delayed Q-learning algorithm. We summarize the current state of the art by presenting bounds for the problem in a unified theoretical framework. A more refined analysis of upper and lower bounds is presented to yield insight into the differences between the model-free Delayed Q-learning and the model-based R-MAX.
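To make the model-based side of this contrast concrete, the following is a minimal sketch of the planning step behind R-MAX, written in Python with NumPy. It assumes a tabular MDP with S states and A actions, rewards in [0, r_max], and a discount factor gamma. The function name `rmax_plan` and the count-array layout are hypothetical illustrations, not the paper's notation. A state-action pair is treated as "known" once it has been tried at least m times. Known pairs use the empirical reward and transition estimates, while unknown pairs receive the optimistic value r_max / (1 - gamma), which pulls the greedy policy toward unexplored actions.

```python
import numpy as np


def rmax_plan(counts, reward_sums, next_counts, m, r_max, gamma,
              iters=1000, tol=1e-8):
    """Value iteration on the optimistic R-MAX model (illustrative sketch).

    Hypothetical inputs:
      counts[s, a]          -- number of times action a was tried in state s
      reward_sums[s, a]     -- sum of rewards observed for (s, a)
      next_counts[s, a, t]  -- number of observed transitions (s, a) -> t
      m                     -- visits needed before (s, a) counts as known
    Returns a Q-table of shape (S, A).
    """
    known = counts >= m
    # Avoid division by zero; unknown pairs are overwritten below anyway.
    safe = np.maximum(counts, 1)
    r_hat = reward_sums / safe                  # empirical mean reward
    p_hat = next_counts / safe[:, :, None]      # empirical transition probabilities
    v_max = r_max / (1.0 - gamma)               # optimistic value for unknown pairs

    q = np.full(counts.shape, v_max, dtype=float)
    for _ in range(iters):
        v = q.max(axis=1)                       # greedy state values
        # Known pairs: Bellman backup on the empirical model (p_hat @ v has shape (S, A)).
        # Unknown pairs: fixed optimistic value, as if they self-loop with reward r_max.
        q_new = np.where(known, r_hat + gamma * (p_hat @ v), v_max)
        converged = np.max(np.abs(q_new - q)) < tol
        q = q_new
        if converged:
            break
    return q
```

Acting greedily with respect to the returned Q-table, recording each transition, and re-planning whenever a pair first becomes known gives the overall R-MAX loop, since the model only changes at those moments. Delayed Q-learning, by contrast, keeps no transition model and instead updates Q-values directly from batches of m samples.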

Keywords

model-based learning, sample complexity, Markov decision process, model-free learning