Identification of Blackwell Optimal Policies for Deterministic MDPs

Victor Boone; Bruno Gaujal

2023 AISTATS AISTATS 2023

Identification of Blackwell Optimal Policies for Deterministic MDPs

Abstract

This paper investigates a new learning problem, the identification of Blackwell optimal policies on deterministic MDPs (DMDPs): A learner has to return a Blackwell optimal policy with fixed confidence using a minimal number of queries. First, we characterize the maximal set of DMDPs for which the identification is possible. Then, we focus on the analysis of algorithms based on product-form confidence regions. We minimize the number of queries by efficiently visiting the state-action pairs with respect to the shape of confidence sets. Furthermore, these confidence sets are themselves optimized to achieve better performances. The performances of our methods compare to the lower bounds up to a factor $n^2$ in the worst case – where $n$ is the number of states, and constant in certain classes of DMDPs.

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning

🧭 Keyword Pioneer — blackwell optimality

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Reinforcement Learning, Robotics

Authors

Victor Boone , Bruno Gaujal

Topics

Artificial Intelligence > Core AI > Agent Systems Machine Learning > Optimization & Theory > Learning Theory Reinforcement Learning > Methods > Policy Learning Machine Learning > Learning Types > Reinforcement Learning

Keywords

markov decision process optimal policy query complexity confidence bound blackwell optimality blackwell optimal policy deterministic mdp policy identification

Download PDF

Related papers

Safe Sequential Testing and Effect Estimation in Stratified Count Data 2023

Who Should Predict? Exact Algorithms For Learning to Defer to Humans 2023

An Online and Unified Algorithm for Projection Matrix Vector Multiplication with Application to Empirical Risk Minimization 2023

Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient Methods 2023

The Ordered Matrix Dirichlet for State-Space Models 2023