2023 AISTATS AISTATS 2023

Identification of Blackwell Optimal Policies for Deterministic MDPs

Abstract

This paper investigates a new learning problem, the identification of Blackwell optimal policies on deterministic MDPs (DMDPs): A learner has to return a Blackwell optimal policy with fixed confidence using a minimal number of queries. First, we characterize the maximal set of DMDPs for which the identification is possible. Then, we focus on the analysis of algorithms based on product-form confidence regions. We minimize the number of queries by efficiently visiting the state-action pairs with respect to the shape of confidence sets. Furthermore, these confidence sets are themselves optimized to achieve better performances. The performances of our methods compare to the lower bounds up to a factor $n^2$ in the worst case – where $n$ is the number of states, and constant in certain classes of DMDPs.

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning
🧭 Keyword Pioneer — blackwell optimality
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Reinforcement Learning, Robotics