Minimax-Bayes Reinforcement Learning

Thomas Kleine Buening; Christos Dimitrakakis; Hannes Eriksson; Divya Grover; Emilio Jorge

2023 AISTATS AISTATS 2023

Minimax-Bayes Reinforcement Learning

Abstract

While the Bayesian decision-theoretic framework offers an elegant solution to the problem of decision making under uncertainty, one question is how to appropriately select the prior distribution. One idea is to employ a worst-case prior. However, this is not as easy to specify in sequential decision making as in simple statistical estimation problems. This paper studies (sometimes approximate) minimax-Bayes solutions for various reinforcement learning problems to gain insights into the properties of the corresponding priors and policies. We find that while the worst-case prior depends on the setting, the corresponding minimax policies are more robust than those that assume a standard (i.e. uniform) prior.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Reinforcement Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning

Authors

Thomas Kleine Buening , Christos Dimitrakakis , Hannes Eriksson , Divya Grover , Emilio Jorge

Topics

Artificial Intelligence > Bayesian & Probabilistic > Bayesian Learning Machine Learning > Optimization & Theory > Bayesian Inference Reinforcement Learning > Methods > Policy Learning Machine Learning > Learning Types > Reinforcement Learning Machine Learning > Bayesian & Probabilistic > Bayesian Inference Artificial Intelligence > Core AI > Game Theory

Keywords

reinforcement learning sequential decision making minimax optimization bayesian decision theory prior distribution robust policy

Download PDF

Related papers

Safe Sequential Testing and Effect Estimation in Stratified Count Data 2023

Who Should Predict? Exact Algorithms For Learning to Defer to Humans 2023

An Online and Unified Algorithm for Projection Matrix Vector Multiplication with Application to Empirical Risk Minimization 2023

Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient Methods 2023

The Ordered Matrix Dirichlet for State-Space Models 2023