Distributionally Robust Markov Decision Processes

Huan Xu; Shie Mannor

NIPS (NeurIPS) 2010

Abstract

We consider Markov decision processes where the values of the parameters are uncertain. This uncertainty is described by a sequence of nested sets (that is, each set contains the previous one), each of which corresponds to a probabilistic guarantee for a different confidence level, so that a set of admissible probability distributions of the unknown parameters is specified. This formulation models the case where the decision maker is aware of, and wants to exploit, some (yet imprecise) a priori information about the distribution of the parameters. It arises naturally in practice, where methods for estimating confidence regions of parameters abound. We propose a decision criterion based on distributional robustness: the optimal policy maximizes the expected total reward under the most adversarial probability distribution over realizations of the uncertain parameters that is admissible (i.e., that agrees with the a priori information). We show that finding the optimal distributionally robust policy can be reduced to a standard robust MDP where the parameters belong to a single uncertainty set, and hence that it can be computed in polynomial time under mild technical conditions.
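To make the target of the reduction concrete, the following is a minimal sketch of value iteration for a standard robust MDP with a single uncertainty set. It is not the authors' algorithm and does not implement the reduction itself. It assumes a discounted reward, an uncertainty set that factors independently over state-action pairs, and a finite list of candidate transition distributions for each pair. The function name `robust_value_iteration`, the array layout, and the toy numbers are all hypothetical choices made for illustration.

```python
import numpy as np


def robust_value_iteration(rewards, candidates, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Worst-case value iteration over a finite uncertainty set (illustrative sketch).

    rewards:    array of shape (S, A), reward for each state-action pair.
    candidates: array of shape (S, A, K, S); for each (s, a), K candidate
                next-state distributions (assumption: finite uncertainty set).
    """
    n_states, _ = rewards.shape
    values = np.zeros(n_states)
    for _ in range(max_iter):
        # Expected next-state value under every candidate: shape (S, A, K).
        expected = candidates @ values
        # The adversary picks the worst candidate for each (s, a).
        q = rewards + gamma * expected.min(axis=2)
        new_values = q.max(axis=1)
        converged = np.max(np.abs(new_values - values)) < tol
        values = new_values
        if converged:
            break
    # Greedy policy with respect to the worst-case action values.
    q = rewards + gamma * (candidates @ values).min(axis=2)
    return values, q.argmax(axis=1)


# Toy problem (hypothetical numbers): state 1 is absorbing with zero reward.
# In state 0, action 0 is "safe" (reward 1, always stays in state 0) and
# action 1 is "risky" (reward 1.5, but one candidate model jumps to state 1).
rewards = np.array([[1.0, 1.5],
                    [0.0, 0.0]])
candidates = np.zeros((2, 2, 2, 2))
candidates[0, 0, :, 0] = 1.0   # safe action: both candidates stay in state 0
candidates[0, 1, 0, 0] = 1.0   # risky action, candidate 0: stay in state 0
candidates[0, 1, 1, 1] = 1.0   # risky action, candidate 1: jump to state 1
candidates[1, :, :, 1] = 1.0   # state 1 is absorbing under every candidate

values, policy = robust_value_iteration(rewards, candidates, gamma=0.5)
print(values, policy)
```

In this toy problem the risky action would be preferred if its favorable model were known to hold, but the worst-case criterion selects the safe action in state 0. A distributionally robust criterion sits between these two extremes, because it takes the worst case over admissible distributions of the parameters rather than over the parameters themselves.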

Keywords

robust optimization, Markov decision processes, sequential decision, optimal policy, distributionally robust, distributional robustness, uncertainty set, confidence level
