Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes

Yi Tian; Jian Qian; Suvrit Sra

NeurIPS 2020

Abstract

We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs), which are MDPs with conditionally independent transition components. Assuming the factorization is known, we propose two model-based algorithms. The first achieves minimax optimal regret guarantees for a rich class of factored structures, while the second enjoys better computational complexity at the cost of slightly worse regret. A key new ingredient of our algorithms is the design of a bonus term to guide exploration. We complement our algorithms with several structure-dependent lower bounds on regret for FMDPs that reveal the difficulty hidden in the intricacy of the structures.
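To make "conditionally independent transition components" concrete, the following is a minimal sketch of a factored transition in the standard FMDP sense: each next-state component is drawn from its own conditional distribution, which depends only on a small subset (a scope) of the current state-action variables. The function name, the scope encoding, and the toy sizes are illustrative assumptions, not taken from the paper, and the sketch does not implement the paper's algorithms or bonus term.

```python
import itertools

import numpy as np


def factored_transition_prob(factors, scopes, x, next_state):
    """Return P(s' | x) = prod_i P_i(s'[i] | x[scope_i]).

    factors[i] is an array whose leading axes index the values of the
    variables in scopes[i] and whose last axis indexes s'[i].
    (Hypothetical encoding chosen for this illustration.)
    """
    prob = 1.0
    for P_i, scope, s_next_i in zip(factors, scopes, next_state):
        parent_values = tuple(x[j] for j in scope)
        prob *= P_i[parent_values][s_next_i]
    return prob


# Toy example (assumed): two binary state components and one binary action.
# x = (s[0], s[1], a). Component 0 depends on (s[0], a); component 1 on s[1].
rng = np.random.default_rng(0)
scopes = [(0, 2), (1,)]
factors = [
    rng.dirichlet(np.ones(2), size=(2, 2)),  # shape (2, 2, 2)
    rng.dirichlet(np.ones(2), size=(2,)),    # shape (2, 2)
]

x = (1, 0, 1)
# Summing over all next states should give 1, since each factor is a
# probability distribution over its own component.
total = sum(
    factored_transition_prob(factors, scopes, x, s_next)
    for s_next in itertools.product(range(2), repeat=2)
)
```

In this toy instance the factored model stores 4 + 2 conditional distributions over single components, rather than one distribution over all next states for each of the 8 state-action pairs; this kind of saving is what a known factorization makes available to a learner.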

Topics

Machine Learning > Optimization & Theory > Learning Theory
Reinforcement Learning > Methods > Policy Learning

Keywords

minimax optimal, regret bound, exploration bonus, model-based algorithm, factored MDP
