Learning Adversarial Markov Decision Processes with Bandit Feedback and Unknown Transition

Chi Jin; Tiancheng Jin; Haipeng Luo; Suvrit Sra; Tiancheng Yu

ICML 2020

Abstract

We consider the task of learning in episodic finite-horizon Markov decision processes with an unknown transition function, bandit feedback, and adversarial losses. We propose an efficient algorithm that achieves $\tilde{\mathcal{O}}(L|X|\sqrt{|A|T})$ regret with high probability, where $L$ is the horizon, $|X|$ is the number of states, $|A|$ is the number of actions, and $T$ is the number of episodes. To our knowledge, our algorithm is the first to ensure $\tilde{\mathcal{O}}(\sqrt{T})$ regret in this challenging setting; in fact, it achieves the same regret as Rosenberg & Mansour (2019a), who consider the easier setting with full information. Our key contributions are twofold: a tighter confidence set for the transition function, and an optimistic loss estimator that is inversely weighted by an "upper occupancy bound".
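
The second contribution can be illustrated with a minimal sketch. It assumes that an upper bound on the probability of visiting each state-action pair has already been computed from the confidence set; how that bound is obtained is not shown here. The function name, the dictionary layout, and the optional `gamma` bias term are hypothetical choices made for illustration and are not taken from the abstract. Dividing an observed loss by an upper bound on the visitation probability, rather than by the unknown true probability, tends to underestimate the loss, which is the sense in which the estimator is optimistic.

```python
from typing import Dict, List, Tuple

StateAction = Tuple[int, int]


def optimistic_loss_estimates(
    trajectory: List[Tuple[int, int, float]],
    upper_occupancy: Dict[StateAction, float],
    gamma: float = 0.0,
) -> Dict[StateAction, float]:
    """Inverse-weighted loss estimates for one episode (illustrative sketch).

    trajectory: (state, action, observed_loss) triples. Under bandit
        feedback, only the losses of visited pairs are observed.
    upper_occupancy: assumed precomputed upper bound on the probability
        of visiting each (state, action) pair.
    gamma: assumed optional bias term added to the denominator.
    """
    # Pairs that were not visited in this episode get an estimate of zero.
    estimates = {pair: 0.0 for pair in upper_occupancy}
    for state, action, loss in trajectory:
        bound = upper_occupancy[(state, action)]
        estimates[(state, action)] = loss / (bound + gamma)
    return estimates


# Toy episode with two states and two actions (all numbers are made up).
upper = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.25, (1, 1): 0.8}
episode = [(0, 1, 0.2), (1, 0, 0.1)]
est = optimistic_loss_estimates(episode, upper)
```

In this toy episode the visited pairs `(0, 1)` and `(1, 0)` receive their observed losses scaled up by the inverse of the bound, while the unvisited pairs keep an estimate of zero.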

Topics

Machine Learning > Optimization & Theory > Learning Theory
Reinforcement Learning > Methods > Deep RL
Mathematics & Optimization > Optimization > Online Algorithms

Keywords

adversarial learning, Markov decision process, bandit feedback, regret bound, adversarial loss, unknown transition
