2013
ICML
ICML 2013
Better Rates for Any Adversarial Deterministic MDP
Abstract
We consider regret minimization in adversarial deterministic Markov Decision Processes (ADMDPs) with bandit feedback. We devise a new algorithm that pushes the state-of-the-art forward in two ways: First, it attains a regret of O(T^2/3) with respect to the best fixed policy in hindsight, whereas the previous best regret bound was O(T^3/4). Second, the algorithm and its analysis are compatible with any feasible ADMDP graph topology, while all previous approaches required additional restrictions on the graph topology.
🚀
Conference Pioneer
— ICML 2013
🌉
Interdisciplinary Bridge
— Machine Learning and Mathematics & Optimization and Reinforcement Learning
🧭
Keyword Pioneer
— deterministic mdp
🐣
Hot Topic Early Bird
— regret minimization
🐝
Cross-Pollinator
— Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio