UAI 2024

Optimistic Regret Bounds for Online Learning in Adversarial Markov Decision Processes