Scale-free Adversarial Reinforcement Learning

Mingyu Chen; Xuezhou Zhang

COLT 2024

Abstract

This paper initiates the study of scale-free learning in Markov Decision Processes (MDPs), where the scale of rewards/losses is unknown to the learner. We design a generic algorithmic framework, Scale Clipping Bound (SCB), and instantiate it in both the adversarial Multi-armed Bandit (MAB) setting and the adversarial MDP setting. Through this framework, we achieve the first minimax optimal expected regret bound and the first high-probability regret bound in scale-free adversarial MABs, resolving an open problem raised in \cite{hadiji2020adaptation}. On adversarial MDPs, the framework also yields the first scale-free RL algorithm with a $\tilde{\mathcal{O}}(\sqrt{T})$ high-probability regret guarantee.
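To make the "unknown scale" concrete, the sketch below computes the quantity these bounds control: realized regret of a learner's arm pulls against the best fixed arm in hindsight in an adversarial bandit. It is only an illustration of the setting, not the SCB algorithm; the helper names `realized_regret` and `scaled` and the toy loss table are hypothetical. Multiplying every loss by a constant c multiplies regret by c, so a guarantee stated for losses in [0, 1] says nothing when the range is unknown, which is the gap a scale-free algorithm has to close.

```python
from typing import List, Sequence


def realized_regret(losses: Sequence[Sequence[float]], pulls: Sequence[int]) -> float:
    """Regret of the pulled arms against the best fixed arm in hindsight.

    losses[t][i] is the (adversarially chosen) loss of arm i at round t;
    pulls[t] is the arm the learner played at round t.
    """
    if len(losses) != len(pulls):
        raise ValueError("need exactly one pull per round")
    n_arms = len(losses[0])
    # Total loss actually incurred by the learner.
    learner_loss = sum(row[a] for row, a in zip(losses, pulls))
    # Cumulative loss of the best single arm played in every round.
    best_fixed = min(sum(row[i] for row in losses) for i in range(n_arms))
    return learner_loss - best_fixed


def scaled(losses: Sequence[Sequence[float]], c: float) -> List[List[float]]:
    """Hypothetical helper: multiply every loss by c (an unknown loss scale)."""
    return [[c * x for x in row] for row in losses]


# Toy instance (hypothetical): 3 rounds, 2 arms.
losses = [[0.2, 0.9], [0.5, 0.1], [0.3, 0.4]]
pulls = [1, 0, 1]

print(realized_regret(losses, pulls))                  # regret at scale 1
print(realized_regret(scaled(losses, 1000.0), pulls))  # same pulls, 1000x the scale
```

The second call returns 1000 times the first: regret inherits the loss scale, so a learner tuned for an assumed range can be arbitrarily off when the true range differs.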

Keywords

multi-armed bandit, minimax optimal regret, adversarial MDP, scale-free learning, high-probability regret bound

Related papers

Exact Mean Square Linear Stability Analysis for SGD 2024

Optimistic Information Directed Sampling 2024

Robust Distribution Learning with Local and Global Adversarial Corruptions (extended abstract) 2024

Depth Separation in Norm-Bounded Infinite-Width Neural Networks 2024

The Sample Complexity of Simple Binary Hypothesis Testing 2024