ICML 2025

Minimax Optimal Regret Bound for Reinforcement Learning with Trajectory Feedback