COLT 2025

Optimistic Q-learning for average reward and episodic reinforcement learning (extended abstract)