AISTATS 2025

Achieving $\widetilde{\mathcal{O}}(\sqrt{T})$ Regret in Average-Reward POMDPs with Known Observation Models