ICML 2025

M$^3$HF: Multi-agent Reinforcement Learning from Multi-phase Human Feedback of Mixed Quality