2024 ICML ICML 2024

Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint