ICML 2024

Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences