ICML 2024

Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization