ICML 2024

Degeneration-free Policy Optimization: RL Fine-Tuning for Language Models without Degeneration