ICML 2025

ConfPO: Exploiting Policy Model Confidence for Critical Token Selection in Preference Optimization