ICML 2025

TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization