ICML 2025
DPO Meets PPO: Reinforced Token Optimization for RLHF
Authors
Han Zhong, Zikang Shan, Guhao Feng, Wei Xiong, Xinle Cheng, Li Zhao, Di He, Jiang Bian, Liwei Wang