ICML 2025
AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization
Authors
Junkang Wu, Xue Wang, Zhengyi Yang, Jiancan Wu, Jinyang Gao, Bolin Ding, Xiang Wang, Xiangnan He