ICML 2025
On the Robustness of Reward Models for Language Model Alignment
Authors
Jiwoo Hong, Noah Lee, Eunki Kim, Guijin Son, Woojin Chung, Aman Gupta, Shao Tang, James Thorne