ICML 2025
STAIR: Improving Safety Alignment with Introspective Reasoning
Authors
Yichi Zhang, Siyuan Zhang, Yao Huang, Zeyu Xia, Zhengwei Fang, Xiao Yang, Ranjie Duan, Dong Yan, Yinpeng Dong, Jun Zhu