ICML 2024
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
Authors
Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, Norman Mu, Elham Sakhaee, Nathaniel Li, Steven Basart, Bo Li, David Forsyth, Dan Hendrycks