ICML 2024

A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity