ICML 2025

PoisonBench: Assessing Language Model Vulnerability to Poisoned Preference Data