ICML 2025

An Interpretable N-gram Perplexity Threat Model for Large Language Model Jailbreaks