JailbreakDiffBench: A Comprehensive Benchmark for Jailbreaking Diffusion Models

Xiaolong Jin; Zixuan Weng; Hanxi Guo; Chenlong Yin; Siyuan Cheng; Guangyu Shen; Xiangyu Zhang

ICCV 2025

Abstract

Diffusion models are widely used in real-world applications, but ensuring their safety remains a major challenge. Despite many efforts to enhance the security of diffusion models, jailbreak and adversarial attacks can still bypass these defenses and cause the models to generate harmful content. However, the lack of standardized evaluation makes it difficult to assess the robustness of diffusion model systems. To address this, we introduce JailbreakDiffBench, a comprehensive benchmark for systematically evaluating the safety of diffusion models against various attacks and under different defenses. Our benchmark includes a high-quality, human-annotated prompt and image dataset covering diverse attack scenarios. The benchmark consists of two key components: (1) an evaluation protocol to measure the effectiveness of moderation mechanisms and (2) an attack assessment module to benchmark adversarial jailbreak strategies. Through extensive experiments, we analyze existing filters and reveal critical weaknesses in current safety measures. JailbreakDiffBench is designed to support both text-to-image and text-to-video models, ensuring extensibility and reproducibility.
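
To make the two components concrete, the following is a minimal sketch of how a moderation filter and a set of adversarial prompts might be scored. It is not the JailbreakDiffBench API: the names `Sample`, `evaluate_filter`, `bypass_rate`, and `keyword_filter`, the toy prompts, and the choice of metrics (detection rate, false-positive rate, bypass rate) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class Sample:
    """Hypothetical record: a prompt plus its human annotation."""
    prompt: str
    harmful: bool


def _ratio(num: int, den: int) -> float:
    # Return 0.0 instead of raising when a class is absent from the data.
    return num / den if den else 0.0


def evaluate_filter(samples: Iterable[Sample],
                    is_blocked: Callable[[str], bool]) -> Dict[str, float]:
    """Component (1), sketched: score a moderation filter against annotations."""
    tp = fp = fn = tn = 0
    for s in samples:
        blocked = is_blocked(s.prompt)
        if s.harmful and blocked:
            tp += 1
        elif s.harmful and not blocked:
            fn += 1
        elif not s.harmful and blocked:
            fp += 1
        else:
            tn += 1
    return {
        "detection_rate": _ratio(tp, tp + fn),
        "false_positive_rate": _ratio(fp, fp + tn),
    }


def bypass_rate(adversarial_prompts: Iterable[str],
                is_blocked: Callable[[str], bool]) -> float:
    """Component (2), sketched: fraction of attack prompts the filter misses."""
    prompts = list(adversarial_prompts)
    passed = sum(1 for p in prompts if not is_blocked(p))
    return _ratio(passed, len(prompts))


def keyword_filter(prompt: str) -> bool:
    # Toy stand-in for a real moderation mechanism (assumption).
    return "forbidden" in prompt.lower()


samples = [
    Sample("draw a forbidden scene", harmful=True),
    Sample("draw a f0rbidden scene", harmful=True),   # obfuscated spelling
    Sample("draw a cat on a sofa", harmful=False),
    Sample("a sign reading forbidden", harmful=False),
]

filter_scores = evaluate_filter(samples, keyword_filter)
attack_score = bypass_rate(
    ["draw a f0rbidden scene", "draw a forbidden scene"], keyword_filter
)
```

In this toy setup the keyword filter misses the obfuscated prompt and flags a benign one, which is the kind of weakness a standardized protocol is meant to expose. A real evaluation would also need to judge the generated images or videos, which this sketch omits.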

Topics

Artificial Intelligence > Core AI > AI Safety; Machine Learning > Application Areas > Privacy; Deep Learning > Models > Diffusion Models

Keywords

adversarial attack, diffusion model, safety evaluation, jailbreak attack, safety benchmark
