PLA: Prompt Learning Attack against Text-to-Image Generative Models

Xinqi Lyu; Yihao Liu; Yanjie Li; Bin Xiao

2025 ICCV ICCV 2025

PLA: Prompt Learning Attack against Text-to-Image Generative Models

Abstract

Text-to-Image (T2I) models have gained widespread adoption across various applications. Despite the success, the potential misuse of T2I models poses significant risks of generating Not-Safe-For-Work (NSFW) content. To investigate the vulnerability of T2I models, this paper delves into adversarial attacks to bypass the safety mechanisms under black-box settings. Most previous methods rely on word substitution to search adversarial prompts. Due to limited search space, this leads to suboptimal performance compared to gradient-based training. However, black-box settings present unique challenges to training gradient-driven attack methods, since there is no access to the internal architecture and parameters of T2I models. To facilitate the learning of adversarial prompts in black-box settings, we propose a novel prompt learning attack framework (PLA), where insightful gradient-based training tailored to black-box T2I models is designed by utilizing multimodal similarities. Experiments show that our new method can effectively attack the safety mechanisms of black-box T2I models including prompt filters and post-hoc safety checkers with a high success rate compared to state-of-the-art methods. Warning: This paper may contain offensive model-generated content.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Vision and Deep Learning and Machine Learning

🧭 Keyword Pioneer — prompt learning attack

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Xinqi Lyu , Yihao Liu , Yanjie Li , Bin Xiao

Topics

Artificial Intelligence > Core AI > AI Safety Machine Learning > Learning Types > Adversarial Learning Computer Vision > Generation > Image Generation Artificial Intelligence > Core AI > Adversarial Learning Deep Learning > Learning Types > Adversarial Learning

Keywords

adversarial learning prompt learning adversarial attack black-box attack text-to-image model safety mechanism prompt learning attack

Download PDF

Related papers

MA-CIR: A Multimodal Arithmetic Benchmark for Composed Image Retrieval 2025

SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality 2025

MonSTeR: a Unified Model for Motion, Scene, Text Retrieval 2025

ASGS: Single-Domain Generalizable Open-Set Object Detection via Adaptive Subgraph Searching 2025

Robust Dataset Condensation using Supervised Contrastive Learning 2025