Erasing More Than Intended? How Concept Erasure Degrades the Generation of Non-Target Concepts

Ibtihel Amara; Ahmed Imtiaz Humayun; Ivana Kajić; Zarana Parekh; Natalie Harris; Sarah Young; Chirag Nagpal; Najoung Kim; Junfeng He; Cristina Nader Vasconcelos; Deepak Ramachandran; Golnoosh Farnadi; Katherine Heller; Mohammad Havaei; Negar Rostamzadeh

ICCV 2025

Abstract

Concept erasure techniques have recently gained significant attention for their potential to remove unwanted concepts from text-to-image models. While these methods often demonstrate promising results in controlled settings, their robustness in real-world applications and suitability for deployment remain uncertain. In this work, we (1) identify a critical gap in evaluating sanitized models, particularly in assessing their performance across diverse concept dimensions, and (2) systematically analyze the failure modes of text-to-image models post-erasure. We focus on the unintended consequences of concept removal on non-target concepts across different levels of interconnected relationships including visually similar, binomial, and semantically related concepts. To address this, we introduce EraseBench, a comprehensive benchmark for evaluating post-erasure performance. EraseBench includes over 100 curated concepts, targeted evaluation prompts, and a robust set of metrics to assess both effectiveness and side effects of erasure. Our findings reveal a phenomenon of concept entanglement, where erasure leads to unintended suppression of non-target concepts, causing spillover degradation that manifests as distortions and a decline in generation quality.
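The spillover effect described above can be made concrete with a small sketch. It assumes some per-prompt quality or alignment score is measured for each non-target concept before and after erasure, and averages the drop within each relationship category named in the abstract. The function name `spillover_by_relation`, the record layout, and the toy numbers are all hypothetical illustrations; they are not EraseBench's actual metrics or results.

```python
from collections import defaultdict
from statistics import mean


def spillover_by_relation(records):
    """Average score drop (before minus after erasure) per relation category.

    Each record is a (relation, score_before, score_after) tuple.
    The score itself is an assumed, generic quality measure.
    """
    drops = defaultdict(list)
    for relation, before, after in records:
        drops[relation].append(before - after)
    return {relation: mean(values) for relation, values in drops.items()}


# Made-up placeholder scores for illustration only, not results from the paper.
toy_records = [
    ("visually similar", 0.80, 0.60),
    ("visually similar", 0.70, 0.50),
    ("binomial", 0.90, 0.80),
    ("semantically related", 0.75, 0.75),
]

report = spillover_by_relation(toy_records)
for relation, drop in report.items():
    print(f"{relation}: mean drop {drop:.2f}")
```

A larger mean drop for a category would indicate stronger unintended degradation of non-target concepts in that category, under the assumption that the chosen score tracks generation quality.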

Topics

Artificial Intelligence > Core AI > Model Compression
Deep Learning > Models > Generative Models

Keywords

text-to-image generation, model safety, generative model, concept erasure, concept entanglement
