2025 ICCV ICCV 2025

Erasing More Than Intended? How Concept Erasure Degrades the Generation of Non-Target Concepts

Abstract

Concept erasure techniques have recently gained significant attention for their potential to remove unwanted concepts from text-to-image models. While these methods often demonstrate promising results in controlled settings, their robustness in real-world applications and suitability for deployment remain uncertain. In this work, we (1) identify a critical gap in evaluating sanitized models, particularly in assessing their performance across diverse concept dimensions, and (2) systematically analyze the failure modes of text-to-image models post-erasure. We focus on the unintended consequences of concept removal on non-target concepts across different levels of interconnected relationships including visually similar, binomial, and semantically related concepts. To address this, we introduce EraseBench, a comprehensive benchmark for evaluating post-erasure performance. EraseBench includes over 100 curated concepts, targeted evaluation prompts, and a robust set of metrics to assess both effectiveness and side effects of erasure. Our findings reveal a phenomenon of concept entanglement, where erasure leads to unintended suppression of non-target concepts, causing spillover degradation that manifests as distortions and a decline in generation quality.

The Questioner
🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning
🧭 Keyword Pioneer — concept entanglement
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio