AAAI 2026

CHIMERA: Controllable High-quality Image-Mask Extraction for Reliable Diffusion-based Anomaly Synthesis

Abstract

We present CHIMERA, a framework for generating realistic, generalizable, prompt-driven industrial anomalies from natural language instructions. Our method addresses two key challenges in text-guided anomaly synthesis: (1) the scarcity of scalable, high-quality paired anomaly data and (2) the difficulty of efficiently adapting large diffusion models to domain-specific tasks without overfitting. To tackle these challenges, we first introduce a Vision-Language Model (VLM)-guided data curation pipeline that automatically generates semantically rich, spatially grounded captions from normal images, enabling dataset augmentation without manual annotation. Building on this, we propose a parameter-efficient fine-tuning strategy that adapts a pre-trained Diffusion Transformer (Stable Diffusion 3) using lightweight LoRA adapters. By aligning structured prompts with the model's pre-trained language-vision prior and introducing auxiliary attention-based mask supervision, our method prevents overfitting, enhances spatial consistency, and keeps training efficient even with limited data. To our knowledge, CHIMERA is the first unified framework to achieve controllable, scalable, and generalizable industrial anomaly generation by integrating VLM-guided data curation with efficient diffusion-based training. Extensive experiments show that it significantly improves anomaly detection in low-data and unseen scenarios.
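
As a rough illustration of the LoRA adaptation named in the abstract, the sketch below shows the generic low-rank update: a frozen weight matrix W is augmented with a trainable product B·A, scaled by alpha/rank. This is not CHIMERA's implementation. The class name `LoRALinear`, the helper `matvec`, and all hyperparameters are hypothetical and chosen only for illustration, and pure Python lists stand in for tensors.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Hypothetical linear layer with a frozen weight and a low-rank update.

    Computes y = W x + (alpha / rank) * B (A x), where only A and B
    would be trained and W stays fixed.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # frozen pre-trained weight (d_out x d_in)
        self.scale = alpha / rank     # standard LoRA scaling factor
        # A starts with small random values, B starts at zero, so the
        # adapted layer initially reproduces the frozen layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def num_trainable(self):
        """Count adapter parameters: rank * (d_in + d_out)."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

With B initialized to zero, the adapted layer matches the frozen layer at the start of fine-tuning, and the number of trainable parameters grows with rank × (d_in + d_out) rather than d_in × d_out, which is what makes this kind of adaptation lightweight.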
