Spin: Diffusion-based Semantic Image Painting Through Independent Information Injection

Dantong Wu; Zhiqiang Chen; Tianjiao Du; Peipei Ran; Mengchao Bai; Kai Zhang

AAAI 2025

Abstract

Diffusion models have been used as powerful tools for various image editing tasks, including semantic image painting (SIP), which aims to generate content within masked regions conditioned on a reference image or text. SIP methods, especially those that use images as conditions, often suffer from three issues: semantic inconsistency, unnatural transitions, and style inconsistency, which significantly hinder their practical application. To address these challenges, we propose a novel Semantic Image Painting framework with INdependent INformation INjection (Spin). Specifically, we compute a saliency map to segregate the reference image into salient and non-salient components. We then filter out the non-salient information during the semantic embedding extraction phase and inject the semantic embedding precisely into the masked region, rather than the whole image, during the semantic generation phase. Furthermore, we impose additional style guidance to promote style consistency between background and foreground. Experimental results demonstrate that Spin achieves superior semantic similarity and image coherence across various styles, including realistic images, pencil drawings, cartoons, and oil paintings. Additionally, Spin offers diversity and editability, and can be integrated into other models that meet our prerequisites.
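The abstract describes its three mechanisms (saliency-based filtering, mask-restricted injection, style guidance) only in prose. As a rough illustration, the NumPy sketch below shows one way such steps could look on toy feature arrays; it is not the authors' implementation. The token layout, the saliency threshold, single-head scaled dot-product attention with residual addition, and the mean/std style objective (an AdaIN-style stand-in for the paper's style guidance) are all assumptions, and every function name is hypothetical.

```python
import numpy as np


def filter_salient_tokens(tokens, saliency, threshold=0.5):
    """Hypothetical: drop reference tokens whose saliency falls below a threshold.

    tokens:    (N, C) patch embeddings of the reference image (assumed layout)
    saliency:  (N,) per-patch saliency scores in [0, 1]
    threshold: assumed cut-off separating salient from non-salient patches
    """
    keep = saliency >= threshold
    return tokens[keep]


def masked_cross_attention_injection(hidden, ref_tokens, mask, scale=1.0):
    """Hypothetical: add reference-conditioned attention output only inside the mask.

    hidden:     (H*W, C) spatial features of the image being generated
    ref_tokens: (M, C) filtered reference embedding
    mask:       (H*W,) binary mask, 1 where new content should be painted
    """
    if ref_tokens.shape[0] == 0:
        # Nothing salient survived filtering, so there is nothing to inject.
        return hidden.copy()
    c = hidden.shape[-1]
    logits = hidden @ ref_tokens.T / np.sqrt(c)        # (H*W, M) scaled dot products
    logits -= logits.max(axis=-1, keepdims=True)       # stabilize the softmax
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax over tokens
    injected = weights @ ref_tokens                    # (H*W, C) attended reference features
    # Zeroing the injection outside the mask leaves background features untouched.
    return hidden + scale * mask[:, None] * injected


def style_stat_loss(hidden, mask, eps=1e-6):
    """Hypothetical style-consistency objective (AdaIN-style statistics).

    Penalizes differences in channel-wise mean and standard deviation between
    features inside the mask (foreground) and outside it (background).
    """
    fg = hidden[mask > 0.5]
    bg = hidden[mask <= 0.5]
    if fg.size == 0 or bg.size == 0:
        raise ValueError("mask must contain both foreground and background positions")
    mean_term = np.sum((fg.mean(axis=0) - bg.mean(axis=0)) ** 2)
    std_term = np.sum(((fg.std(axis=0) + eps) - (bg.std(axis=0) + eps)) ** 2)
    return float(mean_term + std_term)
```

The key property of the injection sketch is that positions where the mask is 0 come back unchanged, which is one concrete reading of "inject the semantic embedding into the masked region instead of the whole image". In an actual sampler, a guidance objective like `style_stat_loss` would typically be differentiated with respect to the latent to steer each denoising step; NumPy has no autodiff, so this sketch only evaluates the objective.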

Topics

Deep Learning > Models > Diffusion Models; Computer Vision > Generation > Image Generation; Computer Vision > Processing > Image Editing; Computer Vision > Generation > Image Editing

Keywords

image generation, image editing, saliency map, diffusion model, style consistency, image coherence, semantic image painting
