Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis

Marianna Ohanyan; Hayk Manukyan; Zhangyang Wang; Shant Navasardyan; Humphrey Shi

CVPR 2024

Abstract

We present Zero-Painter, a novel training-free framework for layout-conditional text-to-image synthesis that enables the creation of detailed, controlled imagery from textual prompts. Our method uses object masks and individual object descriptions, coupled with a global text prompt, to generate high-fidelity images. Zero-Painter employs a two-stage process built on our novel Prompt-Adjusted Cross-Attention (PACA) and Region-Grouped Cross-Attention (ReGCA) blocks, ensuring precise alignment of the generated objects with their textual prompts and mask shapes. Our extensive experiments demonstrate that Zero-Painter surpasses current state-of-the-art methods in preserving textual details and adhering to mask shapes. We will make the code and models publicly available.
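The abstract names Region-Grouped Cross-Attention (ReGCA) but does not give its formulation, so the following is only a minimal sketch under a stated assumption: that region grouping behaves like a masked cross-attention in which each spatial position attends only to the text tokens of the prompt assigned to its mask. It is not the authors' implementation. The function and parameter names (`region_grouped_cross_attention`, `query_region`, `token_region`) are hypothetical, and plain Python lists stand in for the batched tensors a real diffusion model would use.

```python
import math
from typing import List, Sequence

Vector = List[float]


def _softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable softmax; masked entries (-inf) receive exactly zero weight.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def region_grouped_cross_attention(
    queries: Sequence[Vector],    # one query vector per spatial position (hypothetical layout)
    query_region: Sequence[int],  # region (mask) id of each spatial position
    keys: Sequence[Vector],       # text-token keys from all prompts, concatenated
    values: Sequence[Vector],     # text-token values, aligned with keys
    token_region: Sequence[int],  # region id of the prompt each token came from
) -> List[Vector]:
    """Assumed behavior: each query attends only to tokens of its own region's prompt."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim_v = len(values[0])
    out: List[Vector] = []
    for q, r in zip(queries, query_region):
        if r not in token_region:
            raise ValueError(f"region {r} has no prompt tokens to attend to")
        # Scaled dot-product logits; tokens from other regions' prompts are masked to -inf.
        logits = [
            scale * sum(qi * ki for qi, ki in zip(q, k)) if t == r else -math.inf
            for k, t in zip(keys, token_region)
        ]
        weights = _softmax(logits)
        # Weighted sum of value vectors; masked tokens contribute nothing.
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)])
    return out


# Two positions in different regions, one prompt token per region.
print(region_grouped_cross_attention(
    queries=[[1.0, 0.0], [0.0, 1.0]],
    query_region=[0, 1],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[1.0, 0.0], [0.0, 1.0]],
    token_region=[0, 1],
))
# [[1.0, 0.0], [0.0, 1.0]]
```

Masking the logits with `-inf` before the softmax is the usual way to restrict attention to a subset of keys. In the toy example, each position copies only the value of its own region's token, so information from one object's prompt cannot leak into another object's mask.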

Topics

Deep Learning > Models > Diffusion Models
Deep Learning > Models > Generative Models
Computer Vision > Generation > Image Generation
Deep Learning > Techniques > Transfer Learning

Keywords

image generation, text-to-image synthesis, object segmentation, diffusion model, cross-attention mechanism, layout control
