Style Aligned Image Generation via Shared Attention

Amir Hertz; Andrey Voynov; Shlomi Fruchter; Daniel Cohen-Or

CVPR 2024

Abstract

Large-scale Text-to-Image (T2I) models have rapidly gained prominence across creative fields, generating visually compelling outputs from textual prompts. However, controlling these models to ensure consistent style remains challenging, with existing methods necessitating fine-tuning and manual intervention to disentangle content and style. In this paper, we introduce StyleAligned, a novel technique designed to establish style alignment among a series of generated images. By employing minimal 'attention sharing' during the diffusion process, our method maintains style consistency across images within T2I models. This approach allows for the creation of style-consistent images using a reference style through a straightforward inversion operation. Our method's evaluation across diverse styles and text prompts demonstrates high-quality synthesis and fidelity, underscoring its efficacy in achieving consistent style across various inputs.
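
The abstract does not spell out how attention is shared, so the following is only a minimal sketch under stated assumptions: each generated image's queries attend to its own keys and values concatenated with those of a reference image, so style features from the reference can flow into every image. The function names (`softmax`, `attention`, `shared_attention`) and the plain-list representation are illustrative choices, not the authors' implementation, and any additional steps the full method may apply are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention on lists of vectors (one row per token)."""
    d = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


def shared_attention(q_target, k_target, v_target, k_ref, v_ref):
    """Assumed form of attention sharing: the target image attends to the
    reference image's tokens in addition to its own (hypothetical helper)."""
    keys = k_ref + k_target      # reference tokens first, then the target's own
    values = v_ref + v_target
    return attention(q_target, keys, values)
```

In this sketch, a target query that aligns with a reference key pulls the output toward the corresponding reference value, which is the intuition behind sharing attention to keep style consistent; with an empty reference the function reduces to ordinary self-attention.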

Topics

Machine Learning > Learning Types > Self-Supervised Learning
Deep Learning > Models > Diffusion Models
Computer Vision > Generation > Image Generation
Deep Learning > Learning Types > Self-Supervised Learning
Deep Learning > Learning Types > Adversarial Learning
Deep Learning > Techniques > Attention

Keywords

image generation, style transfer, attention mechanism, diffusion model, text-to-image model, style consistency, attention sharing, style alignment
