LeftRefill: Filling Right Canvas based on Left Reference through Generalized Text-to-Image Diffusion Model

Chenjie Cao; Yunuo Cai; Qiaole Dong; Yikai Wang; Yanwei Fu

2024 CVPR CVPR 2024

LeftRefill: Filling Right Canvas based on Left Reference through Generalized Text-to-Image Diffusion Model

Abstract

This paper introduces LeftRefill an innovative approach to efficiently harness large Text-to-Image (T2I) diffusion models for reference-guided image synthesis. As the name implies LeftRefill horizontally stitches reference and target views together as a whole input. The reference image occupies the left side while the target canvas is positioned on the right. Then LeftRefill paints the right-side target canvas based on the left-side reference and specific task instructions. Such a task formulation shares some similarities with contextual inpainting akin to the actions of a human painter. This novel formulation efficiently learns both structural and textured correspondence between reference and target without other image encoders or adapters. We inject task and view information through cross-attention modules in T2I models and further exhibit multi-view reference ability via the re-arranged self-attention modules. These enable LeftRefill to perform consistent generation as a generalized model without requiring test-time fine-tuning or model modifications. Thus LeftRefill can be seen as a simple yet unified framework to address reference-guided synthesis. As an exemplar we leverage LeftRefill to address two different challenges: reference-guided inpainting and novel view synthesis based on the pre-trained StableDiffusion. Codes and models are released at https://github.com/ewrfcas/LeftRefill.

🌉 Interdisciplinary Bridge — Computer Vision and Deep Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Chenjie Cao , Yunuo Cai , Qiaole Dong , Yikai Wang , Yanwei Fu

Topics

Deep Learning > Architectures > Transformers Deep Learning > Models > Diffusion Models Computer Vision > Generation > Image Generation

Keywords

image inpainting diffusion model novel view synthesis text-to-image diffusion cross attention reference-guided synthesis

Download PDF

Related papers

DUSt3R: Geometric 3D Vision Made Easy 2024

Bezier Everywhere All at Once: Learning Drivable Lanes as Bezier Graphs 2024

NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows 2024

Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization 2024

DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models 2024