TryOnDiffusion: A Tale of Two UNets

Luyang Zhu; Dawei Yang; Tyler Zhu; Fitsum Reda; William Chan; Chitwan Saharia; Mohammad Norouzi; Ira Kemelmacher-Shlizerman

CVPR 2023

Abstract

Given two images depicting a person and a garment worn by another person, our goal is to generate a visualization of how the garment might look on the input person. A key challenge is to synthesize a photorealistic, detail-preserving visualization of the garment while warping the garment to accommodate a significant change in body pose and shape across the subjects. Previous methods either focus on garment detail preservation without effective pose and shape variation, or allow try-on with the desired shape and pose but lack garment details. In this paper, we propose a diffusion-based architecture that unifies two UNets (referred to as Parallel-UNet), which allows us to preserve garment details and warp the garment for significant pose and body change in a single network. The key ideas behind Parallel-UNet are: 1) the garment is warped implicitly via a cross-attention mechanism, and 2) garment warping and person blending happen as part of a unified process rather than as a sequence of two separate tasks. Experimental results indicate that TryOnDiffusion achieves state-of-the-art performance both qualitatively and quantitatively.
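To make the first key idea more concrete, the following is a minimal sketch of generic scaled dot-product cross-attention, not the Parallel-UNet implementation. It assumes (the abstract does not specify this) that queries come from person features and that keys and values come from garment features, so each person location gathers a weighted mix of garment features instead of relying on an explicit warp field. All names, shapes, and the random projection matrices are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(person_feats, garment_feats, w_q, w_k, w_v):
    """Generic scaled dot-product cross-attention.

    Assumption for this sketch: queries are projected from person features,
    keys and values from garment features.
    person_feats:  (n_person, c)  flattened person feature map
    garment_feats: (n_garment, c) flattened garment feature map
    """
    q = person_feats @ w_q   # (n_person, d)
    k = garment_feats @ w_k  # (n_garment, d)
    v = garment_feats @ w_v  # (n_garment, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n_person, n_garment)
    weights = softmax(scores, axis=-1)       # each row sums to 1
    # Every person location receives a convex combination of garment values,
    # which acts as a soft, learned correspondence ("implicit warp").
    return weights @ v, weights


# Toy sizes and random inputs, purely illustrative.
rng = np.random.default_rng(0)
n_person, n_garment, c, d = 6, 4, 8, 5
person = rng.normal(size=(n_person, c))
garment = rng.normal(size=(n_garment, c))
w_q, w_k, w_v = (rng.normal(size=(c, d)) for _ in range(3))

warped, weights = cross_attention(person, garment, w_q, w_k, w_v)
print(warped.shape)   # (6, 5)
print(weights.shape)  # (6, 4)
```

Because the attention weights are computed from the features themselves, the correspondence between person and garment locations is learned jointly with the rest of the network, which fits the abstract's point that warping and blending form one unified process.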

Topics

Deep Learning > Architectures > Neural Networks
Deep Learning > Models > Diffusion Models
Computer Vision > Generation > Image Generation

Keywords

image generation, diffusion model, virtual try-on, pose transfer, image warping, cross attention, unet architecture, person visualization
