Controllable Person Image Synthesis with Pose-Constrained Latent Diffusion

Xiao Han; Xiatian Zhu; Jiankang Deng; Yi-Zhe Song; Tao Xiang

2023 ICCV ICCV 2023

Controllable Person Image Synthesis with Pose-Constrained Latent Diffusion

Abstract

Controllable person image synthesis aims at rendering a source image based on user-specified changes in body pose or appearance. Prior art approaches leverage pixel-level denoising diffusion models conditioned on the coarse skeleton via cross-attention. This leads to two limitations: low efficiency and inaccurate condition information. To address both issues, a novel Pose-Constrained Latent Diffusion model (PoCoLD) is introduced. Rather than using the skeleton as a sparse pose representation, we exploit DensePose which offers much richer body structure information. To effectively capitalize DensePose at a low cost, we propose an efficient pose-constrained attention module that is capable of modeling the complex interplay between appearance and pose. Extensive experiments show that our PoCoLD outperforms the state-of-the-art competitors in image synthesis fidelity. Critically, it runs 2x faster and consumes 3.6x smaller memory than the latest diffusion-model-based alternative during inference.

🌉 Interdisciplinary Bridge — Computer Vision and Deep Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Xiao Han , Xiatian Zhu , Jiankang Deng , Yi-Zhe Song , Tao Xiang

Topics

Deep Learning > Models > Diffusion Models Computer Vision > Analysis > Human Pose Estimation Computer Vision > Generation > Image Generation

Keywords

image generation latent diffusion pose transfer person image synthesis

Download PDF

Related papers

PVT++: A Simple End-to-End Latency-Aware Visual Tracking Framework 2023

Periodically Exchange Teacher-Student for Source-Free Object Detection 2023

Stable and Causal Inference for Discriminative Self-supervised Deep Visual Representations 2023

Minimal Solutions to Uncalibrated Two-view Geometry with Known Epipoles 2023

3D Neural Embedding Likelihood: Probabilistic Inverse Graphics for Robust 6D Pose Estimation 2023