PSHuman: Photorealistic Single-image 3D Human Reconstruction using Cross-Scale Multiview Diffusion and Explicit Remeshing

Peng Li; Wangguandong Zheng; Yuan Liu; Tao Yu; Yangguang Li; Xingqun Qi; Xiaowei Chi; Siyu Xia; Yan-Pei Cao; Wei Xue; Wenhan Luo; Yike Guo

2025 CVPR CVPR 2025

PSHuman: Photorealistic Single-image 3D Human Reconstruction using Cross-Scale Multiview Diffusion and Explicit Remeshing

Abstract

Photorealistic 3D human modeling is essential for various applications and has seen tremendous progress. However, existing methods for monocular full-body reconstruction, typically relying on front and/or predicted back view, still struggle with satisfactory performance due to the ill-posed nature of the problem and sophisticated self-occlusions. In this paper, we propose PSHuman, a novel framework that explicitly reconstructs human meshes utilizing priors from the multiview diffusion model. It is found that directly applying multiview diffusion on single-view human images leads to severe geometric distortions, especially on generated faces. To address it, we propose a cross-scale diffusion that models the joint probability distribution of global full-body shape and local facial characteristics, enabling identity-preserved novel-view generation without geometric distortion. Moreover, to enhance cross-view body shape consistency of varied human poses, we condition the generative model on parametric models (SMPL-X), which provide body priors and prevent unnatural views inconsistent with human anatomy. Leveraging the generated multiview normal and color images, we present SMPLX-initialized explicit human carving to recover realistic textured human meshes efficiently. Extensive experiments on CAPE and THuman2.1 demonstrate PSHuman's superiority in geometry details, texture fidelity, and generalization capability.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Vision and Deep Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Peng Li , Wangguandong Zheng , Yuan Liu , Tao Yu , Yangguang Li , Xingqun Qi , Xiaowei Chi , Siyu Xia , Yan-Pei Cao , Wei Xue , Wenhan Luo , Yike Guo

Topics

Artificial Intelligence > Core AI > Multimodal Learning Artificial Intelligence > Core AI > Procedural Generation Deep Learning > Models > Diffusion Models Computer Vision > Domain-Specific > Computer Graphics Computer Vision > Generation > 3D Generation

Keywords

3d reconstruction neural rendering diffusion model photorealistic rendering mesh reconstruction multi-view synthesis human mesh multiview generation

Download PDF

Related papers

AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos 2025

SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding 2025

FADE: Frequency-Aware Diffusion Model Factorization for Video Editing 2025

Fast and Accurate Gigapixel Pathological Image Classification with Hierarchical Distillation Multi-Instance Learning 2025

Reversible Decoupling Network for Single Image Reflection Removal 2025