MultiGO: Towards Multi-level Geometry Learning for Monocular 3D Textured Human Reconstruction

Gangjian Zhang; Nanjie Yao; Shunsi Zhang; Hanfeng Zhao; Guoliang Pang; Jian Shu; Hao Wang

2025 CVPR CVPR 2025

MultiGO: Towards Multi-level Geometry Learning for Monocular 3D Textured Human Reconstruction

Abstract

This paper investigates the research task of reconstructing the 3D clothed human body from a monocular image. Due to the inherent ambiguity of single-view input, existing approaches leverage pre-trained SMPL(-X) estimation models or generative models to provide auxiliary information for human reconstruction. However, these methods capture only the general human body geometry and overlook specific geometric details, leading to inaccurate skeleton reconstruction, incorrect joint positions, and unclear cloth wrinkles. In response to these issues, we propose a multi-level geometry learning framework. Technically, we design three key components: skeleton-level enhancement, joint-level augmentation, and wrinkle-level refinement modules. Specifically, we effectively integrate the projected 3D Fourier features into a Gaussian reconstruction model, introduce perturbations to improve joint depth estimation during training, and refine the human coarse wrinkles by resembling the de-noising process of the diffusion model. Extensive quantitative and qualitative experiments on two test sets show the superior performance of our approach compared to state-of-the-art (SOTA) methods.

🌉 Interdisciplinary Bridge — Computer Vision and Deep Learning

🧭 Keyword Pioneer — gaussian reconstruction

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Gangjian Zhang , Nanjie Yao , Shunsi Zhang , Hanfeng Zhao , Guoliang Pang , Jian Shu , Hao Wang

Topics

Deep Learning > Models > Diffusion Models Computer Vision > Analysis > 3D Vision Computer Vision > Analysis > Human Analysis

Keywords

3d reconstruction geometry learning depth estimation 3d human reconstruction diffusion model human body reconstruction gaussian reconstruction monocular reconstruction smpl estimation

Download PDF

Related papers

AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos 2025

SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding 2025

FADE: Frequency-Aware Diffusion Model Factorization for Video Editing 2025

Fast and Accurate Gigapixel Pathological Image Classification with Hierarchical Distillation Multi-Instance Learning 2025

Reversible Decoupling Network for Single Image Reflection Removal 2025