MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement

Xu He; Zhiyong Wu; Xiaoyu Li; Di Kang; Chaopeng Zhang; Jiangnan Ye; Liyang Chen; Xiangjun Gao; Han Zhang; Haolin Zhuang

2025 AAAI AAAI 2025

MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement

Abstract

Abstract Existing works in single-image human reconstruction suffer from weak generalizability due to insufficient training data or 3D inconsistencies for a lack of comprehensive multi-view knowledge. In this paper, we introduce MagicMan, a human-specific multi-view diffusion model to generate high-quality novel views from a single reference image. As its core, we leverage a pre-trained 2D diffusion model as the generative prior for generalizability, with the parametric SMPL-X model as the 3D body prior to promote 3D awareness. To maintain consistency while generating denser views for improved 3D human reconstruction, we introduce hybrid multi-view attention to facilitate efficient and thorough information interchange across views. Besides, we present a geometry-aware dual branch to perform concurrent generation in both RGB and normal domains, further enhancing consistency via geometry cues. Last but not least, to address ill-shaped issues arising from inaccurate SMPL-X estimation, we propose a novel iterative refinement strategy, which progressively optimizes SMPL-X accuracy while enhancing the quality and consistency of the generated multi-views. Extensive experimental results demonstrate that our method significantly outperforms existing approaches in both novel view synthesis and subsequent 3D human reconstruction tasks.

🌉 Interdisciplinary Bridge — Computer Vision and Deep Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Xu He , Zhiyong Wu , Xiaoyu Li , Di Kang , Chaopeng Zhang , Jiangnan Ye , Liyang Chen , Xiangjun Gao , Han Zhang , Haolin Zhuang

Topics

Deep Learning > Models > Diffusion Models Computer Vision > Analysis > 3D Vision Computer Vision > Generation > Image Generation

Keywords

3d human reconstruction iterative refinement diffusion model novel view synthesis multi-view generation smpl-x model multi-view attention

Download PDF

Related papers

BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving 2025

APIRL: Deep Reinforcement Learning for REST API Fuzzing 2025

Anywhere: A Multi-Agent Framework for User-Guided, Reliable, and Diverse Foreground-Conditioned Image Generation 2025

3CAD: A Large-Scale Real-World 3C Product Dataset for Unsupervised Anomaly Detection 2025

Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics 2025