RoHM: Robust Human Motion Reconstruction via Diffusion

Siwei Zhang; Bharat Lal Bhatnagar; Yuanlu Xu; Alexander Winkler; Petr Kadlecek; Siyu Tang; Federica Bogo

2024 CVPR CVPR 2024

RoHM: Robust Human Motion Reconstruction via Diffusion

Abstract

We propose RoHM an approach for robust 3D human motion reconstruction from monocular RGB(-D) videos in the presence of noise and occlusions. Most previous approaches either train neural networks to directly regress motion in 3D or learn data-driven motion priors and combine them with optimization at test time. RoHM is a novel diffusion-based motion model that conditioned on noisy and occluded input data reconstructs complete plausible motions in consistent global coordinates. Given the complexity of the problem -- requiring one to address different tasks (denoising and infilling) in different solution spaces (local and global motion) -- we decompose it into two sub-tasks and learn two models one for global trajectory and one for local motion. To capture the correlations between the two we then introduce a novel conditioning module combining it with an iterative inference scheme. We apply RoHM to a variety of tasks -- from motion reconstruction and denoising to spatial and temporal infilling. Extensive experiments on three popular datasets show that our method outperforms state-of-the-art approaches qualitatively and quantitatively while being faster at test time. The code is available at https://sanweiliti.github.io/ROHM/ROHM.html.

🌉 Interdisciplinary Bridge — Computer Vision and Deep Learning

🧭 Keyword Pioneer — motion denoising

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Siwei Zhang , Bharat Lal Bhatnagar , Yuanlu Xu , Alexander Winkler , Petr Kadlecek , Siyu Tang , Federica Bogo

Topics

Deep Learning > Models > Diffusion Models Computer Vision > Analysis > 3D Vision

Keywords

diffusion model monocular video 3d motion capture human motion reconstruction motion denoising diffusion-based motion model global trajectory motion infilling

Download PDF

Related papers

DUSt3R: Geometric 3D Vision Made Easy 2024

Bezier Everywhere All at Once: Learning Drivable Lanes as Bezier Graphs 2024

NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows 2024

Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization 2024

DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models 2024