Multiple View Geometry Transformers for 3D Human Pose Estimation

Ziwei Liao; Jialiang Zhu; CHUNYU WANG; Han Hu; Steven L. Waslander

2024 CVPR CVPR 2024

Multiple View Geometry Transformers for 3D Human Pose Estimation

Abstract

In this work we aim to improve the 3D reasoning ability of Transformers in multi-view 3D human pose estimation. Recent works have focused on end-to-end learning-based transformer designs which struggle to resolve geometric information accurately particularly during occlusion. Instead we propose a novel hybrid model MVGFormer which has a series of geometric and appearance modules organized in an iterative manner. The geometry modules are learning-free and handle all viewpoint-dependent 3D tasks geometrically which notably improves the model's generalization ability. The appearance modules are learnable and are dedicated to estimating 2D poses from image signals end-to-end which enables them to achieve accurate estimates even when occlusion occurs leading to a model that is both accurate and generalizable to new cameras and geometries. We evaluate our approach for both in-domain and out-of-domain settings where our model consistently outperforms state-of-the-art methods and especially does so by a significant margin in the out-of-domain setting. We will release the code and models: https://github.com/XunshanMan/MVGFormer.

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Ziwei Liao , Jialiang Zhu , CHUNYU WANG , Han Hu , Steven L. Waslander

Topics

Computer Vision > Analysis > 3D Vision Computer Vision > Analysis > Human Pose Estimation

Keywords

occlusion handling 3d human pose estimation out-of-domain generalization multi-view geometry

Download PDF

Related papers

DUSt3R: Geometric 3D Vision Made Easy 2024

Bezier Everywhere All at Once: Learning Drivable Lanes as Bezier Graphs 2024

NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows 2024

Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization 2024

DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models 2024