Graph Stacked Hourglass Networks for 3D Human Pose Estimation

Tianhan Xu; Wataru Takano

CVPR 2021

Abstract

In this paper, we propose a novel graph convolutional network architecture, Graph Stacked Hourglass Networks, for 2D-to-3D human pose estimation tasks. The proposed architecture consists of repeated encoder-decoder blocks, in which graph-structured features are processed across three different scales of human skeletal representations. This multi-scale architecture enables the model to learn both local and global feature representations, which are critical for 3D human pose estimation. We also introduce a multi-level feature learning approach that uses intermediate features from different depths, and we show the performance improvements that result from exploiting multi-scale, multi-level feature representations. Extensive experiments are conducted to validate our approach, and the results show that our model outperforms the state of the art.
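
To make the idea of processing graph-structured features across three skeletal scales concrete, here is a minimal pure-Python sketch of the encoder half only. It is not the authors' implementation: the 8-joint chain skeleton, the pooling groups, the weight-free mean aggregation, and every function name (`normalized_adjacency`, `graph_conv`, `pool`, `pooled_edges`, `encode`) are assumptions chosen for illustration. A real model would add learned weights, nonlinearities, and a decoder that unpools back to the full skeleton.

```python
# Toy illustration only. The joint count, edges, and pooling groups below are
# assumptions, not the skeleton or pooling scheme used in the paper.

def normalized_adjacency(num_nodes, edges):
    """Row-normalized adjacency with self-loops: D^-1 (A + I)."""
    adj = [[1.0 if i == j else 0.0 for j in range(num_nodes)]
           for i in range(num_nodes)]
    for a, b in edges:
        adj[a][b] = 1.0
        adj[b][a] = 1.0
    return [[v / sum(row) for v in row] for row in adj]


def graph_conv(features, adj):
    """Average each node's features with its neighbours' (no learned weights)."""
    n, dim = len(features), len(features[0])
    return [[sum(adj[i][j] * features[j][d] for j in range(n))
             for d in range(dim)]
            for i in range(n)]


def pool(features, groups):
    """Average the features of the joints in each group to form a coarser graph."""
    dim = len(features[0])
    return [[sum(features[j][d] for j in group) / len(group)
             for d in range(dim)]
            for group in groups]


def pooled_edges(edges, groups):
    """Connect two coarse nodes if any of their member joints were connected."""
    owner = {j: g for g, group in enumerate(groups) for j in group}
    return sorted({tuple(sorted((owner[a], owner[b])))
                   for a, b in edges if owner[a] != owner[b]})


# Hypothetical 8-joint chain, pooled 8 -> 4 -> 2 nodes (three scales).
EDGES_FINE = [(i, i + 1) for i in range(7)]
GROUPS_MID = [[0, 1], [2, 3], [4, 5], [6, 7]]
GROUPS_COARSE = [[0, 1], [2, 3]]


def encode(pose_2d):
    """Return the aggregated features at each of the three scales."""
    scales = []
    feats, edges = pose_2d, EDGES_FINE
    for groups in (GROUPS_MID, GROUPS_COARSE, None):
        feats = graph_conv(feats, normalized_adjacency(len(feats), edges))
        scales.append(feats)
        if groups is not None:
            edges = pooled_edges(edges, groups)
            feats = pool(feats, groups)
    return scales


if __name__ == "__main__":
    pose = [[float(i), float(2 * i)] for i in range(8)]  # made-up 2D keypoints
    for level, feats in enumerate(encode(pose)):
        print(level, len(feats))
```

Calling `encode` on a list of 8 two-dimensional keypoints returns three feature lists, one per scale. The finest scale mixes information only between adjacent joints, while the coarsest scale summarizes whole groups of joints, which is the local-versus-global distinction the abstract refers to.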

Topics

Machine Learning > Core Methods > Representation Learning
Deep Learning > Architectures > Graph Neural Networks
Computer Vision > Analysis > 3D Vision
Computer Vision > Analysis > Human Pose Estimation
Artificial Intelligence > Core AI > Computer Vision

Keywords

human pose estimation; 3D human pose estimation; multi-scale feature; 3D pose; encoder-decoder architecture; graph convolutional network; multi-scale learning; graph neural network; multi-scale feature learning; human skeletal representation; skeletal representation
