Trajectory-Aware Body Interaction Transformer for Multi-Person Pose Forecasting

Xiaogang Peng; Siyuan Mao; Zizhao Wu

CVPR 2023

Abstract

Multi-person pose forecasting remains a challenging problem, especially in modeling fine-grained human body interaction in complex crowd scenarios. Existing methods typically represent the whole pose sequence as a temporal series, yet overlook interactive influences among people based on skeletal body parts. In this paper, we propose a novel Trajectory-Aware Body Interaction Transformer (TBIFormer) for multi-person pose forecasting via effectively modeling body part interactions. Specifically, we construct a Temporal Body Partition Module that transforms all the pose sequences into a Multi-Person Body-Part sequence to retain spatial and temporal information based on body semantics. Then, we devise a Social Body Interaction Self-Attention (SBI-MSA) module, utilizing the transformed sequence to learn body part dynamics for inter- and intra-individual interactions. Furthermore, different from prior Euclidean distance-based spatial encodings, we present a novel and efficient Trajectory-Aware Relative Position Encoding for SBI-MSA to offer discriminative spatial information and additional interactive clues. On both short- and long-term horizons, we empirically evaluate our framework on CMU-Mocap, MuPoTS-3D as well as synthesized datasets (6 to 10 persons), and demonstrate that our method greatly outperforms the state-of-the-art methods.
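To make the idea of a Multi-Person Body-Part sequence concrete, here is a minimal NumPy sketch of rearranging per-person pose sequences into one sequence of body-part tokens. It is an illustration under stated assumptions, not the paper's Temporal Body Partition Module: the 15-joint skeleton, the five-part grouping in `BODY_PARTS`, and the function name `to_body_part_sequence` are all hypothetical, and the abstract does not specify how the actual module encodes each part.

```python
import numpy as np

# Hypothetical grouping of a 15-joint skeleton into five body parts.
# The joint indices are illustrative assumptions, not the paper's partition.
BODY_PARTS = {
    "torso": [0, 1, 2],
    "left_arm": [3, 4, 5],
    "right_arm": [6, 7, 8],
    "left_leg": [9, 10, 11],
    "right_leg": [12, 13, 14],
}


def to_body_part_sequence(poses):
    """Rearrange (N, T, J, 3) pose sequences into (N * P, T, K * 3) tokens.

    N = persons, T = frames, J = joints, P = body parts, K = joints per part.
    Tokens are ordered person by person, then part by part.
    """
    n_persons, n_frames, _, _ = poses.shape
    tokens = []
    for person in range(n_persons):
        for joints in BODY_PARTS.values():
            part = poses[person][:, joints, :]          # (T, K, 3)
            tokens.append(part.reshape(n_frames, -1))   # (T, K * 3)
    return np.stack(tokens)


# Random stand-in data: 3 persons, 50 frames, 15 joints, xyz coordinates.
poses = np.random.default_rng(0).normal(size=(3, 50, 15, 3))
sequence = to_body_part_sequence(poses)
print(sequence.shape)  # (15, 50, 9)
```

Each of the 15 tokens keeps one body part's trajectory over all frames, so an attention layer applied to this sequence could relate parts within one person as well as parts across different people.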

Topics

Machine Learning > Core Methods > Representation Learning
Deep Learning > Architectures > Transformers
Computer Vision > Analysis > Action Recognition
Computer Vision > Analysis > Human Pose Estimation
Computer Vision > Analysis > Motion Analysis
Computer Vision > Analysis > Pose Estimation

Keywords

transformer architecture, action recognition, motion estimation, human pose estimation, human motion prediction, multi-person tracking, multi-person pose, pose forecasting, pose sequence modeling, multi-person pose forecasting, body interaction modeling, body part interaction
