Proxy-Bridged Game Transformer for Interactive Extreme Motion Prediction

Yanwen Fang; Wenqi Jia; Xu Cao; Peng-tao Jiang; Guodong Li; Jintai Chen

ICCV 2025

Abstract

Multi-person motion prediction becomes particularly challenging in highly interactive scenarios involving extreme motions. Previous work has focused mainly on "moderate" motions (e.g., walking together), where predicting each pose in isolation often yields reasonable results. However, these approaches fall short in modeling extreme motions such as lindy-hop dances, because such motions require a more comprehensive understanding of cross-person dependencies. To bridge this gap, we introduce Proxy-bridged Game Transformer (PGformer), a Transformer-based foundation model that captures the interactions driving extreme multi-person motions. PGformer incorporates a novel cross-query attention module to learn bidirectional dependencies between pose sequences and a proxy unit that subtly controls bidirectional spatial information flow. We evaluate PGformer on the challenging ExPI dataset, which involves large collaborative movements. Quantitative and qualitative results demonstrate the superiority of PGformer in both short- and long-term prediction. We also test the proposed method on the moderate-motion datasets CMU-Mocap and MuPoTS-3D, generalizing PGformer to scenarios with more than two individuals, with promising results. Code for PGformer is available at https://github.com/joyfang1106/pgformer.
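To make the idea of bidirectional dependencies between two pose sequences concrete, the following is a minimal sketch of generic scaled dot-product cross-attention applied in both directions. It is not PGformer's cross-query attention module: it has no learned projections and does not model the proxy unit. The function names, the sequence length, and the feature size are all hypothetical choices made for illustration.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query frame attends over all key frames."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of value frames, one output feature at a time.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def bidirectional_cross_attention(person_a, person_b):
    """Each person's frames query the other person's frames (assumption: no learned weights)."""
    a_from_b = cross_attention(person_a, person_b, person_b)
    b_from_a = cross_attention(person_b, person_a, person_a)
    return a_from_b, b_from_a


random.seed(0)
T, D = 4, 6  # hypothetical: 4 frames, 6 pose features per frame
person_a = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(T)]
person_b = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(T)]
a_ctx, b_ctx = bidirectional_cross_attention(person_a, person_b)
```

Here `a_ctx` holds, for every frame of person A, a weighted mixture of person B's frames, and `b_ctx` holds the reverse. A trained model would additionally learn query, key, and value projections.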

Topics

Artificial Intelligence > Core AI > Multi-Agent Systems; Deep Learning > Architectures > Transformers; Computer Vision > Analysis > Action Recognition; Computer Vision > Analysis > Motion Analysis

Keywords

transformer architecture; pose sequence; motion prediction; multi-person interaction; extreme motion; cross-person dependency; game transformer

Related papers

MA-CIR: A Multimodal Arithmetic Benchmark for Composed Image Retrieval 2025

SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality 2025

MonSTeR: a Unified Model for Motion, Scene, Text Retrieval 2025

ASGS: Single-Domain Generalizable Open-Set Object Detection via Adaptive Subgraph Searching 2025

Robust Dataset Condensation using Supervised Contrastive Learning 2025