2025 ICCV ICCV 2025

Proxy-Bridged Game Transformer for Interactive Extreme Motion Prediction

Abstract

Multi-person motion prediction becomes particularly challenging when handling highly interactive scenarios involving extreme motions. Previous works focused more on the case of `moderate' motions (e.g., walking together), where predicting each pose in isolation often yields reasonable results. However, these approaches fall short in modeling extreme motions like lindy-hop dances, as they require a more comprehensive understanding of cross-person dependencies. To bridge this gap, we introduce Proxy-bridged Game Transformer (PGformer), a Transformer-based foundation model that captures the interactions driving extreme multi-person motions. PGformer incorporates a novel cross-query attention module to learn bidirectional dependencies between pose sequences and a proxy unit that subtly controls bidirectional spatial information flow. We evaluated PGformer on the challenging ExPI dataset, which involves large collaborative movements. Both quantitative and qualitative results demonstrate the superiority of PGformer in both short- and long-term predictions. We also test the proposed method on moderate movement datasets CMU-Mocap and MuPoTS-3D, generalizing PGformer to scenarios with more than two individuals with promising results. Code of PGformer is available at https://github.com/joyfang1106/pgformer.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Vision and Deep Learning
🧭 Keyword Pioneer — extreme motion
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio