SIAM: Synchronous Interaction Attention for Human Mesh Recovery
Abstract
Conventional 3D body-mesh reconstruction methods often use decoupling strategies that isolate individual features and represent them separately, so they lack relational cues among entities. This paper proposes a novel Synchronous Interaction Attention Module (SIAM) for human-mesh recovery. The framework builds on a high-resolution multi-branch backbone (HRNet) and introduces two key components. The first, Synchronous Interaction Attention (SIA), explicitly models spatial relational cues among multiple human instances in live scenes. The second, Feature Decomposition (FD), extracts enriched instance-specific features by leveraging the attributes captured by the SIA module. Together, these components enhance spatial reasoning, mitigate error accumulation, and yield more accurate 3D human-mesh reconstructions. SIAM achieves state-of-the-art performance on several benchmarks, including 3DPW, 3DPW-OCC, AGORA, and CMU-Panoptic. Notably, the model processes video streams at 25 frames per second, demonstrating its suitability for real-time applications. The source code and supplementary materials are publicly available.
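To make the idea of relational cues among human instances concrete, the following is a minimal sketch of generic scaled dot-product self-attention applied to per-instance feature vectors. It is not the SIA module described above: the function names, the absence of learned query/key/value projections, and the toy 4-dimensional features are all assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def instance_self_attention(features):
    """Generic scaled dot-product self-attention over instance features.

    Assumption: queries, keys, and values are the raw feature vectors
    (no learned projections). This illustrates attention in general,
    not the paper's SIA design.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    attended = []
    for query in features:
        # Similarity of this instance to every instance, itself included.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in features
        ]
        weights = softmax(scores)
        # Each output is a weighted mix of all instance features.
        mixed = [
            sum(w * value[d] for w, value in zip(weights, features))
            for d in range(dim)
        ]
        attended.append(mixed)
    return attended


# Three hypothetical person instances with toy 4-dimensional features.
people = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
context = instance_self_attention(people)
```

In this sketch each output vector blends information from every instance in the scene, weighted by feature similarity, which is the general mechanism by which attention can expose relations that a per-instance, decoupled representation would miss.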