2025 ICCV ICCV 2025

SyncDiff: Synchronized Motion Diffusion for Multi-Body Human-Object Interaction Synthesis

Abstract

Synthesizing realistic human-object interaction motions is a critical problem in VR/AR and human animation. Unlike the commonly studied scenarios involving a single human or hand interacting with one object, we address a more generic multi-body setting with arbitrary numbers of humans, hands, and objects. The high correlations and mutual influences among bodies leads to two major challenges, for which we propose solutions. First, to satisfy the high demands for synchronization of different body motions, we mathematically derive a new set of alignment scores during the training process, and use maximum likelihood sampling on a dynamic graphical model for explicit synchronization during inference. Second, the high-frequency interactions between objects are often overshadowed by the large-scale low-frequency movements. To address this, we introduce frequency decomposition and explicitly represent high-frequency components in the frequency domain. Extensive experiments across five datasets with various multi-body configurations demonstrate the superiority of SyncDiff over existing state-of-the-art motion synthesis methods.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio