WACV 2026

BiPO: Bidirectional Partial Occlusion Network for Text-to-Motion Synthesis

Abstract

Text-to-motion generation enables language-driven animation, yet current models struggle to deliver long-range coherence and fine-grained limb coordination. A competitive system must (i) preserve temporal consistency across hundreds of frames, (ii) synchronize limb motions, and (iii) align nuanced sentences with a spectrum of plausible trajectories. We introduce BiPO, the first part-based bidirectional autoregressive network trained with a lightweight Partial Occlusion regularizer. Each limb attends to both past and future frames for anticipatory coordination, while stochastic masking weakens spurious cross-part dependencies and encourages varied solutions. On HumanML3D and KIT-ML, BiPO lowers FID by 15–30% relative to MoMask and BAMM, achieves the highest human-perceived realism scores, and sets new state-of-the-art results on motion-editing tasks that require infilling from partial sequences. These findings demonstrate that bidirectional reasoning coupled with Partial Occlusion yields a length-agnostic, high-fidelity framework for expressive, language-conditioned motion synthesis.
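
The two mechanisms named in the abstract can be illustrated with a small sketch. It is not the authors' implementation, only one plausible reading of "attends to both past and future frames" and "stochastic masking" of parts. The part list `PARTS`, the functions `frame_attention_mask` and `occlude_parts`, the parameter `p_occlude`, and the choice to occlude by zeroing features are all assumptions made for illustration; the abstract specifies none of them.

```python
import random

# Hypothetical part layout; the abstract does not enumerate the parts.
PARTS = ("torso", "left_arm", "right_arm", "left_leg", "right_leg")


def frame_attention_mask(num_frames, bidirectional=True):
    """Return mask[i][j], True when frame i may attend to frame j.

    A causal (unidirectional) mask only allows j <= i; a bidirectional
    mask also exposes future frames.
    """
    return [[bidirectional or j <= i for j in range(num_frames)]
            for i in range(num_frames)]


def occlude_parts(part_feats, p_occlude=0.3, rng=None):
    """Randomly hide whole body parts (assumed form of partial occlusion).

    part_feats maps a part name to a list of per-frame feature lists.
    Each part is independently dropped with probability p_occlude; a
    dropped part has its features replaced by zeros (an assumption, a
    learned mask token would be another option).
    Returns (occluded copy of part_feats, set of kept part names).
    """
    rng = rng or random.Random()
    kept = {name for name in part_feats if rng.random() >= p_occlude}
    occluded = {
        name: [list(f) if name in kept else [0.0] * len(f) for f in frames]
        for name, frames in part_feats.items()
    }
    return occluded, kept


if __name__ == "__main__":
    # Toy input: 4 frames of 3-dimensional features per part.
    feats = {name: [[1.0, 2.0, 3.0] for _ in range(4)] for name in PARTS}
    occluded, kept = occlude_parts(feats, p_occlude=0.3, rng=random.Random(0))
    print("kept parts:", sorted(kept))
    print("causal row 0:", frame_attention_mask(4, bidirectional=False)[0])
    print("bidirectional row 0:", frame_attention_mask(4)[0])
```

In this sketch, a part that is occluded during training cannot be relied on by the others, which is the intuition behind weakening spurious cross-part dependencies. The bidirectional mask is what would let a model fill in a gap using frames on both sides of it.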
