BOTH2Hands: Inferring 3D Hands from Both Text Prompts and Body Dynamics

Wenqian Zhang; Molin Huang; Yuxuan Zhou; Juze Zhang; Jingyi Yu; Jingya Wang; Lan Xu

CVPR 2024

Abstract

Recent advances in text-to-motion generation have inspired numerous attempts at convenient and interactive human motion generation. Yet existing methods are largely limited to generating body motions only, without considering rich two-hand motions, let alone handling varied conditions such as body dynamics or text. To break the data bottleneck, we propose BOTH57M, a novel multi-modal dataset for two-hand motion generation. Our dataset includes accurate motion tracking for the human body and hands and provides pairwise finger-level hand annotations and body descriptions. We further provide a strong baseline method, BOTH2Hands, for the novel task: generating vivid two-hand motions from both implicit body dynamics and explicit text prompts. We first warm up two parallel body-to-hand and text-to-hand diffusion models and then utilize a cross-attention transformer for motion blending. Extensive experiments and cross-validations demonstrate the effectiveness of our approach and dataset for generating convincing two-hand motions from the hybrid body-and-textual conditions. Our dataset and code will be released to the community for future research at https://github.com/Godheritage/BOTH2Hands.
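The blending step mentioned in the abstract can be illustrated with a toy sketch. It is not the BOTH2Hands implementation: the function names (`cross_attention`, `blend`), the absence of learned projections, and the simple averaging at the end are all assumptions made for illustration. It only shows the general idea of letting hand frames predicted from body dynamics attend over hand frames predicted from text.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    # Scaled dot-product attention: every query frame attends over all key frames.
    # No learned projection matrices are used here (an illustrative simplification).
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def blend(body_hands, text_hands):
    # Hypothetical blending rule: average each body-conditioned frame with the
    # text-conditioned features it attends to. The real model is learned.
    attended = cross_attention(body_hands, text_hands, text_hands)
    return [[0.5 * (b + a) for b, a in zip(bf, af)]
            for bf, af in zip(body_hands, attended)]

# Three frames from a (hypothetical) body-to-hand branch, two from a text-to-hand branch.
body_hands = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_hands = [[2.0, 4.0], [2.0, 4.0]]
blended = blend(body_hands, text_hands)
```

In this sketch the output keeps the frame count of the body-conditioned branch, so the text-conditioned branch acts as context rather than as the timeline.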

Topics

Deep Learning > Models > Diffusion Models; Computer Vision > Generation > Video Generation; Computer Vision > Analysis > Motion Analysis; Artificial Intelligence > Core AI > Multi-Modal Learning; Computer Vision > Processing > Motion Estimation; Artificial Intelligence > Core AI > Motion Analysis

Keywords

multi-modal learning; motion generation; diffusion model; motion tracking; text-to-motion generation; cross-attention transformer; 3D hand generation; body dynamics; hand motion generation
