Semantics-aware Motion Retargeting with Vision-Language Models

Haodong Zhang; Zhike Chen; Haocheng Xu; Lei Hao; Xiaofei Wu; Songcen Xu; Zhensong Zhang; Yue Wang; Rong Xiong

CVPR 2024

Abstract

Capturing and preserving motion semantics is essential to motion retargeting between animation characters. However, most previous works neglect semantic information or rely on human-designed joint-level representations. Here, we present a novel Semantics-aware Motion reTargeting (SMT) method that leverages vision-language models to extract and maintain meaningful motion semantics. We utilize a differentiable module to render 3D motions. The high-level motion semantics are then incorporated into the motion retargeting process by feeding the rendered images to the vision-language model and aligning the extracted semantic embeddings. To ensure the preservation of fine-grained motion details and high-level semantics, we adopt a two-stage pipeline consisting of skeleton-aware pre-training and fine-tuning with semantics and geometry constraints. Experimental results show the effectiveness of the proposed method in producing high-quality motion retargeting results while accurately preserving motion semantics. The project page can be found at https://sites.google.com/view/smtnet.
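The step of "aligning the extracted semantic embeddings" can be illustrated with a small sketch. The abstract does not state the form of the alignment loss, so the cosine distance used here is an assumption, and the function names `cosine_similarity` and `semantic_alignment_loss` are hypothetical. In the method, the embeddings would come from a vision-language model applied to rendered frames of the source and retargeted motions; here they are stand-in lists of floats.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def semantic_alignment_loss(
    source_embeddings: Sequence[Vector],
    target_embeddings: Sequence[Vector],
) -> float:
    """Mean cosine distance between paired per-frame embeddings.

    Assumption: one embedding per rendered frame, with source and target
    frames paired by index. The loss is 0 when every pair points in the
    same direction and grows as the pairs diverge (maximum 2).
    """
    if not source_embeddings or len(source_embeddings) != len(target_embeddings):
        raise ValueError("need the same non-zero number of source and target frames")
    distances = [
        1.0 - cosine_similarity(s, t)
        for s, t in zip(source_embeddings, target_embeddings)
    ]
    return sum(distances) / len(distances)


# Stand-in embeddings for two frames: the first pair matches, the second is orthogonal.
source = [[1.0, 0.0], [1.0, 0.0]]
target = [[1.0, 0.0], [0.0, 1.0]]
loss = semantic_alignment_loss(source, target)
```

In a training setting, a term like this would be minimized during the fine-tuning stage alongside the geometry constraints, with gradients flowing back through the differentiable renderer. This sketch only computes the scalar value and does not perform differentiation.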

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Artificial Intelligence > Core AI > Procedural Generation
Artificial Intelligence > Learning Paradigms > Transfer Learning
Artificial Intelligence > Core AI > Multi-Modal Learning
Computer Vision > Domain-Specific > Computer Graphics
Computer Vision > Generation > 3D Generation

Keywords

computer vision, 3D vision, motion capture, semantic embedding, vision-language model, motion retargeting
