MotionGPT: Human Motion Synthesis With Improved Diversity and Realism via GPT-3 Prompting

Jose Ribeiro-Gomes; Tianhui Cai; Zoltán Á. Milacski; Chen Wu; Aayush Prakash; Shingo Takagi; Amaury Aubel; Daeil Kim; Alexandre Bernardino; Fernando De la Torre

WACV 2024

Abstract

There are numerous applications for human motion synthesis, including animation, gaming, robotics, and sports science. In recent years, human motion generation from natural language has emerged as a promising alternative to costly and labor-intensive data collection methods that rely on motion capture or wearable sensors (e.g., suits). Even so, generating human motion from textual descriptions remains challenging, primarily because large-scale supervised datasets that capture the full diversity of human activity are scarce. This study proposes a new approach, MotionGPT, which addresses the limitations of previous text-based human motion generation methods by exploiting the extensive semantic information available in large language models (LLMs). We first pretrain a doubly text-conditional motion diffusion model on both coarse ("high-level") and detailed ("low-level") ground-truth text data. Then, during inference, we improve motion diversity and alignment with the training set by zero-shot prompting GPT-3 for additional "low-level" details. Our method achieves new state-of-the-art quantitative results in terms of Fréchet Inception Distance (FID) and motion diversity, and improves all considered metrics. It also shows strong qualitative performance, producing natural-looking results.
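The inference step described in the abstract (expanding a coarse description into "low-level" details with an LLM, then conditioning on both texts) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the prompt wording, the function names `build_prompt` and `make_conditions`, and the stand-in `fake_llm` are hypothetical and are not taken from the paper, which used GPT-3 with a prompt not given in this abstract.

```python
from typing import Callable, Dict

# Hypothetical prompt wording (assumption); the paper's actual prompt
# is not stated in the abstract.
PROMPT_TEMPLATE = (
    "Describe in detail how a person's body parts move when they perform "
    "the following action.\nAction: {action}\nDetailed description:"
)


def build_prompt(high_level: str) -> str:
    """Fill the zero-shot template with the coarse ("high-level") text."""
    return PROMPT_TEMPLATE.format(action=high_level.strip())


def make_conditions(high_level: str, llm: Callable[[str], str]) -> Dict[str, str]:
    """Return both text conditions for a doubly text-conditional model.

    `llm` is any callable mapping a prompt string to a completion string.
    """
    low_level = llm(build_prompt(high_level)).strip()
    return {"high_level": high_level.strip(), "low_level": low_level}


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline (assumption)."""
    return " The person bends both knees, swings the arms back, then pushes off the ground. "


conditions = make_conditions("a person jumps forward", fake_llm)
print(conditions["high_level"])
print(conditions["low_level"])
```

In a real pipeline, `fake_llm` would be replaced by a call to an actual language model, and the two strings in `conditions` would be encoded and passed to the motion diffusion model as its two text conditions.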

Topics

Artificial Intelligence > Core AI > Foundation Models; Artificial Intelligence > Core AI > Multimodal Learning; Artificial Intelligence > Learning Paradigms > Zero-Shot Learning

Keywords

human motion synthesis; zero-shot prompting; motion diffusion model; text-to-motion generation; large language model
