AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents

Jieming Cui; Tengyu Liu; Nian Liu; Yaodong Yang; Yixin Zhu; Siyuan Huang

2024 CVPR CVPR 2024

AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents

Abstract

Traditional approaches in physics-based motion generation centered around imitation learning and reward shaping often struggle to adapt to new scenarios. To tackle this limitation we propose AnySkill a novel hierarchical method that learns physically plausible interactions following open-vocabulary instructions. Our approach begins by developing a set of atomic actions via a low-level controller trained via imitation learning. Upon receiving an open-vocabulary textual instruction AnySkill employs a high-level policy that selects and integrates these atomic actions to maximize the CLIP similarity between the agent's rendered images and the text. An important feature of our method is the use of image-based rewards for the high-level policy which allows the agent to learn interactions with objects without manual reward engineering. We demonstrate AnySkill's capability to generate realistic and natural motion sequences in response to unseen instructions of varying lengths marking it the first method capable of open-vocabulary physical skill learning for interactive humanoid agents.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Reinforcement Learning

🧭 Keyword Pioneer — physical skill

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio

Authors

Jieming Cui , Tengyu Liu , Nian Liu , Yaodong Yang , Yixin Zhu , Siyuan Huang

Topics

Artificial Intelligence > Core AI > Agent Systems Machine Learning > Learning Types > Self-Supervised Learning Reinforcement Learning > Applications > Robotics

Keywords

imitation learning hierarchical policy physical skill open-vocabulary instruction image-based reward

Download PDF

Related papers

DUSt3R: Geometric 3D Vision Made Easy 2024

Bezier Everywhere All at Once: Learning Drivable Lanes as Bezier Graphs 2024

NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows 2024

Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization 2024

DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models 2024