Let Humanoids Hike! Integrative Skill Development on Complex Trails

Kwan-Yee Lin; Stella X. Yu

CVPR 2025

Abstract

Hiking on complex trails demands balance, agility, and adaptive decision-making over unpredictable terrain. Current humanoid research remains fragmented and inadequate for hiking: locomotion focuses on motor skills without long-term goals or situational awareness, while semantic navigation overlooks real-world embodiment and local terrain variability. We propose training humanoids to hike on complex trails, fostering integrative skill development across visual perception, decision making, and motor execution. We develop LEGO-H, a learning framework that enables a humanoid with vision to hike complex trails independently. It has two key innovations. (1) A Temporal Vision Transformer anticipates future steps to guide locomotion, unifying local movement and goal-directed navigation. (2) Latent representations of joint movement patterns combined with hierarchical metric learning allow smooth policy transfer from privileged training to onboard execution. These techniques enable LEGO-H to handle diverse physical and environmental challenges without relying on predefined motion patterns. Experiments on diverse simulated hiking trails and humanoids with different morphologies demonstrate LEGO-H's robustness and versatility, establishing a strong foundation for future humanoid development.

Topics

Artificial Intelligence > Core AI > Autonomous Vehicles
Artificial Intelligence > Learning Paradigms > Transfer Learning

Keywords

policy transfer; humanoid locomotion; temporal vision transformer; hierarchical metric learning; integrative skill development
