Explore to Learn: Latent Exploration Through Disentangled Synergy Patterns for Reinforcement Learning in Overactuated Control

Yiming Wang; Kaiyan Zhao; Xu Li; Yan Li; Jiayu Chen; Steven Morad; Leong Hou U

AAAI 2026

Abstract

Control in high-dimensional action spaces remains a fundamental challenge in reinforcement learning (RL), primarily due to inefficient exploration of the action space. While recent methods attempt to guide exploration, they often fall short of achieving the agility and coordination exhibited in biological motor control. Inspired by how organisms exploit muscle synergies for efficient movement, we propose Explore to Learn (ETL), a two-stage framework that first discovers fundamental synergy patterns and then leverages them for task-specific policy learning. In the first stage, ETL discovers underlying synergy patterns by deploying a targeted exploration policy. These patterns are modeled as latent directions in a low-dimensional space, along which the agent is guided to collect diverse and structured muscle activation trajectories. A variational autoencoder (VAE) is then trained to encode high-dimensional actions into a latent space whose dimensions correspond to the synergy patterns. In the second stage, the policy is trained entirely in this synergy-aware latent space, producing synergy coefficients that the decoder maps back to full-dimensional muscle actions. This structured representation significantly reduces the complexity of learning, while the decoder is further fine-tuned to enhance expressiveness and generalization across downstream tasks. Extensive experiments across musculoskeletal environments and the DMControl suite demonstrate that ETL consistently outperforms prior methods in both exploration efficiency and control performance, achieving superior scalability and generalization in overactuated control tasks.
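The second-stage idea in the abstract, a policy that outputs a few synergy coefficients which a decoder expands into many muscle activations, can be illustrated with a minimal sketch. This is not the authors' implementation: the decoder here is a fixed random linear map followed by a sigmoid, standing in for the trained VAE decoder, and the names `make_linear_decoder`, `latent_dim`, and `action_dim` as well as the sizes 4 and 40 are assumptions chosen only for illustration.

```python
import math
import random


def make_linear_decoder(latent_dim, action_dim, seed=0):
    """Build a toy decoder from synergy coefficients to muscle activations.

    Assumption: a random linear map plus sigmoid stands in for a trained
    VAE decoder. Each column of the weight matrix plays the role of one
    synergy pattern spread across all muscles.
    """
    rng = random.Random(seed)
    weights = [
        [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        for _ in range(action_dim)
    ]

    def decode(z):
        if len(z) != latent_dim:
            raise ValueError("expected %d synergy coefficients" % latent_dim)
        activations = []
        for row in weights:
            logit = sum(w * zi for w, zi in zip(row, z))
            # Clamp the logit so math.exp cannot overflow.
            logit = max(-60.0, min(60.0, logit))
            # Sigmoid keeps each activation inside the range [0, 1].
            activations.append(1.0 / (1.0 + math.exp(-logit)))
        return activations

    return decode


# Hypothetical sizes: a 4-dimensional latent space driving 40 muscles.
decode = make_linear_decoder(latent_dim=4, action_dim=40)
action = decode([0.5, -1.0, 0.0, 2.0])
print(len(action))
```

In this sketch the policy would only need to search over 4 numbers per step instead of 40, which is the kind of dimensionality reduction the abstract attributes to the synergy-aware latent space.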

Authors

Yiming Wang, Kaiyan Zhao, Xu Li, Yan Li, Jiayu Chen, Steven Morad, Leong Hou U

Topics

Reinforcement Learning > Methods > Deep RL
Reinforcement Learning > Applications > Robotics

Keywords

reinforcement learning, variational autoencoder, action space reduction, muscle synergy, overactuated control
