Dense Policy: Bidirectional Autoregressive Learning of Actions

Yue Su; Xinyu Zhan; Hongjie Fang; Han Xue; Hao-Shu Fang; Yong-Lu Li; Cewu Lu; Lixin Yang

ICCV 2025

Abstract

Mainstream visuomotor policies predominantly rely on generative models for holistic action prediction, while current autoregressive policies, which predict the next token or chunk, have shown suboptimal results. This motivates a search for more effective learning methods to unleash the potential of autoregressive policies for robotic manipulation. This paper introduces a bidirectionally expanded learning approach, termed Dense Policy, to establish a new paradigm for autoregressive policies in action prediction. It employs a lightweight encoder-only architecture to iteratively unfold the action sequence from an initial single frame into the target sequence in a coarse-to-fine manner with logarithmic-time inference. Extensive experiments validate that Dense Policy has superior autoregressive learning capabilities and can surpass existing holistic generative policies. Our policy, example data, and training code will be publicly available upon publication.
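The coarse-to-fine unfolding described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`num_expansion_steps`, `expand_once`, `dense_unfold`) are hypothetical, and plain linear interpolation stands in for the learned encoder-only refinement step, which the abstract does not specify. The sketch only shows why growing a sequence by doubling from a single frame needs a logarithmic number of steps in the horizon length.

```python
def num_expansion_steps(horizon: int) -> int:
    """Doubling steps needed to grow 1 frame into `horizon` frames, i.e. ceil(log2(horizon))."""
    return (horizon - 1).bit_length()


def expand_once(actions, target_len):
    """Upsample the sequence to min(2 * len, target_len) frames.

    Assumption: linear interpolation is only a placeholder for the
    learned refinement that a real policy would apply at each level.
    """
    new_len = min(2 * len(actions), target_len)
    if len(actions) == 1:
        # A single frame carries no slope information, so it is repeated.
        return [actions[0]] * new_len
    out = []
    for i in range(new_len):
        # Map the new index onto the coarse sequence's index range.
        pos = i * (len(actions) - 1) / (new_len - 1)
        lo = int(pos)
        hi = min(lo + 1, len(actions) - 1)
        w = pos - lo
        out.append((1 - w) * actions[lo] + w * actions[hi])
    return out


def dense_unfold(initial_action, horizon):
    """Grow one initial frame into `horizon` frames, counting expansion steps."""
    actions = [initial_action]
    steps = 0
    while len(actions) < horizon:
        actions = expand_once(actions, horizon)
        steps += 1
    return actions, steps


if __name__ == "__main__":
    seq, steps = dense_unfold(0.0, 16)
    print(len(seq), steps)
```

Under these assumptions, a 16-frame action sequence is reached in 4 expansion steps, whereas next-token autoregression would need one step per frame.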

Topics

Deep Learning > Architectures > Transformers; Reinforcement Learning > Applications > Robotics; Robotics > Capabilities > Manipulation; Deep Learning > Learning Types > Reinforcement Learning

Keywords

robotic manipulation; action prediction; autoregressive policy; visuomotor policy; bidirectional learning; autoregressive learning
