Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions

Yevgen Chebotar; Quan Vuong; Karol Hausman; Fei Xia; Yao Lu; Alex Irpan; Aviral Kumar; Tianhe Yu; Alexander Herzog; Karl Pertsch; Keerthana Gopalakrishnan; Julian Ibarz; Ofir Nachum; Sumedh Anand Sontakke; Grecia Salazar; Huong T. Tran; Jodilyn Peralta; Clayton Tan; Deeksha Manjunath; Jaspiar Singh; Brianna Zitkovich; Tomas Jackson; Kanishka Rao; Chelsea Finn; Sergey Levine

CoRL 2023

Abstract

In this work, we present a scalable reinforcement learning method for training multi-task policies from large offline datasets that can leverage both human demonstrations and autonomously collected data. Our method uses a Transformer to provide a scalable representation for Q-functions trained via offline temporal difference backups. We therefore refer to the method as Q-Transformer. By discretizing each action dimension and representing the Q-value of each action dimension as separate tokens, we can apply effective high-capacity sequence modeling techniques for Q-learning. We present several design decisions that enable good performance with offline RL training, and show that Q-Transformer outperforms prior offline RL algorithms and imitation learning techniques on a large diverse real-world robotic manipulation task suite.
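The per-dimension discretization described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the bin count, the action bounds, the greedy left-to-right selection rule, and the function names `discretize_action`, `greedy_action_tokens`, and `toy_q` are all assumptions made for illustration, and `toy_q` is a hand-written stand-in for the Transformer Q-function.

```python
from typing import Callable, List, Sequence


def discretize_action(action: Sequence[float], low: Sequence[float],
                      high: Sequence[float], num_bins: int) -> List[int]:
    """Map each continuous action dimension to a bin index in [0, num_bins - 1]."""
    tokens = []
    for a, lo, hi in zip(action, low, high):
        clipped = min(max(a, lo), hi)          # keep the value inside its bounds
        frac = (clipped - lo) / (hi - lo)      # normalize to [0, 1]
        tokens.append(min(int(frac * num_bins), num_bins - 1))
    return tokens


def greedy_action_tokens(q_for_next_dim: Callable[[List[int]], List[float]],
                         num_dims: int) -> List[int]:
    """Choose one bin per dimension, left to right, conditioning on earlier choices.

    Assumption: q_for_next_dim(prefix) returns one Q-value per bin for the
    next action dimension, given the bins already chosen.
    """
    chosen: List[int] = []
    for _ in range(num_dims):
        q = q_for_next_dim(chosen)
        chosen.append(max(range(len(q)), key=lambda b: q[b]))
    return chosen


def toy_q(prefix: List[int], num_bins: int = 4) -> List[float]:
    """Hypothetical Q-function: the preferred bin depends on the chosen prefix."""
    target = (sum(prefix) + 1) % num_bins
    return [-abs(b - target) for b in range(num_bins)]


if __name__ == "__main__":
    print(discretize_action([-1.0, 0.0, 1.0], [-1.0] * 3, [1.0] * 3, 4))
    print(greedy_action_tokens(toy_q, num_dims=3))
```

Treating each dimension as its own token keeps the number of Q-values evaluated per step proportional to the number of bins times the number of dimensions, rather than growing exponentially with the number of dimensions as it would for a joint discretization.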

Authors

Yevgen Chebotar, Quan Vuong, Karol Hausman, Fei Xia, Yao Lu, Alex Irpan, Aviral Kumar, Tianhe Yu, Alexander Herzog, Karl Pertsch, Keerthana Gopalakrishnan, Julian Ibarz, Ofir Nachum, Sumedh Anand Sontakke, Grecia Salazar, Huong T. Tran, Jodilyn Peralta, Clayton Tan, Deeksha Manjunath, Jaspiar Singh, Brianna Zitkovich, Tomas Jackson, Kanishka Rao, Chelsea Finn, Sergey Levine

Topics

Reinforcement Learning > Methods > Deep RL
Reinforcement Learning > Methods > Offline RL
Reinforcement Learning > Applications > Robotics

Keywords

multi-task learning, offline reinforcement learning, robotic manipulation, autoregressive Q-function

Related papers

Stochastic Occupancy Grid Map Prediction in Dynamic Scenes 2023

SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning 2023

Robot Parkour Learning 2023

Task-Oriented Koopman-Based Control with Contrastive Encoder 2023

Language-Guided Traffic Simulation via Scene-Level Diffusion 2023