Learning 3D Dynamic Scene Representations for Robot Manipulation

Zhenjia Xu; Zhanpeng He; Jiajun Wu; Shuran Song

CoRL 2020

Abstract

A 3D scene representation for robot manipulation should capture three key object properties: permanency (objects that become occluded over time continue to exist), amodal completeness (objects have 3D occupancy, even if only partial observations are available), and spatiotemporal continuity (the movement of each object is continuous over space and time). In this paper, we introduce 3D Dynamic Scene Representation (DSR), a 3D volumetric scene representation that simultaneously discovers, tracks, and reconstructs objects and predicts their dynamics, while capturing all three properties. We further propose DSR-Net, which learns to aggregate visual observations over multiple interactions to gradually build and refine DSR. Our model achieves state-of-the-art performance in modeling 3D scene dynamics with DSR on both simulated and real data. Combined with model predictive control, DSR-Net enables accurate planning in downstream robotic manipulation tasks such as planar pushing. Code and data are available at dsr-net.cs.columbia.edu.

Topics

Artificial Intelligence > Core AI > Agent Systems

Artificial Intelligence > Core AI > Planning

Computer Vision > Analysis > 3D Vision

Keywords

object tracking, model predictive control, robot manipulation, 3D scene representation, volumetric representation, dynamics prediction
