G2L-Net: Global to Local Network for Real-Time 6D Pose Estimation With Embedding Vector Features

Wei Chen; Xi Jia; Hyung Jin Chang; Jinming Duan; Ales Leonardis

CVPR 2020

Abstract

In this paper, we propose a novel real-time 6D object pose estimation framework, named G2L-Net. Our network operates on point clouds from RGB-D detection in a divide-and-conquer fashion. Specifically, our network consists of three steps. First, we extract the coarse object point cloud from the RGB-D image by 2D detection. Second, we feed the coarse object point cloud to a translation localization network to perform 3D segmentation and object translation prediction. Third, using the predicted segmentation and translation, we transfer the fine object point cloud into a local canonical coordinate frame, in which we train a rotation localization network to estimate the initial object rotation. In the third step, we define point-wise embedding vector features to capture viewpoint-aware information. To calculate a more accurate rotation, we adopt a rotation residual estimator to estimate the residual between the initial rotation and the ground truth, which boosts the performance of the initial pose estimate. Our proposed G2L-Net runs in real time even though multiple steps are stacked in the proposed coarse-to-fine framework. Extensive experiments on two benchmark datasets show that G2L-Net achieves state-of-the-art performance in terms of both accuracy and speed.
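The geometric part of the second and third steps can be illustrated with a minimal sketch. It is not the authors' implementation: the networks are replaced by hand-made values, the helper names (`to_local_canonical`, `refine_rotation`, `matmul3`) are hypothetical, and the left-multiplication convention for the rotation residual is an assumption, since the abstract does not state one.

```python
import random


def to_local_canonical(points, object_mask, translation):
    """Keep the points flagged as object and shift them so that the
    predicted translation becomes the origin of the local frame."""
    return [
        tuple(p[i] - translation[i] for i in range(3))
        for p, keep in zip(points, object_mask)
        if keep
    ]


def matmul3(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


def refine_rotation(rotation_init, rotation_residual):
    """Apply a residual rotation to an initial estimate.
    ASSUMPTION: the residual is applied on the left (R = R_res @ R_init)."""
    return matmul3(rotation_residual, rotation_init)


# Synthetic stand-in for the coarse point cloud cropped by a 2D detection:
# 200 object points near the true translation plus 100 background points.
random.seed(0)
translation = (0.1, -0.2, 0.8)  # stand-in for the predicted translation
object_pts = [
    tuple(t + random.gauss(0.0, 0.02) for t in translation) for _ in range(200)
]
background_pts = [
    tuple(random.gauss(0.0, 0.5) for _ in range(3)) for _ in range(100)
]
coarse_cloud = object_pts + background_pts
object_mask = [True] * 200 + [False] * 100  # stand-in for the 3D segmentation

# Fine object point cloud expressed in the local canonical frame.
local_cloud = to_local_canonical(coarse_cloud, object_mask, translation)
centroid = [sum(p[i] for p in local_cloud) / len(local_cloud) for i in range(3)]

# Toy rotations: a 90-degree turn about z as the initial estimate,
# corrected by a further 90-degree residual about the same axis.
ROT_Z_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
rotation = refine_rotation(ROT_Z_90, ROT_Z_90)
```

After segmentation and centering, the remaining points cluster around the origin, so the rotation stage sees input that no longer depends on where the object sits in the scene.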

Topics

Computer Vision > Analysis > 3D Vision
Robotics > Capabilities > Perception
Artificial Intelligence > Core AI > Robotics
Deep Learning > Learning Types > Multi-Task Learning

Keywords

pose estimation, object detection, point cloud, depth estimation, object tracking, 3D object detection, RGB-D image, 6D pose estimation, rotation estimation, RGB-D perception
