HouseCat6D - A Large-Scale Multi-Modal Category Level 6D Object Perception Dataset with Household Objects in Realistic Scenarios

Hyunjun Jung; Shun-Cheng Wu; Patrick Ruhkamp; Guangyao Zhai; Hannah Schieber; Giulia Rizzoli; Pengyuan Wang; Hongcheng Zhao; Lorenzo Garattoni; Sven Meier; Daniel Roth; Nassir Navab; Benjamin Busam

CVPR 2024

Abstract

Estimating 6D object poses is a major challenge in 3D computer vision. Building on successful instance-level approaches, research is shifting towards category-level pose estimation for practical applications. Current category-level datasets, however, fall short in annotation quality and pose variety. To address this, we introduce HouseCat6D, a new category-level 6D pose dataset. It features 1) multi-modality with polarimetric RGB and depth (RGBD+P); 2) 194 diverse objects across 10 household categories, including two photometrically challenging ones; 3) high-quality pose annotations with an error range of only 1.35 mm to 1.74 mm; 4) 41 large-scale scenes with comprehensive viewpoint and occlusion coverage; 5) a checkerboard-free environment; and 6) dense 6D parallel-jaw robotic grasp annotations. Additionally, we present benchmark results for leading category-level pose estimation networks.

Topics

Computer Vision > Analysis > 3D Vision; Computer Vision > Analysis > Object Detection; Computer Vision > Domain-Specific > Autonomous Driving; Computer Vision > Domain-Specific > Medical Imaging; Artificial Intelligence > Core AI > Robotics; Deep Learning > Learning Types > Multi-Modal Learning; Computer Vision > Domain-Specific > Robotics

Keywords

object detection; depth estimation; multi-modal learning; category-level pose estimation; 6D pose estimation; RGB-D imaging; object pose; RGBD imaging; depth imaging; category-level perception; robotic grasp annotation
