SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation

Yamei Chen; Yan Di; Guangyao Zhai; Fabian Manhardt; Chenyangguang Zhang; Ruida Zhang; Federico Tombari; Nassir Navab; Benjamin Busam

CVPR 2024

Abstract

Category-level object pose estimation, which aims to predict the 6D pose and 3D size of objects from known categories, typically struggles with large intra-class shape variation. Existing works that rely on mean shapes often fall short of capturing this variation. To address this issue, we present SecondPose, a novel approach that integrates object-specific geometric features with semantic category priors from DINOv2. Leveraging the advantage of DINOv2 in providing SE(3)-consistent semantic features, we hierarchically extract two types of SE(3)-invariant geometric features to further encapsulate local-to-global object-specific information. These geometric features are then point-aligned with the DINOv2 features to establish a consistent object representation under SE(3) transformations, which facilitates the mapping from camera space to the pre-defined canonical space and thus further enhances pose estimation. Extensive experiments on NOCS-REAL275 demonstrate that SecondPose achieves a 12.4% leap forward over the state of the art. Moreover, on HouseCat6D, a more complex dataset that provides photometrically challenging objects, SecondPose still surpasses other competitors by a large margin. Code is released at https://github.com/NOrangeeroli/SecondPose.git.
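
The two properties the abstract relies on, geometric features that stay unchanged under any SE(3) transformation and point-aligned fusion with semantic features, can be illustrated with a small sketch. It assumes the classic point pair feature (PPF: the pair distance plus three angles between the two normals and the connecting vector) as a stand-in SE(3)-invariant descriptor, and a hypothetical per-point aggregation (mean PPF from a point to all other points). Neither is claimed to be SecondPose's exact feature design. The semantic vectors are placeholders for DINOv2 features sampled at each point, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _angle(a: Vec3, b: Vec3) -> float:
    """Unsigned angle between two vectors; 0.0 if either is degenerate."""
    denom = math.sqrt(_dot(a, a) * _dot(b, b))
    if denom == 0.0:
        return 0.0
    return math.acos(max(-1.0, min(1.0, _dot(a, b) / denom)))


def point_pair_feature(p1: Vec3, n1: Vec3, p2: Vec3, n2: Vec3) -> Tuple[float, ...]:
    """Classic PPF (distance + three angles); unchanged by any rigid motion."""
    d = _sub(p2, p1)
    return (math.sqrt(_dot(d, d)), _angle(n1, d), _angle(n2, d), _angle(n1, n2))


def per_point_geometric_features(
    points: Sequence[Vec3], normals: Sequence[Vec3]
) -> List[List[float]]:
    """Hypothetical per-point descriptor: mean PPF from point i to every other point."""
    feats = []
    for i, (pi, ni) in enumerate(zip(points, normals)):
        acc = [0.0, 0.0, 0.0, 0.0]
        for j, (pj, nj) in enumerate(zip(points, normals)):
            if i != j:
                for k, v in enumerate(point_pair_feature(pi, ni, pj, nj)):
                    acc[k] += v
        feats.append([v / (len(points) - 1) for v in acc])
    return feats


def fuse_point_aligned(
    semantic: Sequence[Sequence[float]], geometric: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Concatenate the i-th semantic vector with the i-th geometric vector."""
    if len(semantic) != len(geometric):
        raise ValueError("semantic and geometric features must be point-aligned")
    return [list(s) + list(g) for s, g in zip(semantic, geometric)]


def rigid_transform(
    v: Vec3, yaw: float, pitch: float, t: Vec3 = (0.0, 0.0, 0.0)
) -> Vec3:
    """Rotate about z by yaw, then about x by pitch, then translate by t."""
    x, y, z = v
    cz, sz = math.cos(yaw), math.sin(yaw)
    x, y = cz * x - sz * y, sz * x + cz * y
    cx, sx = math.cos(pitch), math.sin(pitch)
    y, z = cx * y - sx * z, sx * y + cx * z
    return (x + t[0], y + t[1], z + t[2])
```

Normals are rotated but not translated (call `rigid_transform(n, yaw, pitch)` with the default zero translation), so computing the features on the transformed cloud should reproduce the original features up to floating-point error. Averaging over all other points costs O(N²); a practical implementation would likely restrict each point to a local neighborhood.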

Topics

Machine Learning > Core Methods > Representation Learning
Computer Vision > Analysis > 3D Vision
Robotics > Capabilities > Perception
Artificial Intelligence > Core AI > Robotics
Artificial Intelligence > Core AI > Computer Vision
Computer Vision > Analysis > Pose Estimation

Keywords

pose estimation, feature fusion, object pose estimation, 6D pose estimation, semantic feature, geometric feature, category-level pose, category-level recognition, 6D pose, category-level object, dual-stream fusion
