KMOPS: Keypoint-Driven Method for Multi-Object Pose and Metric Size Estimation from Stereo Images
Abstract
Estimating the six-degree-of-freedom (6-DoF) pose and metric size of multiple objects from RGB images alone remains challenging, particularly because of large variations in object shape and appearance and frequent occlusions in complex scenes. To address these challenges, we introduce KMOPS, a Keypoint-driven method for Multi-Object Pose and metric Size estimation from a single calibrated stereo image pair. Leveraging the stereo input, our approach first extracts the 2D keypoints of each object's enclosing bounding box in both views and then triangulates them to obtain metric 3D positions. We then recover each object's rotation, translation, and dimensions by aligning the triangulated 3D keypoints to the canonical ones with a closed-form solution. This formulation eliminates the need for the predefined 3D search spaces or volumetric anchors that other methods often require to constrain the vast 3D solution space. In extensive experiments on two challenging datasets, the Transparent Object Dataset (TOD) and StereOBJ-1M, we show that our method outperforms all competing methods while retaining a simple and effective architecture.
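The triangulate-then-align idea described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that do not come from the abstract: the stereo pair is rectified, so depth follows directly from disparity; the canonical box is a unit cube with a fixed, known corner ordering; and the closed-form alignment is a least-squares affine fit followed by an SVD projection onto a rotation. The names `CANONICAL`, `triangulate_rectified`, and `fit_box` are hypothetical.

```python
import numpy as np

# Assumed canonical box: unit cube centred at the origin, shape (8, 3).
# The corner ordering is an illustrative choice, not taken from the paper.
CANONICAL = np.array([[x, y, z] for x in (-0.5, 0.5)
                                 for y in (-0.5, 0.5)
                                 for z in (-0.5, 0.5)])


def triangulate_rectified(kp_left, kp_right, fx, fy, cx, cy, baseline):
    """Triangulate matched keypoints from a rectified stereo pair.

    kp_left, kp_right: (N, 2) pixel coordinates (u, v) in each view.
    Returns (N, 3) metric points in the left-camera frame.
    """
    kp_left = np.asarray(kp_left, dtype=float)
    kp_right = np.asarray(kp_right, dtype=float)
    disparity = kp_left[:, 0] - kp_right[:, 0]   # horizontal pixel shift
    z = fx * baseline / disparity                # depth from disparity
    x = (kp_left[:, 0] - cx) * z / fx            # back-project u
    y = (kp_left[:, 1] - cy) * z / fy            # back-project v
    return np.stack([x, y, z], axis=1)


def fit_box(points):
    """Closed-form fit of rotation R, translation t, and size to 8 corners.

    Assumes points[i] corresponds to CANONICAL[i], i.e.
    points ~= (CANONICAL * size) @ R.T + t.
    """
    points = np.asarray(points, dtype=float)
    t = points.mean(axis=0)                      # box centre
    centred = points - t
    # Least-squares linear map A = R @ diag(size). For the unit cube,
    # CANONICAL.T @ CANONICAL equals 2 * I, so the normal equations
    # reduce to a division by 2.
    A = centred.T @ CANONICAL / 2.0
    size = np.linalg.norm(A, axis=0)             # column norms = dimensions
    # Project the normalised axes onto the nearest proper rotation.
    U, _, Vt = np.linalg.svd(A / size)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    R = U @ D @ Vt
    return R, t, size
```

Given eight matched corner keypoints per object, calling `fit_box(triangulate_rectified(...))` returns the pose and dimensions in the left-camera frame. With noisy keypoints the affine fit is only a least-squares estimate, and the recovered rotation is meaningful only up to the symmetries implied by the chosen corner ordering.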