OneFormer3D: One Transformer for Unified Point Cloud Segmentation

Maxim Kolodiazhnyi; Anna Vorontsova; Anton Konushin; Danila Rukhovich

CVPR 2024

Abstract

Semantic, instance, and panoptic segmentation of 3D point clouds have been addressed using task-specific models of distinct design. As a result, the similarity of all segmentation tasks and the implicit relationship between them have not been utilized effectively. This paper presents a unified, simple, and effective model addressing all these tasks jointly. The model, named OneFormer3D, performs instance and semantic segmentation consistently, using a group of learnable kernels, where each kernel is responsible for generating a mask for either an instance or a semantic category. These kernels are trained with a transformer-based decoder with unified instance and semantic queries passed as an input. Such a design enables training a model end-to-end in a single run, so that it achieves top performance on all three segmentation tasks simultaneously. Specifically, our OneFormer3D ranks 1st and sets a new state-of-the-art (+2.1 mAP50) in the ScanNet test leaderboard. We also demonstrate state-of-the-art results in semantic, instance, and panoptic segmentation on the ScanNet (+21 PQ), ScanNet200 (+3.8 mAP50), and S3DIS (+0.8 mIoU) datasets.
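
The kernel-based mask generation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that each kernel is a vector in the same space as the per-point features, that a mask is obtained from the dot product between a kernel and every point feature, and that the kernel set is split into an instance part and a semantic part. The names masks_from_kernels and split_predictions, the array sizes, and the random features are hypothetical and chosen only for illustration.

```python
import numpy as np


def masks_from_kernels(point_features, kernels):
    """Return mask logits of shape (num_kernels, num_points).

    Assumption: a mask logit is the dot product of a kernel with a point feature.
    """
    return kernels @ point_features.T


def split_predictions(mask_logits, num_instance_kernels):
    """Split unified kernel outputs into instance masks and semantic labels.

    Assumption: the first num_instance_kernels rows belong to instance kernels,
    the remaining rows to semantic kernels (one per category).
    """
    inst_logits = mask_logits[:num_instance_kernels]
    sem_logits = mask_logits[num_instance_kernels:]
    # sigmoid(x) > 0.5 is equivalent to x > 0, so threshold the logits at zero.
    instance_masks = inst_logits > 0.0
    # Each point takes the semantic category whose kernel responds most strongly.
    semantic_labels = sem_logits.argmax(axis=0)
    return instance_masks, semantic_labels


# Hypothetical sizes and random stand-ins for learned features and kernels.
rng = np.random.default_rng(0)
num_points, dim = 1000, 32
num_instance_kernels, num_classes = 8, 5

point_features = rng.normal(size=(num_points, dim))
kernels = rng.normal(size=(num_instance_kernels + num_classes, dim))

logits = masks_from_kernels(point_features, kernels)
instance_masks, semantic_labels = split_predictions(logits, num_instance_kernels)
```

In this sketch a single matrix product yields both kinds of output, which is one way to read the abstract's claim that instance and semantic segmentation are handled consistently; in the actual model the kernels would come from the transformer decoder rather than from random initialization.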

Topics

Deep Learning > Architectures > Transformers
Computer Vision > Analysis > 3D Vision
Computer Vision > Analysis > Semantic Segmentation
Computer Vision > Processing > Image Segmentation
Computer Vision > Processing > Semantic Segmentation

Keywords

semantic segmentation, point cloud, 3D vision, instance segmentation, transformer decoder, mask generation, panoptic segmentation, point cloud segmentation
