RGB2Point: 3D Point Cloud Generation from Single RGB Images

Jae Joong Lee; Bedrich Benes

WACV 2025

Abstract

We introduce RGB2Point, a Transformer-based method for generating a 3D point cloud from a single unposed RGB image. RGB2Point takes an input image of an object and generates a dense 3D point cloud. In contrast to prior work based on CNN layers and diffusion-denoising approaches, we use pre-trained Transformer layers that are fast and generate high-quality point clouds with consistent quality across the available categories. On a real-world dataset, our generated point clouds improve on the current state of the art in Chamfer distance (by 51.15%) and Earth Mover's distance (by 36.17%). On a synthetic dataset, our approach also achieves better Chamfer distance (by 39.26%), Earth Mover's distance (by 26.95%), and F-score (by 47.16%). Moreover, our method produces 63.1% more consistent high-quality results across object categories than prior works. Finally, RGB2Point is computationally efficient: it requires only 2.3 GB of VRAM to reconstruct a 3D point cloud from a single RGB image, and our implementation generates results 15,133x faster than a state-of-the-art diffusion-based model.
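For readers unfamiliar with the metrics named in the abstract, the following is a minimal NumPy sketch of Chamfer distance and F-score between two point clouds. It is not the authors' evaluation code. The function names, the use of squared distances averaged per direction, and the default `threshold` are assumptions chosen for illustration; the abstract does not state which conventions the paper uses, and published variants differ.

```python
import numpy as np


def pairwise_sq_dists(a, b):
    """Squared Euclidean distances between every point in a (N, 3) and b (M, 3)."""
    diff = a[:, None, :] - b[None, :, :]  # shape (N, M, 3)
    return np.sum(diff * diff, axis=-1)   # shape (N, M)


def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance (assumed convention: mean squared nearest-neighbor
    distance in each direction, summed)."""
    d = pairwise_sq_dists(pred, gt)
    pred_to_gt = d.min(axis=1).mean()  # each predicted point to its nearest GT point
    gt_to_pred = d.min(axis=0).mean()  # each GT point to its nearest predicted point
    return float(pred_to_gt + gt_to_pred)


def f_score(pred, gt, threshold=0.01):
    """Harmonic mean of precision and recall at a distance threshold
    (the threshold value here is a placeholder, not taken from the paper)."""
    d = np.sqrt(pairwise_sq_dists(pred, gt))
    precision = float(np.mean(d.min(axis=1) < threshold))
    recall = float(np.mean(d.min(axis=0) < threshold))
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.random((256, 3))
    pred = gt + rng.normal(scale=0.005, size=gt.shape)  # slightly perturbed copy
    print("Chamfer distance:", chamfer_distance(pred, gt))
    print("F-score:", f_score(pred, gt, threshold=0.02))
```

Lower Chamfer distance and higher F-score indicate a closer match. The dense pairwise matrix costs O(N·M) memory, so large clouds would typically use a k-d tree or a GPU nearest-neighbor routine instead. Earth Mover's distance is omitted here because it requires solving an assignment problem between the two clouds.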

Topics

Deep Learning > Architectures > Transformers
Computer Vision > Analysis > 3D Vision

Keywords

point cloud generation, 3D vision, Chamfer distance, single-view reconstruction

Related papers

Neural Graph Map: Dense Mapping with Efficient Loop Closure Integration 2025

ELMGS: Enhancing Memory and Computation Scalability through Compression for 3D Gaussian Splatting 2025

Feature Fusion Transferability Aware Transformer for Unsupervised Domain Adaptation 2025

Uncertainty-Aware Online Extrinsic Calibration: A Conformal Prediction Approach 2025

Disentangling Spatio-Temporal Knowledge for Weakly Supervised Object Detection and Segmentation in Surgical Video 2025