PointInfinity: Resolution-Invariant Point Diffusion Models

Zixuan Huang; Justin Johnson; Shoubhik Debnath; James M. Rehg; Chao-Yuan Wu

CVPR 2024

Abstract

We present PointInfinity, an efficient family of point cloud diffusion models. Our core idea is to use a transformer-based architecture with a fixed-size, resolution-invariant latent representation. This enables efficient training with low-resolution point clouds while allowing high-resolution point clouds to be generated during inference. More importantly, we show that scaling the test-time resolution beyond the training resolution improves the fidelity of generated point clouds and surfaces. We analyze this phenomenon and draw a link to classifier-free guidance, commonly used in diffusion models, demonstrating that both allow trading off fidelity and variability during inference. Experiments on CO3D show that PointInfinity can efficiently generate high-resolution point clouds (up to 131k points, 31 times more than Point-E) with state-of-the-art quality.
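
To illustrate why a fixed-size latent can make a model indifferent to the number of input points, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names (`cross_attention`, `read_write_block`), the sizes `DIM` and `NUM_LATENT`, and the omission of learned projections, noise conditioning, and latent self-attention are all simplifying assumptions, and the points are assumed to be already embedded as `DIM`-dimensional features. The only thing it shows is that a latent which "reads" from the points by cross-attention, and points that "write" from the latent, work unchanged for any point count.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values, scale):
    """Each query row attends over all key/value rows (keys == values here,
    since this sketch has no learned projections)."""
    transposed = [list(col) for col in zip(*keys_values)]
    scores = matmul(queries, transposed)
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, keys_values)


def read_write_block(latent, points):
    """One hypothetical read/write step.

    read:  the fixed-size latent attends to the points (any count)
    write: the points attend back to the updated latent
    """
    scale = 1.0 / math.sqrt(len(latent[0]))
    new_latent = cross_attention(latent, points, scale)
    new_points = cross_attention(points, new_latent, scale)
    return new_latent, new_points


def random_rows(n, dim, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
DIM, NUM_LATENT = 8, 4  # illustrative sizes, not taken from the paper
latent = random_rows(NUM_LATENT, DIM, rng)

# Run the same block at a "training" and a larger "test-time" resolution.
shapes = {}
for num_points in (16, 64):
    points = random_rows(num_points, DIM, rng)
    z, p = read_write_block(latent, points)
    shapes[num_points] = ((len(z), len(z[0])), (len(p), len(p[0])))
```

In this sketch the latent stays at `NUM_LATENT` rows whether 16 or 64 points are supplied, and the per-point output simply follows the input count. A shape-level toy like this is what makes it plausible to train at one resolution and sample at another.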

Topics

Deep Learning > Models > Diffusion Models
Computer Vision > Generation > Image Generation

Keywords

transformer architecture, point cloud generation, point cloud diffusion model, latent representation, classifier-free guidance, resolution scaling
