ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion

Jiayu Yang; Ziang Cheng; Yunfei Duan; Pan Ji; Hongdong Li

2024 CVPR CVPR 2024

ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion

Abstract

Given a single image of a 3D object this paper proposes a novel method (named ConsistNet) that can generate multiple images of the same object as if they are captured from different viewpoints while the 3D (multi-view) consistencies among those multiple generated images are effectively exploited. Central to our method is a lightweight multi-view consistency block that enables information exchange across multiple single-view diffusion processes based on the underlying multi-view geometry principles. ConsistNet is an extension to the standard latent diffusion model and it consists of two submodules: (a) a view aggregation module that unprojects multi-view features into global 3D volumes and infers consistency and (b) a ray aggregation module that samples and aggregates 3D consistent features back to each view to enforce consistency. Our approach departs from previous methods in multi-view image generation in that it can be easily dropped in pre-trained LDMs without requiring explicit pixel correspondences or depth prediction. Experiments show that our method effectively learns 3D consistency over a frozen Zero123-XL backbone and can generate 16 surrounding views of the object within 11 seconds on a single A100 GPU.

🌉 Interdisciplinary Bridge — Computer Vision and Deep Learning

🐣 Hot Topic Early Bird — multi-view generation

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Jiayu Yang , Ziang Cheng , Yunfei Duan , Pan Ji , Hongdong Li

Topics

Deep Learning > Models > Diffusion Models Computer Vision > Analysis > 3D Vision Computer Vision > Generation > Image Generation

Keywords

diffusion model latent diffusion latent diffusion model view synthesis multi-view generation 3d consistency 3d volume multi-view image generation view aggregation

Download PDF

Related papers

DUSt3R: Geometric 3D Vision Made Easy 2024

Bezier Everywhere All at Once: Learning Drivable Lanes as Bezier Graphs 2024

NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows 2024

Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization 2024

DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models 2024