HyperPose: Hyper-pose Embeddings for 3D-Aware Generative Models with Self-Supervised Disentangling of Pose and Scene
Abstract
We propose a novel framework for training 3D-aware Generative Adversarial Networks (GANs) from a collection of 2D images, learning both the image distribution and 3D geometric configurations without relying on strong 3D priors such as camera poses, depth information, or target-specific 3D models. To this end, we introduce hyper-pose embeddings alongside a novel pose disentanglement technique that separates pose and scene information. This disentanglement helps the generative model overcome the inherent conflict between learning photorealism and learning accurate 3D geometry. Furthermore, we propose soft contrastive learning to robustly handle the continuous nature of camera poses, and a non-match loss to further enhance disentanglement and refine embedding training. Experiments on challenging datasets demonstrate the effectiveness of our method in 3D-aware image synthesis, particularly for scenes with complex or diverse objects.