GeoDiffusion: A Training-Free Framework for Accurate 3D Geometric Conditioning in Image Generation

Phillip Mueller; Talip Uenlue; Sebastian Schmidt; Marcel Kollovieh; Jiajie Fan; Stephan Günnemann; Lars Mikelsons

2025 ICCV ICCV 2025

GeoDiffusion: A Training-Free Framework for Accurate 3D Geometric Conditioning in Image Generation

Abstract

Precise geometric control in image generation is essential for fields like engineering & product design and creative industries to control 3D object features accurately in 2D image space. Traditional 3D editing approaches are time-consuming and demand specialized skills, while current image-based generative methods lack accuracy in geometric conditioning. To address these challenges, we propose GeoDiffusion, a training-free framework for accurate and efficient geometric conditioning of 3D features in image generation. GeoDiffusion employs a class-specific 3D object as a geometric prior to define keypoints and parametric correlations in 3D space. We ensure viewpoint consistency through a rendered image of a reference 3D object, followed by style transfer to meet user-defined appearance specifications. At the core of our framework is GeoDrag, improving accuracy and speed of drag-based image editing on geometry guidance tasks and general instructions on DragBench. Our results demonstrate that GeoDiffusion enables precise geometric modifications across various iterative design workflows.

🌉 Interdisciplinary Bridge — Computer Vision and Deep Learning

🧭 Keyword Pioneer — 3d control

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Phillip Mueller , Talip Uenlue , Sebastian Schmidt , Marcel Kollovieh , Jiajie Fan , Stephan Günnemann , Lars Mikelsons

Topics

Deep Learning > Models > Diffusion Models Computer Vision > Generation > Image Generation Computer Vision > Processing > Image Editing

Keywords

image generation style transfer image editing diffusion model viewpoint consistency drag-based editing geometric conditioning 3d control training-free framework 3d image generation

Download PDF

Related papers

MA-CIR: A Multimodal Arithmetic Benchmark for Composed Image Retrieval 2025

SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality 2025

MonSTeR: a Unified Model for Motion, Scene, Text Retrieval 2025

ASGS: Single-Domain Generalizable Open-Set Object Detection via Adaptive Subgraph Searching 2025

Robust Dataset Condensation using Supervised Contrastive Learning 2025