SynCity: Training-Free Generation of 3D Worlds

Paul Engstler; Aleksandar Shtedritski; Iro Laina; Christian Rupprecht; Andrea Vedaldi

ICCV 2025

Abstract

We propose SynCity, a method for generating explorable 3D worlds from textual descriptions. Our approach leverages pre-trained textual, image, and 3D generators without requiring fine-tuning or inference-time optimization. While most 3D generators are object-centric and unable to create large-scale worlds, we demonstrate how 2D and 3D generators can be combined to produce ever-expanding scenes. The world is generated tile by tile, with each new tile created within its context and seamlessly integrated into the scene. SynCity enables fine-grained control over the appearance and layout of the generated worlds, which are both detailed and diverse.
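The abstract describes the tile-by-tile procedure only at a high level. As a rough illustration, the control flow might be sketched as below. This is a minimal sketch and not the authors' implementation: the grid layout, the raster visiting order, the four-neighbour context, and every name (`generate_world`, `generate_tile`, `stub_generate_tile`) are assumptions made for this example, and the stub generator merely stands in for the pre-trained 2D and 3D generators.

```python
from typing import Callable, Dict, List, Tuple

Coord = Tuple[int, int]
Tile = str  # hypothetical stand-in for a generated 3D tile asset


def neighbours(coord: Coord) -> List[Coord]:
    """Return the four grid cells adjacent to coord (assumed context window)."""
    x, y = coord
    return [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]


def generate_world(
    prompts: Dict[Coord, str],
    generate_tile: Callable[[str, Dict[Coord, Tile]], Tile],
) -> Dict[Coord, Tile]:
    """Generate tiles one at a time, passing already-generated neighbours as context."""
    world: Dict[Coord, Tile] = {}
    for coord in sorted(prompts):  # assumed raster order over the grid
        context = {n: world[n] for n in neighbours(coord) if n in world}
        world[coord] = generate_tile(prompts[coord], context)
    return world


def stub_generate_tile(prompt: str, context: Dict[Coord, Tile]) -> Tile:
    """Hypothetical placeholder: a real system would call pre-trained generators here."""
    return f"{prompt} (context: {len(context)} tiles)"


prompts = {(0, 0): "harbour", (0, 1): "market", (1, 0): "park", (1, 1): "old town"}
world = generate_world(prompts, stub_generate_tile)
```

Because each tile only sees tiles generated before it, the scene can keep growing by adding more grid coordinates and prompts, which is one way to read the "ever-expanding" claim in the abstract.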

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Artificial Intelligence > Core AI > Procedural Generation
Deep Learning > Models > Generative Models
Computer Vision > Analysis > 3D Vision
Computer Vision > Generation > Image Generation

Keywords

text-to-image generation, generative model, scene generation, 3D world generation, tile-based generation, training-free generation, 3D generator
