AAAI 2026

SceneGenesis: 3D Scene Synthesis via Semantic Structural Priors and Mesh-Guided Video-Geometry Fusion

Abstract

Generating high-quality, controllable, and structurally consistent 3D scenes in complex multi-object environments remains a fundamental challenge. We present SceneGenesis, a unified framework that synthesizes 3D scenes by combining semantic structural priors with mesh-guided video–geometry fusion. SceneGenesis first employs large language models to convert textual descriptions into category-aware object specifications, which are transformed into structured meshes using procedural approximations and pretrained asset generators, enabling precise layout control and scalable scene construction. To obtain rich and style-controllable appearances, SceneGenesis generates multi-view video representations conditioned on the initialized structure. A mesh-guided video–geometry fusion module then consolidates video evidence with mesh priors through mesh-conditioned fragment initialization, progressive geometric refinement, and structure-aware optimization, substantially improving global geometric fidelity and visual realism. Experiments demonstrate that SceneGenesis supports flexible style variation and object-level editing while achieving strong controllability, scalability, and structural quality.
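To make the first stage more concrete, the following is a minimal illustrative sketch of how category-aware object specifications might be turned into coarse structured meshes through a procedural approximation. It is not the authors' implementation: the names `ObjectSpec` and `box_mesh`, the choice of an axis-aligned box as the procedural proxy, and the hard-coded specifications (standing in for language-model output) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ObjectSpec:
    """Hypothetical category-aware object specification (assumed schema)."""
    category: str
    center: tuple  # (x, y, z) position of the object in the scene
    size: tuple    # (width, depth, height) extents along x, y, z


def box_mesh(spec):
    """Approximate an object by an axis-aligned box (assumed procedural proxy).

    Returns 8 vertices and 12 triangles that index into the vertex list.
    """
    cx, cy, cz = spec.center
    hx, hy, hz = (s / 2.0 for s in spec.size)
    # Vertex index is 4*ix + 2*iy + iz, with each bit selecting the -/+ side.
    vertices = [
        (cx + sx * hx, cy + sy * hy, cz + sz * hz)
        for sx, sy, sz in product((-1, 1), repeat=3)
    ]
    # One quad per box face, ordered so that normals point outward.
    quads = [
        (0, 1, 3, 2), (4, 6, 7, 5),  # x = -hx, x = +hx
        (0, 4, 5, 1), (2, 3, 7, 6),  # y = -hy, y = +hy
        (0, 2, 6, 4), (1, 5, 7, 3),  # z = -hz, z = +hz
    ]
    faces = []
    for a, b, c, d in quads:
        faces.append((a, b, c))  # split each quad into two triangles
        faces.append((a, c, d))
    return vertices, faces


# Hard-coded specs standing in for what a language model might return.
scene_specs = [
    ObjectSpec("table", center=(0.0, 0.0, 0.4), size=(1.2, 0.8, 0.8)),
    ObjectSpec("chair", center=(1.0, 0.0, 0.45), size=(0.5, 0.5, 0.9)),
]
scene_meshes = {spec.category: box_mesh(spec) for spec in scene_specs}
```

Because each object keeps its own entry in `scene_meshes`, a layout change or object-level edit in this sketch amounts to replacing one specification and regenerating only that mesh; a pretrained asset generator could be swapped in for `box_mesh` where more detail is needed.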
