AAAI 2026

Placing Any Object at Any 3D Position

Abstract

In this work, we propose a diffusion-based method for 3D-aware image composition. Previous approaches have focused on 2D-view image composition, which limits their handling of complex 3D spatial relationships. Consequently, they are not well-suited for applications requiring precise 3D object control and iterative refinement, including interior design visualization, visual effects prototyping, and virtual reality scene construction. In contrast, our method extracts 3D bounding boxes for all objects in the scene image. Users can then specify a new 3D bounding box based on existing spatial context and provide an image of the target object. Leveraging a fine-tuned diffusion model, our approach enables high-fidelity image composition while preserving the underlying 3D structure of the scene.
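To make the idea of a user-specified 3D bounding box concrete, the following is a minimal sketch of one way such a box could be represented and mapped to an image-space footprint. The abstract does not state how boxes are parameterized or how they condition the diffusion model, so everything here is an assumption for illustration: the `Box3D` class, the axis-aligned camera-space parameterization, the pinhole camera model, and the intrinsics (`focal`, `cx`, `cy`) are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Box3D:
    """Axis-aligned 3D box in camera coordinates (hypothetical representation)."""
    center: tuple  # (x, y, z); z is the depth in front of the camera
    size: tuple    # (width, height, depth)

    def corners(self):
        """Return the 8 corner points of the box."""
        cx, cy, cz = self.center
        w, h, d = self.size
        return [
            (cx + sx * w / 2, cy + sy * h / 2, cz + sz * d / 2)
            for sx, sy, sz in product((-1, 1), repeat=3)
        ]


def project(point, focal, cx, cy):
    """Pinhole projection of a camera-space point to pixel coordinates."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return (focal * x / z + cx, focal * y / z + cy)


def box_to_2d_region(box, focal, cx, cy):
    """2D rectangle (u_min, v_min, u_max, v_max) enclosing the projected box."""
    pts = [project(p, focal, cx, cy) for p in box.corners()]
    us = [u for u, _ in pts]
    vs = [v for _, v in pts]
    return (min(us), min(vs), max(us), max(vs))


# Hypothetical user-specified box: a unit cube centered 4 units from the camera.
new_box = Box3D(center=(0.0, 0.0, 4.0), size=(1.0, 1.0, 1.0))
region = box_to_2d_region(new_box, focal=500.0, cx=320.0, cy=240.0)

# The same cube placed farther away covers a smaller image region.
far_box = Box3D(center=(0.0, 0.0, 8.0), size=(1.0, 1.0, 1.0))
far_region = box_to_2d_region(far_box, focal=500.0, cx=320.0, cy=240.0)
```

Under these assumptions, moving the box in depth changes the size of its image footprint, which is the kind of 3D spatial relationship that a purely 2D placement interface cannot express directly.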
