AAAI 2026

Placing Any Object at Any 3D Position

Abstract

In this work, we propose a diffusion-based method for 3D-aware image composition. Previous approaches have focused on 2D-view image composition, which limits their handling of complex 3D spatial relationships. Consequently, they are not well-suited for applications requiring precise 3D object control and iterative refinement, including interior design visualization, visual effects prototyping, and virtual reality scene construction. In contrast, our method extracts 3D bounding boxes for all objects in the scene image. Users can then specify a new 3D bounding box based on the existing spatial context and provide an image of the target object. Leveraging a fine-tuned diffusion model, our approach enables high-fidelity image composition while preserving the underlying 3D structure of the scene.
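The abstract does not say how a 3D bounding box is parameterized or how it conditions the diffusion model. As a rough illustration of how a user-specified 3D box relates to pixels in the scene image, the sketch below assumes a box given by a center, a size, and a yaw angle in camera coordinates, plus a pinhole camera with intrinsics `fx`, `fy`, `px`, `py`. It projects the box's eight corners to get the enclosing 2D rectangle. `Box3D` and `project_box` are hypothetical names introduced here for illustration. This is not the authors' implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Box3D:
    """Hypothetical 3D box in camera coordinates (x right, y down, z forward)."""
    center: tuple  # (cx, cy, cz), e.g. in metres
    size: tuple    # (width, height, length) along the box's local x, y, z axes
    yaw: float     # rotation about the camera's vertical (y) axis, in radians

    def corners(self):
        """Return the 8 box corners in camera coordinates."""
        cx, cy, cz = self.center
        w, h, l = self.size
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        pts = []
        for dx in (-w / 2, w / 2):
            for dy in (-h / 2, h / 2):
                for dz in (-l / 2, l / 2):
                    # Rotate the local offset about the y axis, then translate.
                    x = c * dx + s * dz
                    z = -s * dx + c * dz
                    pts.append((cx + x, cy + dy, cz + z))
        return pts


def project_box(box, fx, fy, px, py):
    """Project box corners with an assumed pinhole camera.

    Returns the enclosing 2D rectangle (u_min, v_min, u_max, v_max) in pixels.
    Assumes every corner lies in front of the camera (z > 0).
    """
    uv = []
    for x, y, z in box.corners():
        if z <= 0:
            raise ValueError("box corner behind the camera")
        uv.append((fx * x / z + px, fy * y / z + py))
    us = [u for u, _ in uv]
    vs = [v for _, v in uv]
    return min(us), min(vs), max(us), max(vs)
```

For example, a 1 m cube centered 4 m in front of a camera with `fx = fy = 500` and principal point `(320, 240)` projects to roughly `(248.6, 168.6, 391.4, 311.4)`. The nearest face sets the extent because it sits at depth 3.5 m. A rectangle like this could serve as a placement mask, but a full 3D-aware method would likely also need depth ordering and orientation cues, which the abstract leaves unspecified.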

Authors

Junhao Zhang, Ming Kong, Zhanbin Hu, Hao Qin, Zhijie Xu, Xiaojun Zhu, Qiang Zhu

Topics

Computer Vision > Analysis > 3D Vision
Computer Vision > Generation > Image Generation

Keywords

scene understanding, 3D vision, diffusion model, image composition, 3D object placement
