Free-Moving Object Reconstruction and Pose Estimation with Virtual Camera

Haixin Shi; Yinlin Hu; Daniel Koguciuk; Juan-Ting Lin; Mathieu Salzmann; David Ferstl

AAAI 2025

Abstract

We propose an approach for reconstructing a free-moving object from a monocular RGB video. Most existing methods either assume a scene prior, a hand pose prior, or an object category pose prior, or rely on local optimization over multiple sequence segments. Our method allows free interaction with the object in front of a moving camera without relying on any prior, and optimizes the sequence globally without splitting it into segments. We progressively optimize the object shape and pose simultaneously, based on an implicit neural representation. A key aspect of our method is a virtual camera system that significantly reduces the search space of the optimization. We evaluate our method on the standard HO3D dataset and on a collection of egocentric RGB sequences captured with a head-mounted device. We demonstrate that our approach significantly outperforms most methods and is on par with recent techniques that assume prior information.

Topics

Machine Learning > Optimization & Theory > Optimization
Computer Vision > Analysis > 3D Vision
Computer Vision > Analysis > Object Tracking
Artificial Intelligence > Core AI > Robotics

Keywords

3D reconstruction, pose estimation, neural representation, implicit neural representation, monocular video, object reconstruction, virtual camera
