Hand-held Object Reconstruction from RGB Video with Dynamic Interaction

Shijian Jiang; Qi Ye; Rengan Xie; Yuchi Huo; Jiming Chen

2025 CVPR CVPR 2025

Hand-held Object Reconstruction from RGB Video with Dynamic Interaction

Abstract

This work aims to reconstruct the 3D geometry of a rigid object manipulated by one or both hands using monocular RGB video. Previous methods rely on Structure-from-Motion or hand priors to estimate relative motion between the object and camera, which typically assume textured objects or single-hand interactions. To accurately recover object geometry in dynamic interactions, we incorporate priors from 3D generation model into object pose estimation and propose semantic consistency constraints to solve the challenge of shape and texture discrepancy between the generated priors and observations. The poses are initialized, followed by joint optimization of the object poses and implicit neural representation. During optimization, a novel pose outlier voting strategy with inter-view consistency is proposed to correct large pose errors. Experiments on three datasets demonstrate that our method significantly outperforms the state-of-the-art in reconstruction quality for both single- and two-hand scenarios. Our project page: https://east-j.github.io/dynhor/

🌉 Interdisciplinary Bridge — Computer Vision and Deep Learning

🧭 Keyword Pioneer — rgbd reconstruction

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Shijian Jiang , Qi Ye , Rengan Xie , Yuchi Huo , Jiming Chen

Topics

Computer Vision > Analysis > 3D Vision Deep Learning > Learning Types > Self-Supervised Learning Computer Vision > Processing > 3D Vision

Keywords

3d reconstruction pose estimation neural representation implicit neural representation novel view synthesis object pose estimation object reconstruction rgbd reconstruction

Download PDF

Related papers

AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos 2025

SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding 2025

FADE: Frequency-Aware Diffusion Model Factorization for Video Editing 2025

Fast and Accurate Gigapixel Pathological Image Classification with Hierarchical Distillation Multi-Instance Learning 2025

Reversible Decoupling Network for Single Image Reflection Removal 2025