G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis

Yufei Ye; Abhinav Gupta; Kris Kitani; Shubham Tulsiani

2024 CVPR CVPR 2024

G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis

Abstract

We propose G-HOP a denoising diffusion based generative prior for hand-object interactions that allows modeling both the 3D object and a human hand conditioned on the object category. To learn a 3D spatial diffusion model that can capture this joint distribution we represent the human hand via a skeletal distance field to obtain a representation aligned with the (latent) signed distance field for the object. We show that this hand-object prior can then serve as a generic guidance to facilitate other tasks like reconstruction from interaction clip and human grasp synthesis. We believe that our model trained by aggregating several diverse real-world interaction datasets spanning 155 categories represents a first approach that allows jointly generating both hand and object. Our empirical evaluations demonstrate the benefit of this joint prior in video-based reconstruction and human grasp synthesis outperforming current task-specific baselines.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Vision

🧭 Keyword Pioneer — skeletal distance field

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Yufei Ye , Abhinav Gupta , Kris Kitani , Shubham Tulsiani

Topics

Artificial Intelligence > Core AI > Multimodal Learning Artificial Intelligence > Learning Paradigms > Few-Shot Learning Computer Vision > Analysis > 3D Vision

Keywords

3d reconstruction signed distance field denoising diffusion hand-object interaction grasp synthesis skeletal distance field

Download PDF

Related papers

DUSt3R: Geometric 3D Vision Made Easy 2024

Bezier Everywhere All at Once: Learning Drivable Lanes as Bezier Graphs 2024

NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows 2024

Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization 2024

DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models 2024