SAOR: Single-View Articulated Object Reconstruction

Mehmet Aygun; Oisin Mac Aodha

2024 CVPR CVPR 2024

SAOR: Single-View Articulated Object Reconstruction

Abstract

We introduce SAOR a novel approach for estimating the 3D shape texture and viewpoint of an articulated object from a single image captured in the wild. Unlike prior approaches that rely on pre-defined category-specific 3D templates or tailored 3D skeletons SAOR learns to articulate shapes from single-view image collections with a skeleton-free part-based model without requiring any 3D object shape priors. To prevent ill-posed solutions we propose a cross-instance consistency loss that exploits disentangled object shape deformation and articulation. This is helped by a new silhouette-based sampling mechanism to enhance viewpoint diversity during training. Our method only requires estimated object silhouettes and relative depth maps from off-the-shelf pre-trained networks during training. At inference time given a single-view image it efficiently outputs an explicit mesh representation. We obtain improved qualitative and quantitative results on challenging quadruped animals compared to relevant existing work.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Vision and Deep Learning and Machine Learning

🧭 Keyword Pioneer — cross-instance consistency

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Mehmet Aygun , Oisin Mac Aodha

Topics

Artificial Intelligence > Core AI > Foundation Models Machine Learning > Core Methods > Representation Learning Machine Learning > Learning Types > Self-Supervised Learning Computer Vision > Analysis > 3D Vision Deep Learning > Learning Types > Self-Supervised Learning Computer Vision > Processing > Image Processing Computer Vision > Analysis > Object Segmentation Deep Learning > Learning Types > Deep Learning Computer Vision > Core AI > Computer Vision Computer Vision > Generation > 3D Generation

Keywords

3d reconstruction object detection depth estimation mesh generation mesh representation articulated object shape deformation single-view reconstruction cross-instance consistency silhouette estimation

Download PDF

Related papers

DUSt3R: Geometric 3D Vision Made Easy 2024

Bezier Everywhere All at Once: Learning Drivable Lanes as Bezier Graphs 2024

NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows 2024

Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization 2024

DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models 2024