Amodal Ground Truth and Completion in the Wild

Guanqi Zhan; Chuanxia Zheng; Weidi Xie; Andrew Zisserman

2024 CVPR CVPR 2024

Amodal Ground Truth and Completion in the Wild

Abstract

This paper studies amodal image segmentation: predicting entire object segmentation masks including both visible and invisible (occluded) parts. In previous work the amodal segmentation ground truth on real images is usually predicted by manual annotaton and thus is subjective. In contrast we use 3D data to establish an automatic pipeline to determine authentic ground truth amodal masks for partially occluded objects in real images. This pipeline is used to construct an amodal completion evaluation benchmark MP3D-Amodal consisting of a variety of object categories and labels. To better handle the amodal completion task in the wild we explore two architecture variants: a two-stage model that first infers the occluder followed by amodal mask completion; and a one-stage model that exploits the representation power of Stable Diffusion for amodal segmentation across many categories. Without bells and whistles our method achieves a new state-of-the-art performance on Amodal segmentation datasets that cover a large variety of objects including COCOA and our new MP3D-Amodal dataset. The dataset model and code are available at https://www. robots.ox.ac.uk/ vgg/research/amodal/.

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Guanqi Zhan , Chuanxia Zheng , Weidi Xie , Andrew Zisserman

Topics

Computer Vision > Analysis > Object Detection Computer Vision > Analysis > Semantic Segmentation Computer Vision > Processing > Image Segmentation

Keywords

image segmentation object segmentation stable diffusion amodal segmentation occluded object mask completion

Download PDF

Related papers

DUSt3R: Geometric 3D Vision Made Easy 2024

Bezier Everywhere All at Once: Learning Drivable Lanes as Bezier Graphs 2024

NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows 2024

Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization 2024

DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models 2024