Vision Transformers Are Good Mask Auto-Labelers

Shiyi Lan; Xitong Yang; Zhiding Yu; Zuxuan Wu; Jose M. Alvarez; Anima Anandkumar

CVPR 2023

Abstract

We propose Mask Auto-Labeler (MAL), a high-quality Transformer-based mask auto-labeling framework for instance segmentation using only box annotations. MAL takes box-cropped images as inputs and conditionally generates their mask pseudo-labels. We show that Vision Transformers are good mask auto-labelers. Our method significantly reduces the gap in mask quality between auto-labeling and human annotation. Instance segmentation models trained on MAL-generated masks nearly match their fully supervised counterparts, retaining up to 97.4% of the fully supervised performance. The best model achieves 44.1% mAP on COCO instance segmentation (test-dev 2017), outperforming state-of-the-art box-supervised methods by significant margins. Qualitative results indicate that masks produced by MAL are, in some cases, even better than human annotations.
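The data flow described in the abstract (crop each annotated box, predict a mask for the crop, use it as a pseudo-label) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names are hypothetical, images are plain nested lists of grayscale values, boxes are non-empty `(x0, y0, x1, y1)` tuples with exclusive upper bounds, and `placeholder_labeler` is a trivial thresholding stand-in, not the Transformer-based model the paper proposes.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive
Grid = List[List[int]]           # row-major grid of pixel values or mask bits


def crop_box(image: Grid, box: Box) -> Grid:
    """Return the part of `image` covered by `box`."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def placeholder_labeler(crop: Grid) -> Grid:
    """Hypothetical stand-in for the auto-labeler (NOT the paper's model).

    Marks pixels brighter than the crop's mean intensity as foreground.
    """
    pixels = [p for row in crop for p in row]
    mean = sum(pixels) / len(pixels)
    return [[1 if p > mean else 0 for p in row] for row in crop]


def paste_mask(crop_mask: Grid, box: Box, height: int, width: int) -> Grid:
    """Place a crop-level mask back into full-image coordinates."""
    x0, y0, _, _ = box
    full = [[0] * width for _ in range(height)]
    for dy, row in enumerate(crop_mask):
        for dx, value in enumerate(row):
            full[y0 + dy][x0 + dx] = value
    return full


def auto_label(
    image: Grid,
    boxes: List[Box],
    labeler: Callable[[Grid], Grid] = placeholder_labeler,
) -> List[Grid]:
    """Produce one full-size pseudo-label mask per box annotation."""
    height, width = len(image), len(image[0])
    return [
        paste_mask(labeler(crop_box(image, box)), box, height, width)
        for box in boxes
    ]
```

The resulting per-box masks could then serve as training targets for a standard instance segmentation model. Passing a different `labeler` callable swaps out the thresholding placeholder without changing the crop and paste steps.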

Topics

Machine Learning > Learning Types > Weakly Supervised Learning
Deep Learning > Architectures > Transformers
Computer Vision > Processing > Image Segmentation
Computer Vision > Core AI
Computer Vision > Analysis > Object Segmentation
Deep Learning > Learning Types > Weakly Supervised Learning

Keywords

semantic segmentation, vision transformer, object detection, weakly supervised learning, instance segmentation, mask generation, box supervision, mask auto-labeling
