Weakly Supervised Open-Vocabulary Object Detection

Jianghang Lin; Yunhang Shen; Bingquan Wang; Shaohui Lin; Ke Li; Liujuan Cao

AAAI 2024

Abstract

Although weakly supervised object detection (WSOD) is a promising step toward avoiding strong instance-level annotations, its capability is confined to closed-set categories within a single training dataset. In this paper, we propose a novel weakly supervised open-vocabulary object detection framework, namely WSOVOD, to extend traditional WSOD to detect novel concepts and utilize diverse datasets with only image-level annotations. To achieve this, we explore three vital strategies: dataset-level feature adaptation, image-level salient object localization, and region-level vision-language alignment. First, we perform data-aware feature extraction to produce an input-conditional coefficient, which is leveraged into dataset attribute prototypes to identify dataset bias and help achieve cross-dataset generalization. Second, a customized location-oriented weakly supervised region proposal network is proposed to utilize high-level semantic layouts from the category-agnostic Segment Anything Model to distinguish object boundaries. Lastly, we introduce a proposal-concept synchronized multiple-instance network, i.e., object mining and refinement with visual-semantic alignment, to discover objects matched to the text embeddings of concepts. Extensive experiments on Pascal VOC and MS COCO demonstrate that the proposed WSOVOD achieves new state-of-the-art results compared with previous WSOD methods in both closed-set object localization and detection tasks. Meanwhile, WSOVOD enables cross-dataset and open-vocabulary learning to achieve on-par or even better performance than well-established fully supervised open-vocabulary object detection (FSOVOD).
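
The region-level alignment idea in the abstract, matching proposals to the text embeddings of concepts while training only from image-level labels, can be illustrated with a generic two-branch multiple-instance scoring scheme. The sketch below is not the authors' implementation: the function names, the temperature value, the cosine-similarity scoring, and the binary cross-entropy loss are all assumptions chosen for illustration, and the inputs stand in for proposal features and concept text embeddings that a real system would obtain from a detector backbone and a text encoder.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def mil_image_scores(region_feats, text_embeds, temperature=0.1):
    """Aggregate region-concept similarities into image-level scores.

    region_feats: R proposal features of dimension D (hypothetical input).
    text_embeds:  C concept text embeddings of dimension D (hypothetical input).
    Returns (image_scores, region_scores): a length-C list and an R x C matrix.
    """
    num_regions, num_concepts = len(region_feats), len(text_embeds)
    # Temperature-scaled similarity between every proposal and every concept.
    sims = [[cosine(r, t) / temperature for t in text_embeds] for r in region_feats]
    # Classification branch: softmax over concepts for each proposal.
    cls = [softmax(row) for row in sims]
    # Detection branch: softmax over proposals for each concept.
    det = [softmax([sims[r][c] for r in range(num_regions)]) for c in range(num_concepts)]
    # Region score is the product of both branches; summing over proposals
    # gives an image-level score in (0, 1) for each concept.
    region_scores = [
        [cls[r][c] * det[c][r] for c in range(num_concepts)] for r in range(num_regions)
    ]
    image_scores = [
        sum(region_scores[r][c] for r in range(num_regions)) for c in range(num_concepts)
    ]
    return image_scores, region_scores


def image_level_bce(image_scores, labels, eps=1e-7):
    """Binary cross-entropy against image-level labels (assumed loss choice)."""
    total = 0.0
    for score, label in zip(image_scores, labels):
        s = min(max(score, eps), 1.0 - eps)
        total -= label * math.log(s) + (1 - label) * math.log(1.0 - s)
    return total / len(labels)


# Toy usage: three proposals, two concepts, 2-D features (all values made up).
toy_text = [[1.0, 0.0], [0.0, 1.0]]
toy_regions = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.2]]
toy_image_scores, toy_region_scores = mil_image_scores(toy_regions, toy_text)
toy_loss = image_level_bce(toy_image_scores, [1, 1])
```

Because the concept set enters only through `text_embeds`, swapping in embeddings for unseen category names changes the vocabulary without changing the scoring code, which is the property an open-vocabulary detector relies on.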

Authors

Jianghang Lin, Yunhang Shen, Bingquan Wang, Shaohui Lin, Ke Li, Liujuan Cao

Topics

Machine Learning > Learning Types > Weakly Supervised Learning
Machine Learning > Application Areas > Domain Adaptation
Computer Vision > Analysis > Object Detection
Deep Learning > Learning Types > Transfer Learning
Deep Learning > Learning Types > Weakly Supervised Learning

Keywords

domain adaptation, object detection, weakly supervised learning, vision-language alignment, multiple instance learning, cross-dataset generalization, weakly supervised object detection, open-vocabulary learning, open vocabulary detection
