DAP: Detection-Aware Pre-Training With Weak Supervision

Yuanyi Zhong; Jianfeng Wang; Lijuan Wang; Jian Peng; Yu-Xiong Wang; Lei Zhang

CVPR 2021

Abstract

This paper presents a detection-aware pre-training (DAP) approach, which leverages only weakly-labeled classification-style datasets (e.g., ImageNet) for pre-training, but is specifically tailored to benefit object detection tasks. In contrast to the widely used image classification-based pre-training (e.g., on ImageNet), which does not include any location-related training tasks, we transform a classification dataset into a detection dataset through a weakly supervised object localization method based on Class Activation Maps to directly pre-train a detector, making the pre-trained model location-aware and capable of predicting bounding boxes. We show that DAP can outperform the traditional classification pre-training in terms of both sample efficiency and convergence speed in downstream detection tasks including VOC and COCO. In particular, DAP boosts the detection accuracy by a large margin when the number of examples in the downstream task is small.
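The conversion step described in the abstract, turning a class activation map into a pseudo bounding box, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `class_activation_map` and `cam_to_box`, the 0.5 threshold, and the choice to box all above-threshold pixels (rather than, say, a single connected region) are assumptions made for illustration, and the toy feature maps are made up.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Compute a normalized CAM for one class.

    features:   (K, H, W) feature maps from the last conv layer.
    fc_weights: (C, K) weights of the linear classifier that follows
                global average pooling.
    """
    # Weighted sum of the K feature maps using the chosen class's weights.
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))  # (H, W)
    cam = cam - cam.min()
    peak = cam.max()
    # Scale to [0, 1]; leave an all-zero map unchanged.
    return cam / peak if peak > 0 else cam

def cam_to_box(cam, threshold=0.5):
    """Box all pixels whose activation is >= threshold (assumed heuristic).

    Returns (x_min, y_min, x_max, y_max) in CAM pixel coordinates, or None.
    """
    ys, xs = np.nonzero(cam >= threshold)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy input: 2 feature channels on a 6x6 grid, 3 classes (all values made up).
features = np.zeros((2, 6, 6))
features[0, 1:4, 2:5] = 1.0   # channel 0 fires on rows 1-3, columns 2-4
features[1, 5, 0] = 0.3       # channel 1 fires weakly in a corner
fc_weights = np.array([[1.0, 1.0], [0.0, 1.0], [0.5, 0.0]])

cam = class_activation_map(features, fc_weights, class_idx=0)
box = cam_to_box(cam, threshold=0.5)
print(box)
```

Pairing each image's class label with a box obtained this way yields detection-style annotations, which is what allows a detector, rather than only a classifier, to be pre-trained on a classification dataset.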

Topics

Machine Learning > Learning Types > Self-Supervised Learning
Machine Learning > Learning Types > Weakly Supervised Learning
Deep Learning > Techniques > Pretraining
Computer Vision > Analysis > Object Detection
Deep Learning > Learning Types > Transfer Learning

Keywords

image classification, transfer learning, object detection, weak supervision, bounding box, bounding box prediction, class activation map
