WACV 2020

DATNet: Dense Auxiliary Tasks for Object Detection

Abstract

Beginning with R-CNN, two-stage object detection approaches have advanced rapidly. While two-stage approaches remain the state of the art in object detection, anchor-free single-stage methods have been gaining momentum. We believe that the strength of the former lies in their region of interest (ROI) pooling stage, while the latter simplify the learning problem by converting object detection into dense per-pixel prediction tasks. In this paper, we propose to combine the strengths of both approaches in a new architecture. In particular, we first define several auxiliary tasks related to object detection and generate dense per-pixel predictions using a shared feature extraction backbone. As a consequence of this architecture, the shared backbone is trained using both the standard object detection losses and the per-pixel losses. Moreover, by combining the features from the dense predictions with those from the backbone, we obtain a more discriminative representation for downstream processing. We then feed the fused features into a novel multi-scale ROI pooling layer, followed by per-ROI predictions. We refer to our architecture as the Dense Auxiliary Tasks Network (DATNet). We present an extensive set of evaluations of our method on the Pascal VOC and COCO datasets and show considerable accuracy improvements over comparable baselines.
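
The data flow described in the abstract (dense auxiliary predictions from a shared backbone, fusion with backbone features, then multi-scale ROI pooling) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify the auxiliary tasks, the fusion operator, or the pooling layer, so everything here is an assumption for illustration. In particular, the 1x1-convolution head, fusion by channel concatenation, max pooling, the grid sizes (1, 2, 4), and all function names (dense_head, roi_max_pool, multi_scale_roi_features) are hypothetical.

```python
import numpy as np

def dense_head(features, weights):
    """Per-pixel linear map (a 1x1 convolution): (C, H, W) -> (K, H, W).

    Assumption: stands in for one dense auxiliary-task predictor.
    """
    return np.einsum("kc,chw->khw", weights, features)

def roi_max_pool(features, box, out_size):
    """Max-pool the region box=(x0, y0, x1, y1) into an out_size x out_size grid."""
    x0, y0, x1, y1 = box
    region = features[:, y0:y1, x0:x1]
    c, h, w = region.shape
    ys = np.linspace(0, h, out_size + 1).astype(int)
    xs = np.linspace(0, w, out_size + 1).astype(int)
    out = np.empty((c, out_size, out_size), dtype=features.dtype)
    for i in range(out_size):
        for j in range(out_size):
            # max(end, start + 1) keeps every bin at least one pixel wide
            cell = region[:, ys[i]:max(ys[i + 1], ys[i] + 1),
                          xs[j]:max(xs[j + 1], xs[j] + 1)]
            out[:, i, j] = cell.max(axis=(1, 2))
    return out

def multi_scale_roi_features(features, box, sizes=(1, 2, 4)):
    """Assumed reading of 'multi-scale': pool one ROI at several grid sizes
    and concatenate the flattened results into a single per-ROI vector."""
    pooled = [roi_max_pool(features, box, s).reshape(-1) for s in sizes]
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
backbone = rng.standard_normal((8, 16, 16))      # shared backbone features (C=8)
aux_weights = rng.standard_normal((3, 8))        # 3 hypothetical auxiliary channels
aux = dense_head(backbone, aux_weights)          # dense per-pixel predictions
fused = np.concatenate([backbone, aux], axis=0)  # fusion by concatenation (assumed)
roi_vec = multi_scale_roi_features(fused, box=(2, 4, 10, 12))
print(aux.shape, fused.shape, roi_vec.shape)
```

In a trained system the auxiliary maps would also receive their own per-pixel losses, so the backbone gets gradients from both the detection and the dense objectives; the sketch covers only the forward pass.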
