WACV 2020

Active Learning for Imbalanced Datasets

Abstract

Active learning increases the effectiveness of labeling when only a subset of an unlabeled dataset can be processed manually. To our knowledge, existing algorithms are designed under the assumption that datasets are balanced. However, many real-life datasets are imbalanced, and we propose two adaptations of active learning to tackle this imbalance. First, we modify acquisition functions so that they select samples by taking advantage of a deep model pretrained on a source domain. Second, we introduce a balancing step in the acquisition process to reduce the imbalance of the labeled subset. We evaluate existing active learning methods and the modifications introduced here on four imbalanced datasets. Results show that our adaptations are useful as long as knowledge from the source domain is transferable to the target domains.
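To make the idea of a balancing step concrete, the following is a minimal sketch and not the procedure used in the paper, whose details the abstract does not give. It assumes an entropy-based uncertainty score and a hypothetical weighting that divides the score by one plus the number of labeled samples of the predicted class. The names `balanced_acquisition`, `pool_probs`, and `labeled_counts` are illustrative only.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def predicted_class(probs):
    """Index of the most probable class (first index wins ties)."""
    return max(range(len(probs)), key=probs.__getitem__)


def balanced_acquisition(pool_probs, labeled_counts, budget):
    """Greedily pick up to `budget` pool indices for labeling.

    Assumption (not from the paper): a sample's score is its entropy
    divided by (1 + labeled count of its predicted class), so uncertain
    samples from under-represented classes are preferred.

    pool_probs: list of per-sample class-probability lists.
    labeled_counts: dict mapping class index to its labeled sample count.
    """
    counts = Counter(labeled_counts)
    remaining = set(range(len(pool_probs)))
    selected = []
    while remaining and len(selected) < budget:
        def score(i):
            return entropy(pool_probs[i]) / (1 + counts[predicted_class(pool_probs[i])])

        # Iterate in sorted order so that ties resolve deterministically.
        best = max(sorted(remaining), key=score)
        # True labels are unknown at this point, so the predicted class
        # stands in for the label when updating the running counts.
        counts[predicted_class(pool_probs[best])] += 1
        selected.append(best)
        remaining.remove(best)
    return selected


pool = [[0.6, 0.4], [0.55, 0.45], [0.4, 0.6], [0.9, 0.1]]
picked = balanced_acquisition(pool, {0: 10, 1: 0}, budget=2)
```

In this toy pool, class 0 already has ten labeled samples and class 1 has none, so the only sample predicted as class 1 is chosen first even though another sample has slightly higher entropy. Because the counts are updated with predicted rather than true labels, the sketch is only as good as the model's predictions on the pool.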
