Semi-Supervised Aggregation of Dependent Weak Supervision Sources With Performance Guarantees

Alessio Mazzetto; Dylan Sam; Andrew Park; Eli Upfal; Stephen Bach

AISTATS 2021

Abstract

We develop a novel method that provides theoretical guarantees for learning from weak labelers without the (mostly unrealistic) assumption that the errors of the weak labelers are independent or come from a particular family of distributions. We present a rigorous technique for efficiently selecting small subsets of the labelers so that a majority vote from such subsets has a provably low error rate. We explore several extensions of this method and provide experimental results over a range of labeled data set sizes on 45 image classification tasks. Our performance-guaranteed methods consistently match the best-performing alternative, which varies with problem difficulty. On tasks with accurate weak labelers, our methods are on average 3 percentage points more accurate than the state-of-the-art adversarial method. On tasks with inaccurate weak labelers, our methods are on average 15 percentage points more accurate than the semi-supervised Dawid-Skene model (which assumes independence).
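
The idea of choosing a small subset of weak labelers whose majority vote has low error can be illustrated with a minimal sketch. This is not the paper's selection procedure and carries none of its guarantees: it assumes binary labels, a brute-force search over all subsets of a fixed size, and an error estimate taken from a small labeled set. The names `majority_vote` and `best_subset` and the toy data are hypothetical, introduced only for illustration.

```python
from itertools import combinations

import numpy as np


def majority_vote(votes):
    """Majority vote over binary votes of shape (n_items, n_labelers).

    Ties (possible only with an even number of labelers) are resolved as 1.
    """
    return (2 * votes.sum(axis=1) >= votes.shape[1]).astype(int)


def best_subset(votes, labels, size):
    """Brute-force search (an assumption of this sketch, not the paper's
    method) for the subset of `size` labelers whose majority vote has the
    lowest empirical error on the labeled items."""
    best = None
    for subset in combinations(range(votes.shape[1]), size):
        predictions = majority_vote(votes[:, list(subset)])
        err = float(np.mean(predictions != labels))
        # Strict comparison keeps the first subset found among equal errors.
        if best is None or err < best[1]:
            best = (subset, err)
    return best


# Toy data: three labelers that agree with the labels, two that always disagree.
labels = np.array([0, 1, 0, 1, 1, 0])
votes = np.column_stack([labels, labels, labels, 1 - labels, 1 - labels])
subset, err = best_subset(votes, labels, size=3)
```

The empirical error computed here makes no independence assumption about the labelers, since it is measured directly on the joint votes. Turning such an estimate into a bound on the true error rate requires the kind of analysis the paper provides.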

Topics

Machine Learning > Learning Types > Semi-Supervised Learning
Machine Learning > Learning Types > Weakly Supervised Learning
Machine Learning > Learning Types > Classification

Keywords

image classification, semi-supervised learning, weak supervision, majority vote, label aggregation, performance guarantee, Dawid-Skene model, adversarial method, weak labeler
