How Does Knowledge of the AUC Constrain the Set of Possible Ground-Truth Labelings?

Jacob Whitehill

2019 AAAI AAAI 2019

How Does Knowledge of the AUC Constrain the Set of Possible Ground-Truth Labelings?

Abstract

Abstract Recent work on privacy-preserving machine learning has considered how datamining competitions such as Kaggle could potentially be “hacked”, either intentionally or inadvertently, by using information from an oracle that reports a classifier’s accuracy on the test set (Blum and Hardt 2015; Hardt and Ullman 2014; Zheng 2015; Whitehill 2016). For binary classification tasks in particular, one of the most common accuracy metrics is the Area Under the ROC Curve (AUC), and in this paper we explore the mathematical structure of how the AUC is computed from an n-vector of real-valued “guesses” with respect to the ground-truth labels. Under the assumption of perfect knowledge of the test set AUC c=p/q, we show how knowing c constrains the set W of possible ground-truth labelings, and we derive an algorithm both to compute the exact number of such labelings and to enumerate efficiently over them. We also provide empirical evidence that, surprisingly, the number of compatible labelings can actually decrease as n grows, until a test set-dependent threshold is reached. Finally, we show how W can be efficiently whittled down, through pairs of oracle queries, to infer all the groundtruth test labels with complete certainty.

❓ The Questioner

🚀 Conference Pioneer — AAAI 2019

🌉 Interdisciplinary Bridge — Machine Learning and Security & Privacy

🧭 Keyword Pioneer — ground truth inference

🐣 Hot Topic Early Bird — privacy-preserving machine learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Security & Privacy, Speech & Audio

Authors

Jacob Whitehill

Topics

Machine Learning > Core Methods > Classification Machine Learning > Optimization & Theory > Optimization Machine Learning > Optimization & Theory > Theory Security & Privacy > Privacy Machine Learning > Optimization & Theory > Statistics

Keywords

privacy-preserving machine learning binary classification area under the roc curve oracle query ground truth inference area under roc curve ground-truth labeling

Download PDF

Related papers

Cooperative Multimodal Approach to Depression Detection in Twitter 2019

Learning to Align Question and Answer Utterances in Customer Service Conversation with Recurrent Pointer Networks 2019

Community Detection in Social Networks Considering Topic Correlations 2019

Session-Based Recommendation with Graph Neural Networks 2019

Blameworthiness in Multi-Agent Settings 2019