On the Statistical Consistency of Algorithms for Binary Classification under Class Imbalance

Aditya Menon; Harikrishna Narasimhan; Shivani Agarwal; Sanjay Chawla

ICML 2013

Abstract

Class imbalance situations, where one class is rare compared to the other, arise frequently in machine learning applications. It is well known that the usual misclassification error is ill-suited for measuring performance in such settings. A wide range of performance measures have been proposed for this problem, in machine learning as well as in data mining, artificial intelligence, and various applied fields. However, despite the large number of studies on this problem, little is understood about the statistical consistency of the algorithms proposed with respect to the performance measures of interest. In this paper, we study consistency with respect to one such performance measure, namely the arithmetic mean of the true positive and true negative rates (AM), and establish that some simple methods that have been used in practice, such as applying an empirically determined threshold to a suitable class probability estimate or performing an empirically balanced form of risk minimization, are in fact consistent with respect to the AM (under mild conditions on the underlying distribution). Our results employ balanced losses that have been used recently in analyses of ranking problems (Kotlowski et al., 2011) and build on recent results on consistent surrogates for cost-sensitive losses (Scott, 2012). Experimental results confirm our consistency theorems.
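To make the AM measure and the thresholding idea concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function names `am_score` and `plugin_predict`, the toy labels, and the probability estimates are all hypothetical, and the choice of the empirical positive-class fraction as the "empirically determined threshold" is an assumption made for illustration.

```python
import numpy as np

def am_score(y_true, y_pred):
    """Arithmetic mean of the true positive rate and the true negative rate."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tpr = np.mean(y_pred[y_true])     # fraction of positives predicted positive
    tnr = np.mean(~y_pred[~y_true])   # fraction of negatives predicted negative
    return 0.5 * (tpr + tnr)

def plugin_predict(prob_pos, threshold):
    """Predict positive when the class probability estimate exceeds the threshold."""
    return np.asarray(prob_pos) > threshold

# Hypothetical imbalanced sample: 2 positives out of 10 examples.
y = np.array([1, 0, 0, 0, 0, 0, 0, 0, 1, 0])
# Hypothetical class probability estimates from some fitted model.
prob = np.array([0.45, 0.10, 0.05, 0.30, 0.15, 0.02, 0.08, 0.12, 0.35, 0.25])

p_hat = y.mean()  # empirical fraction of positives (assumed threshold choice)

am_half = am_score(y, plugin_predict(prob, 0.5))    # usual 0.5 threshold
am_phat = am_score(y, plugin_predict(prob, p_hat))  # empirical threshold

print(am_half)  # 0.5
print(am_phat)  # 0.875
```

On this toy sample the 0.5 threshold predicts every example as negative, which yields 80% accuracy but an AM of only 0.5, while thresholding at the empirical positive rate recovers both positives at the cost of two false positives.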

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Optimization & Theory > Loss Functions
Machine Learning > Optimization & Theory > Statistical Learning
Machine Learning > Learning Types > Classification
Machine Learning > Learning Types > Fairness

Keywords

statistical consistency, binary classification, class imbalance, surrogate loss, cost-sensitive learning, balanced loss, performance measure
