Scalable Fair Clustering

Arturs Backurs; Piotr Indyk; Krzysztof Onak; Baruch Schieber; Ali Vakilian; Tal Wagner

2019 ICML ICML 2019

Scalable Fair Clustering

Abstract

We study the fair variant of the classic k-median problem introduced by (Chierichetti et al., NeurIPS 2017) in which the points are colored, and the goal is to minimize the same average distance objective as in the standard $k$-median problem while ensuring that all clusters have an “approximately equal” number of points of each color. (Chierichetti et al., NeurIPS 2017) proposed a two-phase algorithm for fair $k$-clustering. In the first step, the pointset is partitioned into subsets called fairlets that satisfy the fairness requirement and approximately preserve the k-median objective. In the second step, fairlets are merged into k clusters by one of the existing k-median algorithms. The running time of this algorithm is dominated by the first step, which takes super-quadratic time. In this paper, we present a practical approximate fairlet decomposition algorithm that runs in nearly linear time.

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization

🧭 Keyword Pioneer — fairlet decomposition

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Security & Privacy

Authors

Arturs Backurs , Piotr Indyk , Krzysztof Onak , Baruch Schieber , Ali Vakilian , Tal Wagner

Topics

Machine Learning > Core Methods > Clustering Machine Learning > Application Areas > Fairness Mathematics & Optimization > Optimization > Combinatorial Optimization

Keywords

k-median clustering fair clustering fairlet decomposition approximate algorithm

Download PDF

Related papers

Bayesian leave-one-out cross-validation for large data 2019

A Block Coordinate Descent Proximal Method for Simultaneous Filtering and Parameter Estimation 2019

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks 2019

Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously 2019

Improved Convergence for $\ell_1$ and $\ell_∞$ Regression via Iteratively Reweighted Least Squares 2019