Fair k-Center Clustering for Data Summarization

Matthäus Kleindessner; Pranjal Awasthi; Jamie Morgenstern

ICML 2019

Abstract

In data summarization we want to choose $k$ prototypes in order to summarize a data set. We study a setting where the data set comprises several demographic groups and we are restricted to choose $k_i$ prototypes belonging to group $i$. A common approach to the problem without the fairness constraint is to optimize a centroid-based clustering objective such as $k$-center. A natural extension then is to incorporate the fairness constraint into the clustering problem. Existing algorithms for doing so run in time super-quadratic in the size of the data set, which is in contrast to the standard $k$-center problem being approximable in linear time. In this paper, we resolve this gap by providing a simple approximation algorithm for the $k$-center problem under the fairness constraint with running time linear in the size of the data set and $k$. If the number of demographic groups is small, the approximation guarantee of our algorithm only incurs a constant-factor overhead.
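To make the setting concrete, the following is a minimal sketch of the classical greedy (farthest-first) heuristic for the standard $k$-center problem without the fairness constraint, together with a check of whether a chosen set of prototypes meets per-group quotas $k_i$. It is not the algorithm proposed in the paper, which the abstract does not spell out. The function names, the Euclidean distance, and the toy data are assumptions made for illustration only.

```python
import math


def greedy_k_center(points, k, dist=math.dist):
    """Farthest-first traversal for unconstrained k-center.

    Returns (indices of the chosen centers, covering radius).
    Each round costs O(n) distance evaluations, so the total is O(n * k).
    The first center is arbitrary; index 0 is an assumption of this sketch.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must satisfy 1 <= k <= len(points)")
    centers = [0]
    # d[i] = distance from point i to its nearest chosen center so far
    d = [dist(p, points[0]) for p in points]
    while len(centers) < k:
        # pick the point that is currently farthest from all centers
        nxt = max(range(len(points)), key=d.__getitem__)
        centers.append(nxt)
        d = [min(d[i], dist(points[i], points[nxt])) for i in range(len(points))]
    return centers, max(d)


def satisfies_quota(centers, groups, quota):
    """Check that exactly quota[g] centers belong to each group g.

    `groups[i]` is the (hypothetical) group label of point i and
    `quota` maps each group label to its required number of prototypes.
    """
    counts = {}
    for c in centers:
        counts[groups[c]] = counts.get(groups[c], 0) + 1
    return (sum(quota.values()) == len(centers)
            and all(counts.get(g, 0) == q for g, q in quota.items()))


# Toy data: four points on a line, three in group "a" and one in group "b".
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
groups = ["a", "a", "a", "b"]
centers, radius = greedy_k_center(points, k=2)
print(centers, radius)
print(satisfies_quota(centers, groups, {"a": 1, "b": 1}))
print(satisfies_quota(centers, groups, {"a": 2, "b": 0}))
```

On this toy input the unconstrained heuristic happens to pick one prototype from each group, so it meets the quota of one per group but not a quota of two from group "a" and none from group "b". In general the unconstrained solution can violate the quotas, which is why the fairness constraint has to be built into the clustering problem itself.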

Topics

Machine Learning > Core Methods > Clustering
Machine Learning > Application Areas > Fairness

Keywords

data summarization, approximation algorithm, k-center clustering, fairness constraint, fair clustering
