Fair and Diverse DPP-Based Data Summarization

Elisa Celis; Vijay Keswani; Damian Straszak; Amit Deshpande; Tarun Kathuria; Nisheeth Vishnoi

ICML 2018

Abstract

Sampling methods that choose a subset of the data proportional to its diversity in the feature space are popular for data summarization. However, recent studies have noted the occurrence of bias (e.g., under- or over-representation of a particular gender or ethnicity) in such data summarization methods. In this paper, we initiate a study of the problem of outputting a diverse and fair summary of a given dataset. We work with a well-studied determinantal measure of diversity and the corresponding distributions (determinantal point processes, DPPs), and we present a framework that allows us to incorporate a general class of fairness constraints into such distributions. Designing efficient algorithms to sample from these constrained determinantal distributions, however, suffers from a complexity barrier; we present a fast sampler that is provably good when the input vectors satisfy a natural property. Our empirical results on both real-world and synthetic datasets show that the diversity of the samples produced by adding fairness constraints is not too far from the unconstrained case.
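To make the idea of a constrained determinantal distribution concrete, the following is a minimal brute-force sketch, not the fast sampler the abstract refers to. It assumes that a fairness constraint takes the form of a fixed quota per group, which is only one possible instance of the "general class of fairness constraints" mentioned above. The names `diversity`, `fair_subsets`, `sample_fair_dpp`, `groups`, and `quotas` are hypothetical and chosen for illustration. The sketch enumerates every quota-satisfying subset and samples one with probability proportional to the determinant of its Gram matrix, so it is only usable on tiny inputs.

```python
import itertools
import numpy as np

def diversity(features, subset):
    """Determinantal diversity: det of the Gram matrix of the chosen rows."""
    rows = features[list(subset)]
    return float(np.linalg.det(rows @ rows.T))

def fair_subsets(groups, quotas):
    """Yield every subset taking exactly quotas[g] items from each group g.

    Assumption: fairness is expressed as per-group quotas (hypothetical form).
    """
    labels = sorted(quotas)
    pools = [[i for i, g in enumerate(groups) if g == label] for label in labels]
    choices = [itertools.combinations(pool, quotas[label])
               for pool, label in zip(pools, labels)]
    for parts in itertools.product(*choices):
        yield tuple(sorted(i for part in parts for i in part))

def sample_fair_dpp(features, groups, quotas, rng):
    """Sample a quota-satisfying subset with probability proportional to its
    determinantal diversity. Exponential time: for illustration only."""
    subsets = list(fair_subsets(groups, quotas))
    # Clip tiny negative values caused by floating-point round-off.
    weights = np.array([max(diversity(features, s), 0.0) for s in subsets])
    total = weights.sum()
    if total <= 0.0:
        raise ValueError("every feasible subset has zero diversity")
    return subsets[rng.choice(len(subsets), p=weights / total)]
```

For example, with `features = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])`, `groups = ["a", "a", "b", "b"]`, and `quotas = {"a": 1, "b": 1}`, calling `sample_fair_dpp(features, groups, quotas, np.random.default_rng(0))` returns one index from each group. The enumeration grows exponentially with the dataset size, which is why an efficient sampler is the hard part of the problem.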

Topics

Machine Learning > Core Methods > Clustering
Machine Learning > Application Areas > Data Augmentation
Machine Learning > Application Areas > Fairness
Data Science & Analytics > Applications > Clustering
Machine Learning > Core Methods > Feature Selection
Machine Learning > Learning Types > Fairness

Keywords

feature space, data summarization, determinantal point process, diversity sampling, fairness constraint
