Mean Estimation with User-level Privacy under Data Heterogeneity

Rachel Cummings; Vitaly Feldman; Audra McMillan; Kunal Talwar

NeurIPS 2022

Abstract

A key challenge in many modern data analysis tasks is that user data is heterogeneous. Different users may possess vastly different numbers of data points. More importantly, it cannot be assumed that all users sample from the same underlying distribution. This is true, for example, in language data, where different speech styles result in data heterogeneity. In this work we propose a simple model of heterogeneous user data that differs in both distribution and quantity of data, and we provide a method for estimating the population-level mean while preserving user-level differential privacy. We demonstrate asymptotic optimality of our estimator and also prove general lower bounds on the error achievable in our problem.
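To make the setting concrete, the sketch below shows a naive baseline for user-level private mean estimation. It is not the estimator from the paper, whose construction is not given in this abstract. The function names, the bounded range `[lower, upper]`, the Laplace mechanism, and the treatment of the number of users as public are all assumptions made for illustration. Each user contributes only a clipped local mean, so replacing one user's entire dataset changes the average by at most `(upper - lower) / n`.

```python
import random
from typing import Optional, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def user_level_private_mean(
    user_data: Sequence[Sequence[float]],
    epsilon: float,
    lower: float = 0.0,
    upper: float = 1.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Hypothetical baseline: equal-weight average of clipped per-user means.

    Assumes the number of users is public and every user holds at least
    one data point. This is an illustrative sketch, not the paper's method.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not user_data or any(len(points) == 0 for points in user_data):
        raise ValueError("every user must hold at least one data point")
    rng = rng or random.Random()

    # Each user is summarized by one number, regardless of how many
    # points they hold, and that number is clipped to [lower, upper].
    local_means = [
        min(max(sum(points) / len(points), lower), upper)
        for points in user_data
    ]
    n_users = len(local_means)
    average = sum(local_means) / n_users

    # Replacing one user's whole dataset moves the average by at most
    # (upper - lower) / n_users, so Laplace noise at this scale gives
    # epsilon-differential privacy at the user level.
    sensitivity = (upper - lower) / n_users
    return average + laplace_noise(sensitivity / epsilon, rng)
```

Giving every user equal weight ignores the fact that users with more data provide more reliable local means, and that users may draw from different distributions. That gap is exactly the heterogeneity the abstract describes, so this baseline should be read only as a starting point.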

Topics

Machine Learning > Optimization & Theory > Statistical Learning
Machine Learning > Application Areas > Privacy
Mathematics & Optimization > Mathematics > Statistics
Machine Learning > Learning Paradigms > Federated Learning
Machine Learning > Learning Types > Privacy
Mathematics & Optimization > Statistics > Statistics

Keywords

differential privacy; mean estimation; asymptotic optimality; data heterogeneity; user-level privacy; population-level estimation
