Privacy-preserving Sparse Generalized Eigenvalue Problem

Lijie Hu; Zihang Xiang; Jiabin Liu; Di Wang

AISTATS 2023

Abstract

In this paper we study the (sparse) Generalized Eigenvalue Problem (GEP), which arises in a number of modern statistical learning models, such as principal component analysis (PCA), canonical correlation analysis (CCA), Fisher’s discriminant analysis (FDA) and sliced inverse regression (SIR). We provide the first study of GEP in the differential privacy (DP) model under both deterministic and stochastic settings. In the low dimensional case, we provide a $\rho$-Concentrated DP (CDP) method, namely DP-Rayleigh Flow, and show that if the initial vector is close enough to the optimal vector, its output has an $\ell_2$-norm estimation error of $\tilde{O}(\frac{d}{n}+\frac{d}{n^2\rho})$ (under some mild assumptions), where $d$ is the dimension and $n$ is the sample size. Next, we discuss how to find such an initial parameter privately. In the high dimensional sparse case where $d\gg n$, we propose the DP-Truncated Rayleigh Flow method, whose output can achieve an error of $\tilde{O}(\frac{s\log d}{n}+\frac{s\log d}{n^2\rho})$ for various statistical models, where $s$ is the sparsity of the underlying parameter. Moreover, we show that these errors in the stochastic setting are optimal up to a factor of $\mathrm{Poly}(\log n)$ by providing lower bounds for PCA and SIR under the statistical setting and in the CDP model. Finally, to give a separation between $\epsilon$-DP and $\rho$-CDP for GEP, we also provide the lower bounds $\Omega(\frac{d}{n}+\frac{d^2}{n^2\epsilon^2})$ and $\Omega(\frac{s\log d}{n}+\frac{s^2\log^2 d}{n^2\epsilon^2})$ on the private minimax risk for PCA, under the statistical setting and the $\epsilon$-DP model, in the low dimensional and high dimensional sparse cases, respectively.
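For readers unfamiliar with the setup, a GEP asks for the leading vector $v$ satisfying $Av=\lambda Bv$ for a symmetric matrix $A$ and a positive definite matrix $B$, or equivalently for the maximizer of the generalized Rayleigh quotient $v^\top A v / v^\top B v$. The sketch below is a rough illustration only, not the paper's algorithm: it assumes a standard Rayleigh-flow style update (a gradient-like step on the quotient followed by normalization), with optional hard truncation to the $s$ largest entries and optional Gaussian perturbation. The function names, the step size `eta`, and the parameter `noise_std` are hypothetical; in particular, `noise_std` is a free parameter here and is not calibrated to any $\rho$-CDP budget.

```python
import numpy as np


def rayleigh_quotient(A, B, v):
    """Generalized Rayleigh quotient v^T A v / v^T B v."""
    return float(v @ A @ v) / float(v @ B @ v)


def truncate(v, s):
    """Keep the s largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-s:]
    out[keep] = v[keep]
    return out


def noisy_truncated_rayleigh_flow(A, B, v0, s=None, eta=0.1, steps=200,
                                  noise_std=0.0, seed=0):
    """Illustrative Rayleigh-flow iteration (hypothetical helper).

    noise_std is an uncalibrated illustration parameter: setting it > 0
    perturbs each step with Gaussian noise but gives no privacy guarantee
    by itself.
    """
    rng = np.random.default_rng(seed)
    v = np.asarray(v0, dtype=float)
    v = v / np.linalg.norm(v)
    for _ in range(steps):
        lam = rayleigh_quotient(A, B, v)
        # Gradient-like step that increases the Rayleigh quotient.
        step = v + (eta / lam) * (A @ v - lam * (B @ v))
        # Optional Gaussian perturbation (assumption, not calibrated).
        step = step + noise_std * rng.standard_normal(v.shape[0])
        # Optional hard truncation for the sparse case.
        if s is not None:
            step = truncate(step, s)
        v = step / np.linalg.norm(step)
    return v


# Toy usage: with B = I the problem reduces to an ordinary eigenproblem.
A = np.diag([3.0, 1.0, 0.5, 0.2])
B = np.eye(4)
v_hat = noisy_truncated_rayleigh_flow(A, B, np.array([1.0, 0.3, 0.3, 0.3]), s=2)
print(v_hat, rayleigh_quotient(A, B, v_hat))
```

As in the abstract's guarantee, such an iteration depends on the starting vector being reasonably close to the leading generalized eigenvector; the toy call starts near the first coordinate axis for that reason.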

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Optimization & Theory > Statistical Learning
Machine Learning > Application Areas > Privacy
Machine Learning > Core Methods > Dimensionality Reduction
Security & Privacy > Differential Privacy

Keywords

differential privacy; principal component analysis; canonical correlation analysis; sparse PCA; minimax risk; generalized eigenvalue problem; high dimensional sparse case
