Near-Optimal Streaming Heavy-Tailed Statistical Estimation with Clipped SGD

Aniket Das; Dheeraj Nagaraj; Soumyabrata Pal; Arun Sai Suggala; Prateek Varshney

NeurIPS 2024

Abstract

We consider the problem of high-dimensional heavy-tailed statistical estimation in the streaming setting, which is much harder than the traditional batch setting due to memory constraints. We cast this problem as stochastic convex optimization with heavy-tailed stochastic gradients, and prove that the widely used Clipped-SGD algorithm attains near-optimal sub-Gaussian statistical rates whenever the second moment of the stochastic gradient noise is finite. More precisely, with $T$ samples, we show that Clipped-SGD, for smooth and strongly convex objectives, achieves an error of $\sqrt{\frac{\mathsf{Tr}(\Sigma)+\sqrt{\mathsf{Tr}(\Sigma)\|\Sigma\|_2}\ln(\tfrac{\ln(T)}{\delta})}{T}}$ with probability $1-\delta$, where $\Sigma$ is the covariance of the clipped gradient. Note that the fluctuations (depending on $\tfrac{1}{\delta}$) are of lower order than the term $\mathsf{Tr}(\Sigma)$. This improves upon the current best rate of $\sqrt{\frac{\mathsf{Tr}(\Sigma)\ln(\tfrac{1}{\delta})}{T}}$ for Clipped-SGD, known only for smooth and strongly convex objectives. Our results also extend to smooth convex and Lipschitz convex objectives. Key to our result is a novel iterative refinement strategy for martingale concentration, improving upon the PAC-Bayes approach of Catoni and Giulini (2018).
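To make the setting concrete, here is a minimal sketch of Clipped-SGD applied to streaming mean estimation. It assumes the objective $F(x) = \tfrac{1}{2}\mathbb{E}\|x - X\|^2$, whose stochastic gradient at sample $X_t$ is $x - X_t$. The function names, the clipping threshold `tau`, and the `1/t` step size are illustrative assumptions, not the schedule analysed in the paper.

```python
import numpy as np

def clip(g, tau):
    """Rescale g so that its Euclidean norm is at most tau."""
    norm = np.linalg.norm(g)
    if norm <= tau:
        return g
    return g * (tau / norm)

def clipped_sgd_mean(stream, dim, tau, step_size):
    """One pass over the stream, storing only the current iterate.

    stream    : iterable of length-`dim` arrays (the samples X_t)
    tau       : clipping threshold (hypothetical choice, not from the paper)
    step_size : callable t -> step size at step t (hypothetical schedule)
    """
    x = np.zeros(dim)
    for t, sample in enumerate(stream, start=1):
        g = x - np.asarray(sample, dtype=float)  # gradient of 0.5 * ||x - X_t||^2
        x = x - step_size(t) * clip(g, tau)
    return x

# Toy stream: 99 well-behaved samples followed by a single extreme outlier.
toy_stream = [np.array([1.0])] * 99 + [np.array([1000.0])]
clipped_estimate = clipped_sgd_mean(toy_stream, dim=1, tau=5.0,
                                    step_size=lambda t: 1.0 / t)
plain_mean = np.mean(toy_stream, axis=0)
```

With `tau = np.inf` and step size `1/t`, the update reduces to the running sample mean. With a finite `tau`, a single extreme sample can move the iterate by at most `step_size(t) * tau`, which is the robustness mechanism the abstract refers to. Memory use is one `dim`-sized vector regardless of the stream length.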

Topics

Machine Learning > Optimization & Theory > Optimization
Mathematics & Optimization > Optimization > Stochastic Methods
Machine Learning > Learning Types > Supervised Learning
Machine Learning > Optimization & Theory > Stochastic Methods

Keywords

stochastic gradient descent; convex optimization; statistical estimation; stochastic convex optimization; streaming algorithm; heavy-tailed distribution; gradient clipping; martingale concentration; heavy-tailed statistical estimation; sub-Gaussian rate; heavy-tailed statistics
