2023 ICML ICML 2023

The Power of Preconditioning in Overparameterized Low-Rank Matrix Sensing

Abstract

We propose $\textsf{ScaledGD($\lambda$)}$, a preconditioned gradient descent method to tackle the low-rank matrix sensing problem when the true rank is unknown, and when the matrix is possibly ill-conditioned. Using overparametrized factor representations, $\textsf{ScaledGD($\lambda$)}$ starts from a small random initialization, and proceeds by gradient descent with a specific form of preconditioning with a fixed damping term to combat overparameterization. At the expense of light computational overhead incurred by preconditioners, $\textsf{ScaledGD($\lambda$)}$ is remarkably robust to ill-conditioning compared to vanilla gradient descent ($\mathsf{GD}$). Specifically, we show that, under the Gaussian design, $\textsf{ScaledGD($\lambda$)}$ converges to the true low-rank matrix at a constant linear rate that is independent of the condition number (apart from a short nearly dimension-free burdening period), with near-optimal sample complexity. This significantly improves upon the convergence rate of vanilla $\mathsf{GD}$ which suffers from a polynomial dependency with the condition number. Our work provides evidence on the power of preconditioning in accelerating the convergence without hurting generalization in overparameterized learning.

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization
🧭 Keyword Pioneer — preconditioned gradient descent
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Data Science & Analytics, Deep Learning, Machine Learning, Mathematics & Optimization, Reinforcement Learning, Robotics