Dropout Regularization Versus $\ell_2$-Penalization in the Linear Model

Gabriel Clara; Sophie Langer; Johannes Schmidt-Hieber

JMLR 2024

Abstract

We investigate the statistical behavior of gradient descent iterates with dropout in the linear regression model. In particular, non-asymptotic bounds for the convergence of expectations and covariance matrices of the iterates are derived. The results shed more light on the widely cited connection between dropout and $\ell_2$-regularization in the linear model. We indicate a more subtle relationship, owing to interactions between the gradient descent dynamics and the additional randomness induced by dropout. Further, we study a simplified variant of dropout which does not have a regularizing effect and converges to the least squares estimator.
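To make the objects in the abstract concrete, here is a minimal NumPy sketch. It is not taken from the paper: the abstract does not state the update rules, so the function names, the Bernoulli(p) mask convention, and in particular the form of the "simplified" variant (masking only the gradient) are assumptions for illustration. The closed-form expression relies on the standard identity $E\|Y - XD\beta\|^2 = \|Y - pX\beta\|^2 + p(1-p)\,\beta^\top \mathrm{Diag}(X^\top X)\,\beta$ for a diagonal mask $D$ with i.i.d. Bernoulli(p) entries, which is the usual source of the dropout and $\ell_2$ connection.

```python
import itertools
import numpy as np


def dropout_gd(X, Y, p, step, n_iter, rng, simplified=False):
    """Gradient descent with a fresh Bernoulli(p) dropout mask at every step."""
    d = X.shape[1]
    beta = np.zeros(d)
    for _ in range(n_iter):
        # Diagonal of the dropout matrix D: each coordinate is kept with probability p.
        mask = (rng.random(d) < p).astype(float)
        if simplified:
            # Assumed simplified variant: the residual uses the unmasked iterate.
            residual = Y - X @ beta
        else:
            # Gradient step on 0.5 * ||Y - X D beta||^2 for the sampled D.
            residual = Y - X @ (mask * beta)
        beta = beta + step * mask * (X.T @ residual)
    return beta


def marginalized_minimizer(X, Y, p):
    """Minimizer of E_D ||Y - X D beta||^2, a weighted ridge-type estimator."""
    gram = X.T @ X
    return np.linalg.solve(p * gram + (1 - p) * np.diag(np.diag(gram)), X.T @ Y)


def averaged_dropout_loss(X, Y, beta, p):
    """Exact expectation of ||Y - X D beta||^2, enumerating all 2^d masks."""
    d = X.shape[1]
    total = 0.0
    for bits in itertools.product([0.0, 1.0], repeat=d):
        mask = np.array(bits)
        prob = np.prod(np.where(mask == 1.0, p, 1 - p))
        total += prob * np.sum((Y - X @ (mask * beta)) ** 2)
    return total


def closed_form_loss(X, Y, beta, p):
    """Closed form of the averaged loss: squared error plus a weighted l2 penalty."""
    gram_diag = np.diag(X.T @ X)
    return np.sum((Y - p * (X @ beta)) ** 2) + p * (1 - p) * np.sum(gram_diag * beta**2)
```

Averaging many independent runs of `dropout_gd` and comparing the result with `marginalized_minimizer` (or, with `simplified=True`, with `np.linalg.lstsq`) is one way to probe empirically the convergence statements the abstract describes. Such a simulation is only a sanity check and does not replace the paper's non-asymptotic bounds.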

Topics

Machine Learning > Core Methods > Regression
Machine Learning > Optimization & Theory > Learning Theory
Machine Learning > Optimization & Theory > Loss Functions
Machine Learning > Optimization & Theory > Optimization
Mathematics & Optimization > Optimization > Continuous Optimization
Machine Learning > Learning Types > Regularization

Keywords

gradient descent; linear regression; l2 regularization; dropout regularization; least squares estimator

Related papers

On the Effect of Initialization: The Scaling Path of 2-Layer Neural Networks (2024)

Convergence for nonconvex ADMM, with applications to CT imaging (2024)

Functional Directed Acyclic Graphs (2024)

Sum-of-norms clustering does not separate nearby balls (2024)

Decentralized Natural Policy Gradient with Variance Reduction for Collaborative Multi-Agent Reinforcement Learning (2024)