Dropout Training as Adaptive Regularization

Stefan Wager; Sida Wang; Percy Liang

NeurIPS (NIPS) 2013

Abstract

Dropout and other feature noising schemes control overfitting by artificially corrupting the training data. For generalized linear models, dropout performs a form of adaptive regularization. Using this viewpoint, we show that the dropout regularizer is first-order equivalent to an $L_2$ regularizer applied after scaling the features by an estimate of the inverse diagonal Fisher information matrix. We also establish a connection to AdaGrad, an online learner, and find that a close relative of AdaGrad operates by repeatedly solving linear dropout-regularized problems. By casting dropout as regularization, we develop a natural semi-supervised algorithm that uses unlabeled data to create a better adaptive regularizer. We apply this idea to document classification tasks, and show that it consistently boosts the performance of dropout training, improving on state-of-the-art results on the IMDB reviews dataset.
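The adaptive-regularization claim in the abstract can be illustrated with a small numerical sketch for logistic regression. It assumes the quadratic approximation of the dropout penalty takes the form $\frac{1}{2}\frac{\delta}{1-\delta}\sum_i p_i(1-p_i)\sum_j x_{ij}^2\beta_j^2$, where $\delta$ is the dropout probability and $p_i$ is the predicted probability for example $i$. That formula, the function names, and the Monte Carlo comparison are assumptions made for illustration only and are not taken from this page.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def dropout_quadratic_penalty(X, beta, delta):
    """Assumed quadratic approximation of the dropout penalty (logistic regression).

    Each squared coefficient beta_j**2 is weighted by a diagonal
    Fisher-information estimate, sum_i p_i (1 - p_i) x_ij**2.
    """
    p = sigmoid(X @ beta)
    curvature = p * (1.0 - p)  # second derivative of the log-partition function
    fisher_diag = (curvature[:, None] * X ** 2).sum(axis=0)  # diag(X^T V X)
    return 0.5 * delta / (1.0 - delta) * np.sum(fisher_diag * beta ** 2)


def dropout_penalty_monte_carlo(X, beta, delta, n_draws=2000, seed=0):
    """Monte Carlo estimate of the noising penalty sum_i E[A(x~_i . beta)] - A(x_i . beta).

    Features are zeroed with probability delta and the survivors are
    rescaled by 1 / (1 - delta), so the noised features are unbiased.
    """
    rng = np.random.default_rng(seed)

    def log_partition(z):
        return np.logaddexp(0.0, z)  # A(z) = log(1 + exp(z))

    clean = log_partition(X @ beta).sum()
    total = 0.0
    for _ in range(n_draws):
        keep = rng.random(X.shape) >= delta  # True with probability 1 - delta
        X_noisy = X * keep / (1.0 - delta)
        total += log_partition(X_noisy @ beta).sum()
    return total / n_draws - clean


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 5))
    beta = 0.1 * rng.normal(size=5)
    print("quadratic approximation:", dropout_quadratic_penalty(X, beta, 0.5))
    print("Monte Carlo estimate:   ", dropout_penalty_monte_carlo(X, beta, 0.5))
```

Under these assumptions, a feature that is rarely active, or active mostly on confidently classified examples, receives a small Fisher weight and is penalized lightly. This is one way to read the "adaptive" behaviour the abstract describes.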

Topics

Machine Learning > Core Methods > Classification

Machine Learning > Learning Types > Semi-Supervised Learning

Machine Learning > Optimization & Theory > Optimization

Deep Learning > Techniques > Model Architecture

Machine Learning > Learning Types > Deep Learning

Machine Learning > Learning Types > Regularization

Deep Learning > Learning Types > Semi-Supervised Learning

Machine Learning > Optimization & Theory > Regularization

Deep Learning > Techniques > Regularization

Keywords

semi-supervised learning, Fisher information, document classification, adaptive regularization, dropout regularization, feature scaling, generalized linear model, linear model

Related papers

Latent Structured Active Learning 2013

On Flat versus Hierarchical Classification in Large-Scale Taxonomies 2013

Generalized Method-of-Moments for Rank Aggregation 2013

Third-Order Edge Statistics: Contour Continuation, Curvature, and Cortical Connections 2013

Accelerated Mini-Batch Stochastic Dual Coordinate Ascent 2013