Revisiting Weight Initialization of Deep Neural Networks

Maciej Skorski; Alessandro Temperoni; Martin Theobald

ACML 2021


Abstract

Proper initialization of weights is crucial for the effective training and fast convergence of deep neural networks (DNNs). Prior work in this area has mostly focused on the principle of balancing the variance among weights per layer, in order to keep stable both (i) the input data propagated forward through the network and (ii) the loss gradients propagated backward. This prevalent heuristic, however, is agnostic of dependencies among gradients across the various layers and captures only first-order effects per layer. In this paper, we investigate a unifying approach, based on approximating and controlling the norm of the layers' Hessians, which both generalizes and explains existing initialization schemes, such as those for smooth activation functions, Dropout, and ReLU.
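The per-layer variance-balancing heuristic that the abstract describes can be illustrated with a small pure-Python sketch. It shows the baseline schemes the paper revisits, not the Hessian-based method it proposes. The sketch assumes a fully connected ReLU network with square layers, no biases, and Gaussian weights. The helper names (`he_std`, `glorot_std`, `propagate`) are hypothetical and serve only this illustration.

```python
import math
import random

def he_std(fan_in):
    # He et al. (2015) scale for ReLU layers: Var(w) = 2 / fan_in
    return math.sqrt(2.0 / fan_in)

def glorot_std(fan_in, fan_out):
    # Glorot & Bengio (2010) scale: Var(w) = 2 / (fan_in + fan_out)
    return math.sqrt(2.0 / (fan_in + fan_out))

def mean_square(v):
    # Empirical second moment of a vector of activations
    return sum(t * t for t in v) / len(v)

def propagate(depth, width, std, seed=0):
    """Push one Gaussian input through `depth` random ReLU layers.

    Returns the mean squared activation after each layer (index 0 is the input).
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    history = [mean_square(x)]
    for _ in range(depth):
        # Fresh square weight matrix with i.i.d. N(0, std^2) entries
        w = [[rng.gauss(0.0, std) for _ in range(width)] for _ in range(width)]
        # Linear map followed by ReLU
        x = [max(0.0, sum(wij * xj for wij, xj in zip(row, x))) for row in w]
        history.append(mean_square(x))
    return history

he = propagate(depth=10, width=256, std=he_std(256), seed=0)
gl = propagate(depth=10, width=256, std=glorot_std(256, 256), seed=0)
print(he[-1] / he[0], gl[-1] / gl[0])
```

With the He scale, the mean squared activation stays roughly constant with depth. For square layers, the Glorot scale reduces to Var(w) = 1/fan_in, and ReLU discards about half of a symmetric pre-activation, so the signal shrinks by a factor of 2 per layer. Both runs share a seed and ReLU is positively homogeneous, so the ratio between them is exactly 2^-10 here. This first-order, per-layer reasoning is the heuristic the paper argues is incomplete.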

Keywords

neural network optimization, Hessian approximation, weight initialization, deep neural network, gradient propagation