Deep Learning Made Easier by Linear Transformations in Perceptrons

Tapani Raiko; Harri Valpola; Yann LeCun

AISTATS 2012

Abstract

We transform the outputs of each hidden neuron in a multi-layer perceptron network to have zero activation and zero slope on average, and use separate shortcut connections to model the linear dependencies instead. These transformations aim at separating the problems of learning the linear and nonlinear parts of the whole input-output mapping, which has many benefits. We study the theoretical properties of the transformations by noting that they make the Fisher information matrix closer to a diagonal matrix, and thus the standard gradient closer to the natural gradient. We experimentally confirm the usefulness of the transformations by noting that they make basic stochastic gradient learning competitive with state-of-the-art learning algorithms in speed, and that they also seem to help find solutions that generalize better. The experiments include both classification of small images and learning a low-dimensional representation for images by using a deep unsupervised auto-encoder network. The transformations were beneficial in all cases, with and without regularization and with networks from two to five hidden layers.
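The centering idea in the first sentence of the abstract can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: it assumes a tanh nonlinearity, takes the averages over one batch of pre-activations, and uses hypothetical names (`transform_hidden`, `alpha`, `beta`) that do not appear in the abstract. The shortcut connections that would carry the removed linear part are not modeled here.

```python
import numpy as np


def transform_hidden(pre, f=np.tanh, df=lambda z: 1.0 - np.tanh(z) ** 2):
    """Center hidden-unit outputs and slopes over a batch.

    pre: array of shape (n_samples, n_hidden) holding pre-activations.
    Returns the transformed outputs, their slopes with respect to the
    pre-activations, and the per-unit correction terms alpha and beta.
    Assumption: tanh is used as the nonlinearity by default.
    """
    # alpha adds a linear term whose slope cancels the average slope of f.
    alpha = -df(pre).mean(axis=0)
    # beta shifts the output so that its batch average becomes zero.
    beta = -(f(pre) + alpha * pre).mean(axis=0)
    out = f(pre) + alpha * pre + beta
    slope = df(pre) + alpha
    return out, slope, alpha, beta


# Hypothetical batch: 1000 samples, 4 hidden units, deliberately off-center.
rng = np.random.default_rng(0)
pre = rng.normal(loc=0.5, scale=1.0, size=(1000, 4))
out, slope, alpha, beta = transform_hidden(pre)
```

After the call, both `out.mean(axis=0)` and `slope.mean(axis=0)` are zero up to floating-point error for every hidden unit, which is the "zero activation and zero slope on average" property the abstract describes.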

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization
Deep Learning > Architectures > Neural Networks
Deep Learning > Techniques > Normalization
Deep Learning > Optimization & Theory > Neural Network Optimization

Keywords

stochastic gradient, deep learning, natural gradient, gradient descent, Fisher information matrix, linear transformation
