Beyond exploding and vanishing gradients: analysing RNN training using attractors and smoothness

Antônio H. Ribeiro; Koen Tiels; Luis A. Aguirre; Thomas Schön

2020 AISTATS AISTATS 2020

Beyond exploding and vanishing gradients: analysing RNN training using attractors and smoothness

Abstract

The exploding and vanishing gradient problem has been the major conceptual principle behind most architecture and training improvements in recurrent neural networks (RNNs) during the last decade. In this paper, we argue that this principle, while powerful, might need some refinement to explain recent developments. We refine the concept of exploding gradients by reformulating the problem in terms of the cost function smoothness, which gives insight into higher-order derivatives and the existence of regions with many close local minima. We also clarify the distinction between vanishing gradients and the need for the RNN to learn attractors to fully use its expressive power. Through the lens of these refinements, we shed new light on recent developments in the RNN field, namely stable RNN and unitary (or orthogonal) RNNs.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning

🐣 Hot Topic Early Bird — theoretical analysis

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Antônio H. Ribeiro , Koen Tiels , Luis A. Aguirre , Thomas Schön

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization Machine Learning > Optimization & Theory > Theory Deep Learning > Architectures > Neural Networks

Keywords

theoretical analysis attractor dynamics gradient descent optimization recurrent neural network vanishing gradient exploding gradient

Download PDF

Related papers

Stretching the Effectiveness of MLE from Accuracy to Bias for Pairwise Comparisons 2020

Fast and Accurate Ranking Regression 2020

Nonparametric Sequential Prediction While Deep Learning the Kernel 2020

Nested-Wasserstein Self-Imitation Learning for Sequence Generation 2020

Unconditional Coresets for Regularized Loss Minimization 2020