Eigenvalue Normalized Recurrent Neural Networks for Short Term Memory

Kyle Helfrich; Qiang Ye

AAAI 2020

Abstract

Several variants of recurrent neural networks (RNNs) with orthogonal or unitary recurrent matrices have recently been developed to mitigate the vanishing/exploding gradient problem and to model long-term dependencies of sequences. However, with the eigenvalues of the recurrent matrix on the unit circle, the recurrent state retains all input information, which may unnecessarily consume model capacity. In this paper, we address this issue by proposing an architecture that expands upon an orthogonal/unitary RNN with a state that is generated by a recurrent matrix with eigenvalues in the unit disc. Any input to this state dissipates in time and is replaced with new inputs, simulating short-term memory. A gradient descent algorithm is derived for learning such a recurrent matrix. The resulting method, called the Eigenvalue Normalized RNN (ENRNN), is shown to be highly competitive in several experiments.
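The contrast between eigenvalues on the unit circle and eigenvalues inside the unit disc can be illustrated with a small numerical sketch. The code below is not the ENRNN architecture or its training algorithm; it is a simplified linear recurrence with no inputs and no nonlinearity, and it assumes one plausible normalization (dividing a matrix by its spectral radius). The function name `normalize_by_spectral_radius` and the parameter `target_radius` are hypothetical names introduced only for this illustration.

```python
import numpy as np

def normalize_by_spectral_radius(T, target_radius=0.9):
    """Rescale T so that its largest eigenvalue modulus equals target_radius.

    Assumption for illustration only: the abstract does not specify how the
    eigenvalues are normalized, so this simply divides by the spectral radius.
    """
    rho = np.max(np.abs(np.linalg.eigvals(T)))
    return T * (target_radius / rho)

rng = np.random.default_rng(0)
n = 8

# Recurrent matrix with all eigenvalues strictly inside the unit disc.
W = normalize_by_spectral_radius(rng.standard_normal((n, n)), target_radius=0.9)

# Orthogonal recurrent matrix (eigenvalues on the unit circle), via QR.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))

# Propagate the same initial state through both linear recurrences.
h0 = rng.standard_normal(n)
h_short, h_long = h0.copy(), h0.copy()
for _ in range(200):
    h_short = W @ h_short   # information dissipates over time
    h_long = Q @ h_long     # norm is preserved at every step

initial_norm = np.linalg.norm(h0)
short_norm = np.linalg.norm(h_short)
long_norm = np.linalg.norm(h_long)
spectral_radius = np.max(np.abs(np.linalg.eigvals(W)))

print(f"initial norm:            {initial_norm:.4f}")
print(f"orthogonal state norm:   {long_norm:.4f}")
print(f"unit-disc state norm:    {short_norm:.3e}")
```

In this toy setting the orthogonal recurrence keeps the norm of the initial state indefinitely, while the normalized recurrence lets it fade, which is the behavior the abstract describes as short-term memory.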

Authors

Kyle Helfrich, Qiang Ye

Topics

Artificial Intelligence > Core AI > Memory

Machine Learning > Core Methods > Representation Learning

Deep Learning > Architectures > Neural Networks

Mathematics & Optimization > Optimization > Continuous Optimization

Deep Learning > Optimization & Theory > Neural Network Optimization

Deep Learning > Learning Types > Deep Learning

Deep Learning > Learning Types > Representation Learning

Deep Learning > Architectures > Recurrent Neural Networks

Keywords

sequence modeling, gradient descent, short-term memory, recurrent neural network, vanishing gradient, orthogonal matrix, unitary matrix, eigenvalue normalization
