Explain My Surprise: Learning Efficient Long-Term Memory by predicting uncertain outcomes

Artyom Sorokin; Nazar Buzun; Leonid Pugachev; Mikhail Burtsev

NeurIPS 2022

Abstract

In many sequential tasks, a model needs to remember relevant events from the distant past to make correct predictions. Unfortunately, a straightforward application of gradient-based training requires intermediate computations to be stored for every element of a sequence. When a sequence consists of thousands or even millions of elements, the amount of intermediate data becomes prohibitively large, which makes learning very long-term dependencies infeasible. However, the majority of sequence elements can usually be predicted from temporally local information alone. Predictions affected by long-term dependencies, by contrast, are sparse and are characterized by high uncertainty given only local information. We propose MemUP, a new training method that makes it possible to learn long-term dependencies without backpropagating gradients through the whole sequence at once. This method can potentially be applied to any recurrent architecture. An LSTM network trained with MemUP performs better than or comparably to baselines while requiring less intermediate data to be stored.
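The observation that long-range-dependent predictions are sparse and highly uncertain under local information can be illustrated with a small sketch. This is not the MemUP algorithm itself, which the abstract does not specify; it only assumes that a local predictor outputs a probability distribution per timestep and that Shannon entropy is used as the uncertainty measure. The helper names `entropy` and `most_uncertain_steps` and the toy probabilities are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def most_uncertain_steps(local_predictions, k):
    """Return the indices of the k highest-entropy timesteps, in temporal order.

    Assumption for illustration: high entropy under a local-only predictor
    marks a step that may depend on information from the distant past.
    """
    scores = [entropy(p) for p in local_predictions]
    ranked = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)
    return sorted(ranked[:k])


# Toy per-timestep distributions from a hypothetical local predictor.
# Most steps are confidently predicted; steps 2 and 4 are not.
local_predictions = [
    [0.98, 0.01, 0.01],
    [0.97, 0.02, 0.01],
    [0.34, 0.33, 0.33],
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.99, 0.005, 0.005],
]

selected = most_uncertain_steps(local_predictions, k=2)
print(selected)  # [2, 4]
```

In this toy setting only the selected steps would be candidates for memory-dependent training signals, so the per-step state for the remaining, locally predictable steps would not need to be retained.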

Authors

Artyom Sorokin, Nazar Buzun, Leonid Pugachev, Mikhail Burtsev

Topics

Artificial Intelligence > Core AI > Memory
Mathematics & Optimization > Optimization > Stochastic Methods
Deep Learning > Optimization & Theory > Neural Network Optimization
Deep Learning > Learning Types > Representation Learning

Keywords

sequence modeling, sequential learning, recurrent neural network, long-term memory, gradient-based training, uncertainty prediction
