SBO-RNN: Reformulating Recurrent Neural Networks via Stochastic Bilevel Optimization

Ziming Zhang; Yun Yue; Guojun Wu; Yanhua Li; Haichong Zhang

NeurIPS 2021

Abstract

In this paper, we consider the training stability of recurrent neural networks (RNNs) and propose a family of RNNs, namely SBO-RNN, that can be formulated using stochastic bilevel optimization (SBO). With the help of stochastic gradient descent (SGD), we convert the SBO problem into an RNN whose feedforward and backpropagation passes solve the lower- and upper-level optimization problems for learning the hidden states and their hyperparameters, respectively. We prove that under mild conditions there is no vanishing or exploding gradient in training SBO-RNN. Empirically, we demonstrate that our approach achieves superior performance on several benchmark datasets with fewer parameters, less training data, and much faster convergence. Code is available at https://zhang-vislab.github.io.

Topics

Machine Learning > Optimization & Theory > Optimization
Deep Learning > Architectures > Neural Networks
Mathematics & Optimization > Optimization > Optimization
Deep Learning > Optimization & Theory > Neural Network Optimization
Deep Learning > Architectures > Recurrent Neural Networks

Keywords

stochastic gradient descent, hyperparameter learning, gradient descent, bilevel optimization, hidden state, recurrent neural network, gradient vanishing, stochastic bilevel optimization