On the distance between two neural networks and the stability of learning

Jeremy Bernstein; Arash Vahdat; Yisong Yue; Ming-Yu Liu

NeurIPS 2020

Abstract

This paper relates parameter distance to gradient breakdown for a broad class of nonlinear compositional functions. The analysis leads to a new distance function called deep relative trust and a descent lemma for neural networks. Since the resulting learning rule seems to require little to no learning rate tuning, it may unlock a simpler workflow for training deeper and more complex neural networks. The Python code used in this paper is here: https://github.com/jxbz/fromage.
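The abstract does not spell out the learning rule, so the following is only a minimal sketch of one way a per-layer relative update could look. It assumes each layer's step is rescaled to a fixed fraction `lr` of that layer's weight norm, followed by a shrink factor that keeps the weight norm from growing. The names `frobenius_norm` and `relative_update`, the flattened-list representation of each layer, and the `eps` safeguard are all illustrative assumptions; the linked repository is the reference for the authors' implementation.

```python
import math


def frobenius_norm(values):
    """Euclidean (Frobenius) norm of a layer stored as a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def relative_update(weights, grads, lr=0.01, eps=1e-12):
    """Hypothetical per-layer relative update (an assumption, not the paper's code).

    weights, grads: lists of layers, each layer a flat list of floats.
    The step for each layer has norm close to lr * ||w||, so it does not
    depend on the raw scale of the gradient.
    """
    shrink = 1.0 / math.sqrt(1.0 + lr ** 2)  # assumed norm-control factor
    updated = []
    for w, g in zip(weights, grads):
        # Rescale the gradient so the step is relative to the weight norm.
        scale = lr * frobenius_norm(w) / (frobenius_norm(g) + eps)
        updated.append([shrink * (wi - scale * gi) for wi, gi in zip(w, g)])
    return updated


# Usage: one layer with four weights; only the first weight has a gradient.
layer_weights = [[1.0, 1.0, 1.0, 1.0]]
layer_grads = [[5.0, 0.0, 0.0, 0.0]]
new_weights = relative_update(layer_weights, layer_grads, lr=0.1)
```

Under these assumptions, multiplying every gradient by a constant leaves the update essentially unchanged, which is one plausible reading of why such a rule might need little learning rate tuning.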

Topics

Machine Learning > Optimization & Theory > Learning Theory
Machine Learning > Optimization & Theory > Neural Network Optimization
Machine Learning > Optimization & Theory > Optimization

Keywords

learning rate tuning, parameter distance, gradient breakdown, deep relative trust
