On the Computational Efficiency of Training Neural Networks

Roi Livni; Shai Shalev-shwartz; Ohad Shamir

2014 NIPS NeurIPS 2014

On the Computational Efficiency of Training Neural Networks

Abstract

It is well-known that neural networks are computationally hard to train. On the other hand, in practice, modern day neural networks are trained efficiently using SGD and a variety of tricks that include different activation functions (e.g. ReLU), over-specification (i.e., train networks which are larger than needed), and regularization. In this paper we revisit the computational complexity of training neural networks from a modern perspective. We provide both positive and negative results, some of them yield new provably efficient and practical algorithms for training neural networks.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning

📈 Trend Setter — Theory

🧭 Keyword Pioneer — training algorithm

🐣 Hot Topic Early Bird — stochastic gradient descent

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Roi Livni , Shai Shalev-shwartz , Ohad Shamir

Topics

Machine Learning > Optimization & Theory > Learning Theory Machine Learning > Optimization & Theory > Neural Network Optimization Machine Learning > Optimization & Theory > Theory Deep Learning > Optimization & Theory > Theory

Keywords

stochastic gradient descent neural network training neural network optimization computational complexity activation function training algorithm

Download PDF

Related papers

Information-based learning by agents in unbounded state spaces 2014

Stochastic Gradient Descent, Weighted Sampling, and the Randomized Kaczmarz algorithm 2014

Partition-wise Linear Models 2014

Active Regression by Stratification 2014

Cone-Constrained Principal Component Analysis 2014