Time/Accuracy Tradeoffs for Learning a ReLU with respect to Gaussian Marginals

Surbhi Goel; Sushrut Karmalkar; Adam Klivans

NeurIPS 2019

Abstract

We consider the problem of computing the best-fitting ReLU with respect to square loss on a training set when the examples have been drawn according to a spherical Gaussian distribution (the labels can be arbitrary). Let $\mathrm{opt} < 1$ be the population loss of the best-fitting ReLU. We prove:

- Finding a ReLU with square loss $\mathrm{opt} + \epsilon$ is as hard as the problem of learning sparse parities with noise, widely thought to be computationally intractable. This is the first hardness result for learning a ReLU with respect to Gaussian marginals, and our results imply, unconditionally, that gradient descent cannot converge to the global minimum in polynomial time.
- There exists an efficient approximation algorithm for finding the best-fitting ReLU that achieves error $O(\mathrm{opt}^{2/3})$. The algorithm uses a novel reduction to noisy halfspace learning with respect to 0/1 loss.

Prior work due to Soltanolkotabi (2017) showed that gradient descent can find the best-fitting ReLU with respect to Gaussian marginals if the training set is exactly labeled by a ReLU.
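The hardness result reduces from learning sparse parities with noise. As a reminder of that source problem only (not of the paper's reduction to Gaussian inputs), the Python sketch below uses the standard Boolean formulation in the ±1 convention: x is uniform on {-1, +1}^n, the clean label is the product of x_i over a hidden size-k set S, and each label is flipped with probability eta. The learner is plain exhaustive search over all size-k subsets, so its cost grows like n^k; the hardness assumption is, roughly, that no algorithm runs in time n^{o(k)}. All function names and parameter values here are hypothetical and chosen for illustration.

```python
import itertools
import random


def sample_sparse_parity(n, secret, eta, m, rng):
    """Draw m noisy examples of the parity on the index set `secret`.

    Each x is uniform on {-1, +1}^n; the clean label is the product of
    x_i over i in `secret`, flipped independently with probability eta.
    """
    data = []
    for _ in range(m):
        x = [rng.choice((-1, 1)) for _ in range(n)]
        y = 1
        for i in secret:
            y *= x[i]
        if rng.random() < eta:  # label noise
            y = -y
        data.append((x, y))
    return data


def brute_force_parity(data, n, k):
    """Return the size-k index set whose parity best correlates with the labels.

    Enumerates all C(n, k) subsets, so the running time grows like n^k:
    a baseline, not an efficient algorithm.
    """
    best_set, best_corr = None, float("-inf")
    for subset in itertools.combinations(range(n), k):
        corr = 0
        for x, y in data:
            p = 1
            for i in subset:
                p *= x[i]
            corr += p * y
        if corr > best_corr:
            best_set, best_corr = subset, corr
    return best_set


if __name__ == "__main__":
    rng = random.Random(0)
    data = sample_sparse_parity(n=10, secret=(3, 7), eta=0.1, m=500, rng=rng)
    print(brute_force_parity(data, n=10, k=2))
```

With n = 10 and k = 2 the search covers only 45 subsets; the point is that the same search at, say, k = 20 on thousands of coordinates is hopeless.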
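For contrast with the hardness result, the realizable regime studied by Soltanolkotabi (inputs from a spherical Gaussian, labels exactly y = relu(w* · x)) can be illustrated with a short Python sketch of full-batch gradient descent on the empirical square loss (1/2n) Σ_i (relu(w · x_i) - y_i)^2. This is a generic sketch, not that paper's exact procedure. The moment-based initialization w0 = (1/n) Σ_i y_i x_i is an assumption of this sketch, motivated by the identity E[relu(w* · x) x] = w*/2 for x ~ N(0, I); the step size, iteration count, and w* are hypothetical choices.

```python
import math
import random


def relu(z):
    return z if z > 0.0 else 0.0


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def make_realizable_data(w_star, n, rng):
    """Draw x ~ N(0, I_d) and label it exactly by the ReLU with weights w_star."""
    xs = [[rng.gauss(0.0, 1.0) for _ in w_star] for _ in range(n)]
    ys = [relu(dot(w_star, x)) for x in xs]
    return xs, ys


def square_loss(w, xs, ys):
    """Empirical loss (1/2n) * sum_i (relu(w . x_i) - y_i)^2."""
    return sum((relu(dot(w, x)) - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))


def gradient_descent_relu(xs, ys, lr=0.5, steps=200):
    """Full-batch gradient descent on square_loss for a single ReLU.

    Sketch assumption: w0 = (1/n) * sum_i y_i x_i, which is close to w*/2
    when x ~ N(0, I) and y = relu(w* . x).
    """
    n, d = len(xs), len(xs[0])
    w = [sum(y * x[j] for x, y in zip(xs, ys)) / n for j in range(d)]
    for _ in range(steps):
        grad = [0.0] * d
        for x, y in zip(xs, ys):
            z = dot(w, x)
            if z > 0.0:  # ReLU derivative is taken as 0 for z <= 0
                r = z - y
                for j in range(d):
                    grad[j] += r * x[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


if __name__ == "__main__":
    rng = random.Random(1)
    w_star = [1.0, -2.0, 0.5, 0.0, 1.5]
    xs, ys = make_realizable_data(w_star, 800, rng)
    w = gradient_descent_relu(xs, ys)
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(w, w_star)))
    print(f"loss={square_loss(w, xs, ys):.2e}  ||w - w*||={dist:.2e}")
```

Because the labels are exactly realizable, the empirical loss is zero at w*, and this run should drive both printed quantities close to zero. The sketch says nothing about the agnostic setting of the abstract, where labels are arbitrary and the hardness result applies.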

Topics

Machine Learning > Core Methods > Regression
Machine Learning > Optimization & Theory > Learning Theory
Machine Learning > Optimization & Theory > Theory
Machine Learning > Learning Types > Representation Learning
Deep Learning > Learning Types > Deep Learning

Keywords

learning theory, neural network optimization, gradient descent, square loss, approximation algorithm, gaussian distribution, relu activation, rectified linear unit, computational hardness, sparse parity, relu learning, time accuracy tradeoff

Related papers

Two Generator Game: Learning to Sample via Linear Goodness-of-Fit Test 2019

Metalearned Neural Memory 2019

Model Similarity Mitigates Test Set Overuse 2019

Continual Unsupervised Representation Learning 2019

Reinforcement Learning with Convex Constraints 2019