Learning Polynomials with Neural Networks

Alexandr Andoni; Rina Panigrahy; Gregory Valiant; Li Zhang

ICML 2014

Abstract

We study the effectiveness of learning low-degree polynomials using neural networks trained by gradient descent. While neural networks have been shown to have great expressive power, and gradient descent is widely used in practice to train them, few theoretical guarantees are known for such methods. In particular, it is well known that gradient descent can get stuck at local minima, even for simple classes of target functions. In this paper, we present several positive theoretical results to support the effectiveness of neural networks. We focus on two-layer neural networks (i.e., one hidden layer) where the top-layer node is a linear function, similar to Barron (1993). First, we show that for a randomly initialized neural network with sufficiently many hidden units, gradient descent can learn any low-degree polynomial. Second, we show that if we use complex-valued weights (the target function can still be real), then under suitable conditions there are no "robust local minima": the neural network can always escape a local minimum by performing a random perturbation. This property does not hold for real-valued weights. Third, we discuss whether sparse polynomials can be learned with small neural networks, where the size depends on the sparsity of the target function.
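As a rough illustration of the first result, the sketch below trains a randomly initialized two-layer network with a linear output node by plain gradient descent on a low-degree polynomial. The tanh activation, the uniform input distribution, the target polynomial, the 1/sqrt(hidden) output scaling, and all hyperparameters are assumptions chosen for this example; they are not taken from the paper, whose analysis uses its own setting.

```python
import numpy as np

def target(X):
    # Hypothetical degree-2 target polynomial in two variables (an assumption).
    return X[:, 0] * X[:, 1] + 0.5 * X[:, 0] ** 2

def train(hidden=100, steps=500, lr=0.2, n=256, seed=0):
    """Run gradient descent on a two-layer network; return the loss at each step."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n, 2))   # assumed input distribution
    y = target(X)

    W = rng.normal(size=(2, hidden))   # random hidden-layer weights
    b = rng.normal(size=hidden)        # random hidden-layer biases
    a = np.zeros(hidden)               # linear top layer, starts at zero
    scale = 1.0 / np.sqrt(hidden)      # assumed output scaling, keeps steps stable

    losses = []
    for _ in range(steps):
        H = np.tanh(X @ W + b)             # hidden activations, shape (n, hidden)
        err = scale * (H @ a) - y          # residuals, shape (n,)
        losses.append(float(np.mean(err ** 2)))

        # Gradients of 0.5 * mean squared error with respect to a, W, and b.
        grad_a = scale * (H.T @ err) / n
        dZ = scale * np.outer(err, a) * (1.0 - H ** 2)
        grad_W = X.T @ dZ / n
        grad_b = dZ.sum(axis=0) / n

        a -= lr * grad_a
        W -= lr * grad_W
        b -= lr * grad_b
    return losses
```

Calling `losses = train()` and comparing `losses[0]` with `losses[-1]` gives a quick check that the training error falls. Raising `hidden` mirrors the abstract's "sufficiently many hidden units" only informally, since this toy setup carries none of the paper's guarantees.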

Topics

Machine Learning > Learning Types > Self-Supervised Learning
Machine Learning > Optimization & Theory > Neural Network Optimization

Keywords

gradient descent, two-layer network, theoretical guarantee, neural network, polynomial learning
