A faster and simpler algorithm for learning shallow networks

Sitan Chen; Shyam Narayanan

2024 COLT COLT 2024

A faster and simpler algorithm for learning shallow networks

Abstract

We revisit the well-studied problem of learning a linear combination of $k$ ReLU activations given labeled examples drawn from the standard $d$-dimensional Gaussian measure. Chen et al. recently gave the first algorithm for this problem to run in $\mathrm{poly}(d,1/\epsilon)$ time when $k = O(1)$, where $\epsilon$ is the target error. More precisely, their algorithm runs in time $(d/\epsilon)^{\mathrm{quasipoly}(k)}$ and learns over multiple stages. Here we show that a much simpler one-stage version of their algorithm suffices, and moreover its runtime is only $(d k/\epsilon)^{O(k^2)}$.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning and Mathematics & Optimization

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Sitan Chen , Shyam Narayanan

Topics

Machine Learning > Optimization & Theory > Learning Theory Deep Learning > Architectures > Neural Networks Mathematics & Optimization > Optimization > Continuous Optimization

Keywords

polynomial time gaussian distribution relu activation runtime analysis shallow network

Download PDF

Related papers

Exact Mean Square Linear Stability Analysis for SGD 2024

Optimistic Information Directed Sampling 2024

Robust Distribution Learning with Local and Global Adversarial Corruptions (extended abstract) 2024

Depth Separation in Norm-Bounded Infinite-Width Neural Networks 2024

The Sample Complexity of Simple Binary Hypothesis Testing 2024