Large Scale Empirical Risk Minimization via Truncated Adaptive Newton Method

Mark Eisen; Aryan Mokhtari; Alejandro Ribeiro

2018 AISTATS AISTATS 2018

Large Scale Empirical Risk Minimization via Truncated Adaptive Newton Method

Abstract

Most second order methods are inapplicable to large scale empirical risk minimization (ERM) problems because both, the number of samples N and number of parameters p are large. Large N makes it costly to evaluate Hessians and large p makes it costly to invert Hessians. This paper propose a novel adaptive sample size second-order method, which reduces the cost of computing the Hessian by solving a sequence of ERM problems corresponding to a subset of samples and lowers the cost of computing the Hessian inverse using a truncated eigenvalue decomposition. Although the sample size is grown at a geometric rate, it is shown that it is sufficient to run a single iteration in each growth stage to track the optimal classifier to within its statistical accuracy. This results in convergence to the optimal classifier associated with the whole set in a number of iterations that scales with $\log(N)$. The use of a truncated eigenvalue decomposition result in the cost of each iteration being of order $p^2$. Theoretical performance gains manifest in practical implementations.

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization

🧭 Keyword Pioneer — truncated eigenvalue decomposition

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Security & Privacy, Speech & Audio

Authors

Mark Eisen , Aryan Mokhtari , Alejandro Ribeiro

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization Machine Learning > Optimization & Theory > Optimization Mathematics & Optimization > Optimization > Continuous Optimization

Keywords

large-scale optimization empirical risk minimization hessian approximation second-order method adaptive sample size truncated eigenvalue decomposition

Download PDF

Related papers

The Geometry of Random Features 2018

A Fast Algorithm for Separated Sparsity via Perturbed Lagrangians 2018

Regional Multi-Armed Bandits 2018

Group Invariance Principles for Causal Generative Models 2018

Stochastic Three-Composite Convex Minimization with a Linear Operator 2018