Fast Probabilistic Optimization from Noisy Gradients

Philipp Hennig

ICML 2013

Abstract

Stochastic gradient descent remains popular in large-scale machine learning, on account of its very low computational cost and robustness to noise. However, gradient descent is only linearly efficient and not transformation invariant. Scaling by a local measure can substantially improve its performance. One natural choice of such a scale is the Hessian of the objective function: Were it available, it would turn linearly efficient gradient descent into the quadratically efficient Newton-Raphson optimization. Existing covariant methods, though, are either super-linearly expensive or do not address noise. Generalising recent results, this paper constructs a nonparametric Bayesian quasi-Newton algorithm that learns gradient and Hessian from noisy evaluations of the gradient. Importantly, the resulting algorithm, like stochastic gradient descent, has cost linear in the number of input dimensions.
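The abstract's contrast between gradient descent and Newton-Raphson can be made concrete on a toy problem. The following is a minimal sketch, not the paper's Bayesian quasi-Newton algorithm: it assumes a noise-free quadratic objective with a known diagonal Hessian, and every name in it (`HESSIAN_DIAG`, `gradient_descent`, `newton_step`) is hypothetical and chosen for illustration only.

```python
# Illustrative only: plain gradient descent vs. a single Newton-Raphson step
# on the toy objective f(x) = 0.5 * sum(h_i * x_i**2), whose minimum is x = 0.
# Assumptions: no gradient noise, and the (diagonal) Hessian is known exactly.


def grad(x, diag_hessian):
    """Gradient of the toy quadratic: component i is h_i * x_i."""
    return [h * xi for h, xi in zip(diag_hessian, x)]


def gradient_descent(x0, diag_hessian, step, n_steps):
    """Run n_steps of unscaled gradient descent with a fixed step size."""
    x = list(x0)
    for _ in range(n_steps):
        g = grad(x, diag_hessian)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def newton_step(x0, diag_hessian):
    """One Newton-Raphson step: scale the gradient by the inverse Hessian."""
    g = grad(x0, diag_hessian)
    return [xi - gi / h for xi, gi, h in zip(x0, g, diag_hessian)]


def distance_to_minimum(x):
    """Euclidean distance from x to the minimizer at the origin."""
    return sum(xi * xi for xi in x) ** 0.5


# Badly scaled problem: curvature differs by a factor of 100 between axes.
HESSIAN_DIAG = [1.0, 100.0]
X0 = [1.0, 1.0]

# The step size is limited by the largest curvature, so the flat direction
# converges slowly even after 100 iterations.
x_gd = gradient_descent(X0, HESSIAN_DIAG, step=1.0 / 100.0, n_steps=100)

# Scaling by the Hessian removes the dependence on the coordinate scaling.
x_newton = newton_step(X0, HESSIAN_DIAG)

print("gradient descent, 100 steps:", distance_to_minimum(x_gd))
print("Newton-Raphson, 1 step:     ", distance_to_minimum(x_newton))
```

On this quadratic the Newton step lands on the minimum in one iteration, while gradient descent is held back by the poorly scaled coordinate. The setting the paper addresses is harder: the Hessian is not available and the gradients are noisy, so both have to be inferred from the noisy gradient evaluations.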

Topics

Machine Learning > Optimization & Theory > Bayesian Inference
Machine Learning > Optimization & Theory > Optimization
Mathematics & Optimization > Optimization > Optimization
Machine Learning > Bayesian & Probabilistic > Bayesian Inference

Keywords

stochastic gradient descent, Hessian approximation, Bayesian optimization, quasi-Newton method, noisy gradient
