Prior Knowledge and Preferential Structures in Gradient Descent Learning Algorithms

Robert E. Mahony; Robert C. Williamson

JMLR, 2001


Abstract

A family of gradient descent algorithms for learning linear functions in an online setting is considered. The family includes the classical LMS algorithm as well as new variants such as the Exponentiated Gradient (EG) algorithm due to Kivinen and Warmuth. The algorithms are based on prior distributions defined on the weight space. Techniques from differential geometry are used to develop the algorithms as gradient descent iterations with respect to the natural gradient in the Riemannian structure induced by the prior distribution. The proposed framework subsumes the notion of "link-functions".
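The abstract's central idea is that LMS and EG can be viewed as the same gradient iteration taken under different Riemannian metrics. The following minimal Python sketch illustrates that idea. It assumes NumPy and the squared loss 0.5(w·x - y)^2 on one example at a time. The helper names (lms_step, egu_step, natural_gradient_step, euclidean_metric, entropic_metric) are hypothetical. The metric diag(1/w_i) is an illustrative choice, not the prior-derived construction in the paper. "EGU" here denotes the unnormalized exponentiated gradient update, not the normalized EG variant.

```python
import numpy as np


def squared_loss_grad(w, x, y):
    # Gradient of 0.5 * (w.x - y)^2 with respect to w.
    return (w @ x - y) * x


def lms_step(w, x, y, eta):
    # Classical LMS: plain Euclidean gradient step.
    return w - eta * squared_loss_grad(w, x, y)


def egu_step(w, x, y, eta):
    # Unnormalized exponentiated gradient (EGU): multiplicative update that
    # keeps every weight strictly positive if it starts positive.
    return w * np.exp(-eta * squared_loss_grad(w, x, y))


def natural_gradient_step(w, x, y, eta, metric):
    # Natural gradient: precondition the Euclidean gradient by the inverse
    # of the Riemannian metric G(w) returned by `metric`.
    G = metric(w)
    return w - eta * np.linalg.solve(G, squared_loss_grad(w, x, y))


def euclidean_metric(w):
    # Identity metric: the natural gradient step reduces exactly to LMS.
    return np.eye(len(w))


def entropic_metric(w):
    # Illustrative (assumed) metric G(w) = diag(1 / w_i). Its inverse is
    # diag(w_i), so the step is w_i - eta * w_i * g_i, which matches the
    # EGU update to first order in eta. Requires w > 0.
    return np.diag(1.0 / w)


if __name__ == "__main__":
    # Noise-free synthetic stream with a positive target weight vector.
    rng = np.random.default_rng(0)
    w_true = np.array([0.5, 1.5, 0.2])
    w_lms = np.ones(3)
    w_egu = np.ones(3)
    eta = 0.05
    for _ in range(2000):
        x = rng.normal(size=3)
        y = w_true @ x
        w_lms = lms_step(w_lms, x, y, eta)
        w_egu = egu_step(w_egu, x, y, eta)
    print("LMS estimate:", np.round(w_lms, 3))
    print("EGU estimate:", np.round(w_egu, 3))
```

With the identity metric, the natural gradient step coincides with LMS exactly. With diag(1/w_i), it agrees with the multiplicative EGU update up to terms of order eta^2. This shows how a choice of geometry on the weight space, rather than a different loss, distinguishes the two algorithms.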

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization
Machine Learning > Optimization & Theory > Optimization
Machine Learning > Learning Types > Online Learning
Mathematics & Optimization > Optimization > Optimization

Keywords

online learning, natural gradient, gradient descent, prior distribution, Riemannian geometry, differential geometry

Related papers

On the Size of Convex Hulls of Small Sets (2001)

A New Approximate Maximal Margin Classification Algorithm (2001)

Lagrangian Support Vector Machines (2001)

Sparse Bayesian Learning and the Relevance Vector Machine (2001)

Kernel Partial Least Squares Regression in Reproducing Kernel Hilbert Space (2001)