Distributed optimization of deeply nested systems

Miguel Carreira-Perpinan; Weiran Wang

2014 AISTATS AISTATS 2014

Distributed optimization of deeply nested systems

Abstract

Intelligent processing of complex signals such as images is often performed by a hierarchy of nonlinear processing layers, such as a deep net or an object recognition cascade. Joint estimation of the parameters of all the layers is a difficult nonconvex optimization. We describe a general strategy to learn the parameters and, to some extent, the architecture of nested systems, which we call the method of auxiliary coordinates (MAC). This replaces the original problem involving a deeply nested function with a constrained problem involving a different function in an augmented space without nesting. The constrained problem may be solved with penalty-based methods using alternating optimization over the parameters and the auxiliary coordinates. MAC has provable convergence, is easy to implement reusing existing algorithms for single layers, can be parallelized trivially and massively, applies even when parameter derivatives are not available or not desirable, can perform some model selection on the fly, and is competitive with state-of-the-art nonlinear optimizers even in the serial computation setting, often providing reasonable models within a few iterations.

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization

🐣 Hot Topic Early Bird — distributed optimization

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Miguel Carreira-Perpinan , Weiran Wang

Topics

Machine Learning > Optimization & Theory > Optimization Mathematics & Optimization > Optimization > Continuous Optimization

Keywords

deep learning distributed optimization alternating optimization penalty method

Download PDF

Related papers

Improved Bounds for Online Learning Over the Permutahedron and Other Ranking Polytopes 2014

PAC-Bayesian Theory for Transductive Learning 2014

Sparse Bayesian Variable Selection for the Identification of Antigenic Variability in the Foot-and-Mouth Disease Virus 2014

Analytic Long-Term Forecasting with Periodic Gaussian Processes 2014

Exploiting the Limits of Structure Learning via Inherent Symmetry 2014