2014 AISTATS AISTATS 2014

Distributed optimization of deeply nested systems

Abstract

Intelligent processing of complex signals such as images is often performed by a hierarchy of nonlinear processing layers, such as a deep net or an object recognition cascade. Joint estimation of the parameters of all the layers is a difficult nonconvex optimization. We describe a general strategy to learn the parameters and, to some extent, the architecture of nested systems, which we call the method of auxiliary coordinates (MAC). This replaces the original problem involving a deeply nested function with a constrained problem involving a different function in an augmented space without nesting. The constrained problem may be solved with penalty-based methods using alternating optimization over the parameters and the auxiliary coordinates. MAC has provable convergence, is easy to implement reusing existing algorithms for single layers, can be parallelized trivially and massively, applies even when parameter derivatives are not available or not desirable, can perform some model selection on the fly, and is competitive with state-of-the-art nonlinear optimizers even in the serial computation setting, often providing reasonable models within a few iterations.

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization
🐣 Hot Topic Early Bird — distributed optimization
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio