Convex Two-Layer Modeling

Özlem Aslan; Hao Cheng; Xinhua Zhang; Dale Schuurmans

NIPS (NeurIPS) 2013

Abstract

Latent variable prediction models, such as multi-layer networks, impose auxiliary latent variables between inputs and outputs to allow automatic inference of implicit features useful for prediction. Unfortunately, such models are difficult to train because inference over latent variables must be performed concurrently with parameter optimization, which creates a highly non-convex problem. Instead of proposing another local training method, we develop a convex relaxation of hidden-layer conditional models that admits global training. Our approach extends current convex modeling approaches to handle two nested nonlinearities separated by a non-trivial adaptive latent layer. The resulting methods are able to acquire two-layer models that cannot be represented by any single-layer model over the same features, while improving training quality over local heuristics.
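The non-convexity mentioned in the abstract can be seen in a toy case. The sketch below is not the paper's method or its relaxation; it is a minimal illustration under assumed choices (a scalar model `v * tanh(u * x)`, squared loss, and hypothetical names `predict`, `loss`, `theta_a`, `theta_b`). Two parameter settings fit the data equally well, yet their midpoint fits it worse, which a convex objective would not allow.

```python
import math

def predict(u, v, x):
    """Scalar two-layer model (assumed for illustration): output v * tanh(u * x)."""
    return v * math.tanh(u * x)

def loss(u, v, data):
    """Mean squared error over (x, y) pairs."""
    return sum((predict(u, v, x) - y) ** 2 for x, y in data) / len(data)

# Toy targets generated by the model itself with (u, v) = (1, 1).
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
data = [(x, math.tanh(x)) for x in xs]

theta_a = (1.0, 1.0)
theta_b = (-1.0, -1.0)  # sign-flipped copy; tanh is odd, so predictions match theta_a
theta_mid = tuple((a + b) / 2 for a, b in zip(theta_a, theta_b))  # (0.0, 0.0)

loss_a = loss(*theta_a, data)
loss_b = loss(*theta_b, data)
loss_mid = loss(*theta_mid, data)

# A convex objective would satisfy loss_mid <= (loss_a + loss_b) / 2.
violates_convexity = loss_mid > (loss_a + loss_b) / 2
print(loss_a, loss_b, loss_mid, violates_convexity)
```

Both endpoints reach essentially zero loss while the midpoint predicts zero everywhere, so the convexity inequality fails. Sign and permutation symmetries of hidden units produce the same effect in larger networks.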

Topics

Machine Learning > Optimization & Theory > Optimization

Deep Learning > Architectures > Neural Networks

Machine Learning > Core Methods > Optimization

Deep Learning > Optimization & Theory > Optimization

Deep Learning > Learning Types > Representation Learning

Keywords

convex optimization, hidden layer, global training, two-layer model, hidden-layer, adaptive latent layer, latent variable model, latent variable, two-layer network, conditional model
