Lifted Proximal Operator Machines

Jia Li; Cong Fang; Zhouchen Lin

2019 AAAI AAAI 2019

Lifted Proximal Operator Machines

Abstract

Abstract We propose a new optimization method for training feedforward neural networks. By rewriting the activation function as an equivalent proximal operator, we approximate a feedforward neural network by adding the proximal operators to the objective function as penalties, hence we call the lifted proximal operator machine (LPOM). LPOM is block multiconvex in all layer-wise weights and activations. This allows us to use block coordinate descent to update the layer-wise weights and activations. Most notably, we only use the mapping of the activation function itself, rather than its derivative, thus avoiding the gradient vanishing or blow-up issues in gradient based training methods. So our method is applicable to various non-decreasing Lipschitz continuous activation functions, which can be saturating and non-differentiable. LPOM does not require more auxiliary variables than the layer-wise activations, thus using roughly the same amount of memory as stochastic gradient descent (SGD) does. Its parameter tuning is also much simpler. We further prove the convergence of updating the layer-wise weights and activations and point out that the optimization could be made parallel by asynchronous update. Experiments on MNIST and CIFAR-10 datasets testify to the advantages of LPOM.

🚀 Conference Pioneer — AAAI 2019

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning and Mathematics & Optimization

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Jia Li , Cong Fang , Zhouchen Lin

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization Deep Learning > Architectures > Neural Networks Mathematics & Optimization > Optimization > Continuous Optimization Machine Learning > Core Methods > Optimization Deep Learning > Optimization & Theory > Optimization

Keywords

non-convex optimization block coordinate descent proximal operator activation function gradient vanishing feedforward network feedforward neural network neural network

Download PDF

Related papers

Cooperative Multimodal Approach to Depression Detection in Twitter 2019

Learning to Align Question and Answer Utterances in Customer Service Conversation with Recurrent Pointer Networks 2019

Community Detection in Social Networks Considering Topic Correlations 2019

Session-Based Recommendation with Graph Neural Networks 2019

Blameworthiness in Multi-Agent Settings 2019