Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks

Ziming Zhang; Matthew Brand

2017 NIPS NeurIPS 2017

Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks

Abstract

By lifting the ReLU function into a higher dimensional space, we develop a smooth multi-convex formulation for training feed-forward deep neural networks (DNNs). This allows us to develop a block coordinate descent (BCD) training algorithm consisting of a sequence of numerically well-behaved convex optimizations. Using ideas from proximal point methods in convex analysis, we prove that this BCD algorithm will converge globally to a stationary point with R-linear convergence rate of order one. In experiments with the MNIST database, DNNs trained with this BCD algorithm consistently yielded better test-set error rates than identical DNN architectures trained via all the stochastic gradient descent (SGD) variants in the Caffe toolbox.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning

🐣 Hot Topic Early Bird — neural network training

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Ziming Zhang , Matthew Brand

Topics

Machine Learning > Optimization & Theory > Optimization Deep Learning > Architectures > Neural Networks Deep Learning > Techniques > Model Architecture Deep Learning > Optimization & Theory > Neural Network Optimization Deep Learning > Optimization & Theory > Optimization

Keywords

neural network training convex optimization block coordinate descent deep neural network relu activation proximal point method tikhonov regularization

Download PDF

Related papers

High-Order Attention Models for Visual Question Answering 2017

Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization 2017

Premise Selection for Theorem Proving by Deep Graph Embedding 2017

Neural Program Meta-Induction 2017

Safe and Nested Subgame Solving for Imperfect-Information Games 2017