An Optimal Control Approach to Deep Learning and Applications to Discrete-Weight Neural Networks

Qianxiao Li; Shuji Hao

ICML 2018

Abstract

Deep learning is formulated as a discrete-time optimal control problem. This allows one to characterize necessary conditions for optimality and to develop training algorithms that do not rely on gradients with respect to the trainable parameters. In particular, we introduce the discrete-time method of successive approximations (MSA), which is based on Pontryagin’s maximum principle, for training neural networks. A rigorous error estimate for the discrete MSA is obtained, which sheds light on its dynamics and on the means to stabilize the algorithm. The developed methods are applied to train, in a rather principled way, neural networks whose weights are constrained to take values in a discrete set. We obtain competitive performance and, interestingly, very sparse weights in the case of ternary networks, which may be useful for deploying models on low-memory devices.
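To make the MSA recipe concrete, here is a minimal sketch of one basic MSA iteration under simplifying assumptions of our own: every layer is linear, x_{t+1} = W_t x_t, with no activation; weights are binary, W_t in {-1, +1}; and the terminal loss is Phi(x_T) = 0.5 ||x_T - y||^2 with no weight regularizer. With Hamiltonian H_t(x, p, W) = p · (W x), the iteration runs a forward pass for the states, a backward pass for the costates (p_T = -∇Phi(x_T) = y - x_T and p_t = ∇_x H_t = W_t^T p_{t+1}), and then maximizes the sample-summed Hamiltonian over the discrete set. Because H_t is linear in W_t, that maximization reduces entrywise to W_t = sign(M_t) with M_t = Σ_i p_{t+1}^i (x_t^i)^T; on ties we keep the current weight (also our choice). All function names are hypothetical, and this is the unstabilized basic iteration, not the paper's error-controlled variant.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]
Sample = Tuple[Vector, Vector]  # (input x_0, target y)


def matvec(w: Matrix, x: Vector) -> Vector:
    """Return W x for a list-of-lists matrix W."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def matvec_transpose(w: Matrix, p: Vector) -> Vector:
    """Return W^T p, which is the costate update for a linear layer."""
    return [sum(w[i][j] * p[i] for i in range(len(w))) for j in range(len(w[0]))]


def forward(weights: List[Matrix], x0: Vector) -> List[Vector]:
    """State equation x_{t+1} = W_t x_t; returns [x_0, ..., x_T]."""
    xs = [x0]
    for w in weights:
        xs.append(matvec(w, xs[-1]))
    return xs


def loss(weights: List[Matrix], samples: List[Sample]) -> float:
    """Mean terminal loss Phi(x_T) = 0.5 * ||x_T - y||^2 over the samples."""
    total = 0.0
    for x0, y in samples:
        x_T = forward(weights, x0)[-1]
        total += 0.5 * sum((a - b) ** 2 for a, b in zip(x_T, y))
    return total / len(samples)


def msa_step(weights: List[Matrix], samples: List[Sample]) -> List[Matrix]:
    """One basic MSA iteration for binary weights W_t in {-1, +1} (toy sketch)."""
    # M[t] accumulates sum_i p_{t+1}^i (x_t^i)^T for every layer t.
    M = [[[0.0] * len(w[0]) for _ in w] for w in weights]
    for x0, y in samples:
        xs = forward(weights, x0)                    # forward pass: states
        p = [yi - xi for xi, yi in zip(xs[-1], y)]   # p_T = -grad Phi(x_T)
        for t in reversed(range(len(weights))):
            # Here p holds p_{t+1}; add its outer product with x_t to M[t].
            for j, pj in enumerate(p):
                for k, xk in enumerate(xs[t]):
                    M[t][j][k] += pj * xk
            p = matvec_transpose(weights[t], p)      # p_t = W_t^T p_{t+1}
    # H_t = sum_jk W_jk M_jk is linear in W, so its maximiser over {-1, +1}
    # is sign(M) entrywise; on a tie (M_jk == 0) keep the current weight.
    return [
        [[1.0 if m > 0 else -1.0 if m < 0 else w_jk for w_jk, m in zip(w_row, m_row)]
         for w_row, m_row in zip(w, m_t)]
        for w, m_t in zip(weights, M)
    ]


if __name__ == "__main__":
    # Toy data: inputs are basis vectors, targets come from W_true = [[1, -1], [1, 1]].
    samples = [([1.0, 0.0], [1.0, 1.0]), ([0.0, 1.0], [-1.0, 1.0])]
    weights = [[[1.0, 1.0], [1.0, 1.0]]]             # one layer, all +1
    print(loss(weights, samples))                    # prints 1.0
    weights = msa_step(weights, samples)
    print(loss(weights, samples))                    # prints 0.0
```

On this toy the inputs are the standard basis vectors, so M_t reduces to the mismatch between the target and current weights and a single step recovers the target exactly. With deeper stacks or correlated inputs the basic iteration carries no such guarantee, which is where the error estimate and stabilization described in the abstract come in.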

Authors

Qianxiao Li, Shuji Hao

Topics

Machine Learning > Optimization & Theory > Optimization
Deep Learning > Techniques > Model Architecture

Keywords

model compression, neural network training, optimal control, Pontryagin maximum principle, discrete-weight neural network

Related papers

Rectify Heterogeneous Models with Semantic Mapping 2018

Bayesian Optimization of Combinatorial Structures 2018

The Well-Tempered Lasso 2018

Approximation Algorithms for Cascading Prediction Models 2018

Classification from Pairwise Similarity and Unlabeled Data 2018