Do Deep Nets Really Need to be Deep?

Jimmy Ba; Rich Caruana

NeurIPS (NIPS) 2014

Abstract

Currently, deep neural networks are the state of the art on problems such as speech recognition and computer vision. In this paper we empirically demonstrate that shallow feed-forward nets can learn the complex functions previously learned by deep nets and achieve accuracies previously only achievable with deep models. Moreover, in some cases the shallow nets can learn these deep functions using the same number of parameters as the original deep models. On the TIMIT phoneme recognition and CIFAR-10 image recognition tasks, shallow nets can be trained that perform similarly to complex, well-engineered, deeper convolutional models.

Topics

Machine Learning > Core Methods > Representation Learning
Machine Learning > Optimization & Theory > Learning Theory
Deep Learning > Architectures > Neural Networks
Machine Learning > Application Areas > Model Compression
Deep Learning > Techniques > Knowledge Distillation
Deep Learning > Learning Types > Knowledge Distillation

Keywords

model compression, representation learning, knowledge distillation, model architecture, deep neural network, neural network, shallow network
