Practical Variational Inference for Neural Networks

Alex Graves

NeurIPS (then NIPS) 2011

Abstract

Variational methods have been previously explored as a tractable approximation to Bayesian inference for neural networks. However, the approaches proposed so far have only been applicable to a few simple network architectures. This paper introduces an easy-to-implement stochastic variational method (or equivalently, minimum description length loss function) that can be applied to most neural networks. Along the way it revisits several common regularisers from a variational perspective. It also provides a simple pruning heuristic that can both drastically reduce the number of network weights and lead to improved generalisation. Experimental results are provided for a hierarchical multidimensional recurrent neural network applied to the TIMIT speech corpus.
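The abstract describes the loss function only at a high level. The code below is a minimal sketch under several assumptions: a diagonal Gaussian posterior q over the weights and a fixed Gaussian prior p (the paper's own choice of prior and gradient estimator may differ). Under those assumptions the variational / minimum description length loss is the expected error cost under q, estimated here by sampling weights, plus the complexity cost KL(q || p). A hypothetical linear model with Gaussian output noise stands in for the network, and all function names are illustrative. The pruning rule keeps a weight only when its signal-to-noise ratio |μ|/σ reaches a threshold. It is an assumed heuristic in the spirit of the abstract, not necessarily the paper's exact criterion.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical stand-in for a network: a linear model with Gaussian output noise.
Example = Tuple[Sequence[float], float]


def kl_diag_gaussian(mu: Sequence[float], sigma: Sequence[float],
                     prior_mu: float, prior_sigma: float) -> float:
    """Complexity cost KL(q || p) in nats, with q = prod_i N(mu_i, sigma_i^2)
    and p = prod_i N(prior_mu, prior_sigma^2). Requires every sigma_i > 0."""
    total = 0.0
    for m, s in zip(mu, sigma):
        total += (math.log(prior_sigma / s)
                  + (s * s + (m - prior_mu) ** 2) / (2.0 * prior_sigma ** 2)
                  - 0.5)
    return total


def sample_weights(mu: Sequence[float], sigma: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Draw one weight vector w ~ q via w_i = mu_i + sigma_i * eps_i, eps_i ~ N(0, 1)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def nll_linear(w: Sequence[float], data: Sequence[Example],
               noise_sigma: float = 1.0) -> float:
    """Negative log-likelihood (nats) of the data under the linear-Gaussian model."""
    log_norm = math.log(noise_sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for x, y in data:
        pred = sum(wi * xi for wi, xi in zip(w, x))
        total += 0.5 * ((y - pred) / noise_sigma) ** 2 + log_norm
    return total


def variational_loss(mu: Sequence[float], sigma: Sequence[float],
                     data: Sequence[Example], prior_mu: float = 0.0,
                     prior_sigma: float = 1.0, n_samples: int = 10,
                     seed: int = 0) -> float:
    """Description length = Monte Carlo estimate of the expected error cost
    under q, plus the closed-form complexity cost KL(q || p)."""
    rng = random.Random(seed)
    error_cost = sum(nll_linear(sample_weights(mu, sigma, rng), data)
                     for _ in range(n_samples)) / n_samples
    return error_cost + kl_diag_gaussian(mu, sigma, prior_mu, prior_sigma)


def prune_mask(mu: Sequence[float], sigma: Sequence[float],
               threshold: float) -> List[bool]:
    """Keep weight i only if |mu_i| / sigma_i >= threshold (assumed heuristic)."""
    return [abs(m) / s >= threshold for m, s in zip(mu, sigma)]


if __name__ == "__main__":
    data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0)]
    print(variational_loss([2.0, -1.0], [0.1, 0.1], data))
    print(prune_mask([0.1, 2.0], [1.0, 0.5], threshold=1.0))  # [False, True]
```

In a real network both terms would be minimised with respect to every μ_i and σ_i by gradient descent. In this sketch only the complexity term has a closed form. The error term is a noisy estimate because it is computed from sampled weights.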

Topics

Artificial Intelligence > Bayesian & Probabilistic > Bayesian Learning
Machine Learning > Application Areas > Model Merging
Deep Learning > Models > Variational Inference
Machine Learning > Bayesian & Probabilistic > Bayesian Inference
Machine Learning > Bayesian & Probabilistic > Variational Inference
Deep Learning > Techniques > Representation Learning

Keywords

variational inference, Bayesian inference, speech recognition, network pruning, model pruning, minimum description length, stochastic variational method, neural network
