Scalable Bayesian Learning of Recurrent Neural Networks for Language Modeling

Zhe Gan; Chunyuan Li; Changyou Chen; Yunchen Pu; Qinliang Su; Lawrence Carin

ACL 2017

Abstract

Recurrent neural networks (RNNs) have shown promising performance for language modeling. However, traditional training of RNNs using back-propagation through time often suffers from overfitting. One reason for this is that stochastic optimization (used for large training sets) does not provide good estimates of model uncertainty. This paper leverages recent advances in stochastic gradient Markov Chain Monte Carlo (also appropriate for large training sets) to learn weight uncertainty in RNNs. It yields a principled Bayesian learning algorithm, adding gradient noise during training (enhancing exploration of the model-parameter space) and model averaging when testing. Extensive experiments on various RNN models and across a broad range of applications demonstrate the superiority of the proposed approach relative to stochastic optimization.
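
The two ingredients named in the abstract, gradient noise during training and model averaging at test time, can be illustrated with stochastic gradient Langevin dynamics (SGLD), one member of the stochastic gradient MCMC family. The sketch below is not the authors' implementation: it swaps the RNN for a toy Bayesian linear regression so that it stays short and runnable, and every name and default in it (`sgld_samples`, `predict_mean`, `prior_var`, `noise_var`, the step size) is a hypothetical choice made for illustration.

```python
import numpy as np


def sgld_samples(X, y, n_steps=2000, batch_size=32, step_size=1e-4,
                 prior_var=10.0, noise_var=0.25, burn_in=500, thin=10, seed=0):
    """Draw approximate posterior samples of linear-regression weights with SGLD.

    Assumed toy model (not from the paper): y ~ N(X @ theta, noise_var),
    with prior theta ~ N(0, prior_var * I).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    samples = []
    for t in range(n_steps):
        # Minibatch, as in ordinary stochastic optimization.
        idx = rng.choice(n, size=batch_size, replace=False)
        Xb, yb = X[idx], y[idx]
        # Stochastic gradient of the log posterior: prior term plus the
        # minibatch likelihood term rescaled to the full data set size.
        grad_log_prior = -theta / prior_var
        grad_log_lik = (n / batch_size) * (Xb.T @ (yb - Xb @ theta)) / noise_var
        grad = grad_log_prior + grad_log_lik
        # SGLD update: half a gradient step plus Gaussian noise whose
        # variance equals the step size. The noise is what turns the
        # optimizer into a sampler.
        noise = rng.normal(0.0, np.sqrt(step_size), size=d)
        theta = theta + 0.5 * step_size * grad + noise
        # Keep thinned samples after burn-in.
        if t >= burn_in and (t - burn_in) % thin == 0:
            samples.append(theta.copy())
    return np.array(samples)


def predict_mean(X_new, samples):
    """Model averaging: average the predictions of all sampled weight vectors."""
    return (X_new @ samples.T).mean(axis=1)
```

With stochastic optimization only the final weight vector would be kept. Here the retained samples stand in for the posterior over weights, and `predict_mean` averages their predictions, which corresponds to the test-time model averaging the abstract describes.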

Topics

Machine Learning > Optimization & Theory > Bayesian Inference
Machine Learning > Optimization & Theory > Stochastic Processes
Natural Language Processing > Generation > Language Modeling
Machine Learning > Bayesian & Probabilistic > Bayesian Learning
Deep Learning > Architectures > Recurrent Neural Networks
Deep Learning > Learning Types > Language Modeling

Keywords

stochastic gradient descent, Bayesian learning, language modeling, Markov chain Monte Carlo, model averaging, recurrent neural network, weight uncertainty, stochastic gradient Markov chain Monte Carlo
