Efficient Language Modeling with Automatic Relevance Determination in Recurrent Neural Networks

Maxim Kodryan; Artem Grachev; Dmitry Ignatov; Dmitry Vetrov

ACL 2019


Abstract

Reducing the number of parameters is one of the most important goals in Deep Learning. In this article we propose an adaptation of Doubly Stochastic Variational Inference for Automatic Relevance Determination (DSVI-ARD) to neural network compression. We find this method especially useful in language modeling tasks, where the large number of parameters in the input and output layers is often excessive. We also show that DSVI-ARD can be applied together with encoder-decoder weight tying, which yields even better sparsity and performance. Our experiments demonstrate that more than 90% of the weights in both the encoder and decoder layers can be removed with minimal quality loss.
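
The abstract does not spell out the method, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes a common ARD-under-variational-inference setup: each weight has a factorized Gaussian posterior N(mu, sigma^2) and a zero-mean Gaussian prior whose variance is set to its closed-form optimum mu^2 + sigma^2. Under that choice the per-weight KL term reduces to 0.5 * log(1 + mu^2 / sigma^2). "Doubly stochastic" refers to combining minibatches with reparameterised weight samples, and weights with a low signal-to-noise ratio are pruned. The function names, the threshold of 0 on the log-SNR, and the toy sizes are all hypothetical. Weight tying is shown by letting one pruned matrix serve as both the embedding (encoder) and the output projection (decoder).

```python
import numpy as np


def ard_kl(mu, log_sigma2):
    """Per-weight KL(q || p) for q = N(mu, sigma^2) and an ARD prior N(0, lam)
    whose variance lam is set to its optimum mu^2 + sigma^2 (assumed setup)."""
    return 0.5 * np.log1p(mu ** 2 / np.exp(log_sigma2))


def sample_weights(mu, log_sigma2, rng):
    """One reparameterised sample w = mu + sigma * eps with eps ~ N(0, 1);
    this is the second source of stochasticity besides minibatching."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_sigma2) * eps


def ard_keep_mask(mu, log_sigma2, log_snr_threshold=0.0):
    """Keep weights whose log signal-to-noise ratio log(mu^2 / sigma^2)
    exceeds a (hypothetical) threshold; the rest are pruned."""
    log_snr = np.log(mu ** 2 + 1e-16) - log_sigma2
    return log_snr > log_snr_threshold


# Toy usage with hypothetical sizes: one tied (V x d) matrix shared by the
# encoder (embedding lookup) and the decoder (output projection).
rng = np.random.default_rng(0)
V, d = 1000, 64
mu = rng.normal(0.0, 0.1, size=(V, d))        # variational means (random here, learned in practice)
log_sigma2 = np.full((V, d), np.log(1e-2))    # variational log-variances (fixed here, learned in practice)

kl_penalty = ard_kl(mu, log_sigma2).sum()     # would be added to the expected NLL during training
w_sample = sample_weights(mu, log_sigma2, rng)  # a stochastic weight draw for one training step

mask = ard_keep_mask(mu, log_sigma2)
E = mu * mask                                 # pruned tied matrix used at test time
emb = E[[3, 17, 42]]                          # encoder: embeddings for three token ids
h = rng.normal(size=(5, d))                   # stand-in for five RNN hidden states
logits = h @ E.T                              # decoder: scores over the vocabulary, same matrix
print(f"fraction of tied weights removed: {1 - mask.mean():.2%}")
```

Because the encoder and decoder share one matrix in this sketch, a single mask prunes both layers at once. The fraction removed here depends only on the toy random initialisation and says nothing about the results reported in the paper.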

Keywords

model compression, variational inference, automatic relevance determination, recurrent neural network, language model, weight pruning