Pyramidal Recurrent Unit for Language Modeling

Sachin Mehta; Rik Koncel-Kedziorski; Mohammad Rastegari; Hannaneh Hajishirzi

EMNLP 2018

Abstract

LSTMs are powerful tools for modeling contextual information, as evidenced by their success at the task of language modeling. However, modeling contexts in very high dimensional space can lead to poor generalizability. We introduce the Pyramidal Recurrent Unit (PRU), which enables learning representations in high dimensional space with more generalization power and fewer parameters. PRUs replace the linear transformation in LSTMs with more sophisticated interactions such as pyramidal or grouped linear transformations. This architecture gives strong results on word-level language modeling while significantly reducing the number of parameters. In particular, PRU improves the perplexity of a recent state-of-the-art language model by up to 1.3 points while learning 15-20% fewer parameters. For a similar number of model parameters, PRU outperforms all previous RNN models that exploit different gating mechanisms and transformations. We provide a detailed examination of the PRU and its behavior on language modeling tasks. Our code is open-source and available at https://sacmehta.github.io/PRU/.
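The grouped linear transformation mentioned in the abstract can be illustrated with a minimal NumPy sketch. This shows a generic grouped linear layer under stated assumptions, not the authors' implementation: the function name `grouped_linear`, the dimensions, and the number of groups are hypothetical, and the pyramidal transformation is not shown.

```python
import numpy as np


def grouped_linear(x, weights):
    """Apply a generic grouped linear transformation (illustrative only).

    x:       input vector of shape (n_in,)
    weights: list of g matrices, each of shape (n_out // g, n_in // g)
    """
    g = len(weights)
    # Split the input into g equal-sized slices, one per group.
    chunks = np.split(x, g)
    # Each group transforms only its own slice of the input.
    outputs = [w @ c for w, c in zip(weights, chunks)]
    # Concatenate the per-group outputs into one vector of shape (n_out,).
    return np.concatenate(outputs)


# Hypothetical sizes chosen only for illustration.
n_in, n_out, g = 8, 12, 4
rng = np.random.default_rng(0)
weights = [rng.standard_normal((n_out // g, n_in // g)) for _ in range(g)]
x = rng.standard_normal(n_in)

y = grouped_linear(x, weights)

# Parameter counts: a dense layer versus the grouped layer.
dense_params = n_in * n_out
grouped_params = sum(w.size for w in weights)
print(y.shape, dense_params, grouped_params)
```

A grouped layer of this kind is equivalent to a dense layer whose weight matrix is block-diagonal, so with g groups it uses n_in * n_out / g weights instead of n_in * n_out. How the PRU combines such transformations inside the recurrent unit is described in the paper and the linked code, not here.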

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization

Natural Language Processing > Generation > Language Modeling

Deep Learning > Architectures > Recurrent Neural Networks

Deep Learning > Models > Language Models

Keywords

neural network architecture, language modeling, long short-term memory, parameter efficiency, parameter reduction, pyramidal recurrent unit, word-level language modeling
