Revisiting the Hierarchical Multiscale LSTM

Ákos Kádár; Marc-Alexandre Côté; Grzegorz Chrupała; Afra Alishahi

COLING 2018

Abstract

Hierarchical Multiscale LSTM (Chung et al., 2016) is a state-of-the-art language model that learns interpretable structure from character-level input. Such models can provide fertile ground for (cognitive) computational linguistics studies. However, the high complexity of the architecture, training and implementations might hinder its applicability. We provide a detailed reproduction and ablation study of the architecture, shedding light on some of the potential caveats of re-purposing complex deep-learning architectures. We further show that simplifying certain aspects of the architecture can in fact improve its performance. We also investigate the linguistic units (segments) learned by various levels of the model, and argue that their quality does not correlate with the overall performance of the model on language modeling.

Topics

Natural Language Processing > Generation > Language Modeling

Keywords

language modeling, hierarchical model, ablation study, neural network, character-level input
