Reducing the Computational Complexity of Two-Dimensional LSTMs

Bo Li; Tara N. Sainath

2017 INTERSPEECH INTERSPEECH 2017

Reducing the Computational Complexity of Two-Dimensional LSTMs

Abstract

Long Short-Term Memory Recurrent Neural Networks (LSTMs) are good at modeling temporal variations in speech recognition tasks, and have become an integral component of many state-of-the-art ASR systems. More recently, LSTMs have been extended to model variations in the speech signal in two dimensions, namely time and frequency [1, 2]. However, one of the problems with two-dimensional LSTMs, such as Grid-LSTMs, is that the processing in both time and frequency occurs sequentially, thus increasing computational complexity. In this work, we look at minimizing the dependence of the Grid-LSTM with respect to previous time and frequency points in the sequence, thus reducing computational complexity. Specifically, we compare reducing computation using a bidirectional Grid-LSTM (biGrid-LSTM) with non-overlapping frequency sub-band processing, a PyraMiD-LSTM [3] and a frequency-block Grid-LSTM (fbGrid-LSTM) for parallel time-frequency processing. We find that the fbGrid-LSTM can reduce computation costs by a factor of four with no loss in accuracy, on a 12,500 hour Voice Search task.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Interdisciplinary, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio

🧭 Keyword Pioneer — frequency processing

Authors

Bo Li , Tara N. Sainath

Topics

Machine Learning > Application Areas > Efficient Computing Deep Learning > Architectures > Neural Networks Speech & Audio > Recognition > Speech Recognition Deep Learning > Optimization & Theory > Model Compression

Keywords

speech recognition computational complexity parallel processing long short-term memory recurrent neural network grid lstm bidirectional model frequency processing

Download PDF

Related papers

Description of the Munich-Passau Snore Sound Corpus (MPSSC) 2017

A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification 2017

Binaural Reverberant Speech Separation Based on Deep Neural Networks 2017

Building Audio-Visual Phonetically Annotated Arabic Corpus for Expressive Text to Speech 2017

A Comparison of Danish Listeners’ Processing Cost in Judging the Truth Value of Norwegian, Swedish, and English Sentences 2017