Approximated and Domain-Adapted LSTM Language Models for First-Pass Decoding in Speech Recognition

Mittul Singh; Youssef Oualil; Dietrich Klakow

INTERSPEECH 2017

Abstract

Short-range language models (LMs), such as conventional n-gram models, have traditionally been used for language model adaptation. Recent work has improved performance on such tasks using adapted long-span models like Recurrent Neural Network LMs (RNNLMs). Because the first pass is performed with a large background n-gram LM, the adapted RNNLMs are mostly used to rescore lattices or N-best lists as a second step in the decoding process. Ideally, these adapted RNNLMs should be applied in first-pass decoding itself. We therefore introduce two ways of applying adapted long short-term memory (LSTM) based RNNLMs for first-pass decoding. Using available techniques for converting LSTMs into approximated versions suitable for first-pass decoding, we compare approximated LSTMs adapted in a Fast Marginal Adaptation (FMA) framework with an approximated version of an architecture-based adaptation of the LSTM. On a conversational speech recognition task, these differently approximated and adapted LSTMs, each combined with a trigram LM, outperform other adapted and unadapted LMs. The architecture-adapted LSTM combination obtains a word error rate (WER) of 35.9%, and is outperformed by the FMA-based LSTM combination, which obtains the overall lowest WER of 34.4%.
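The abstract names Fast Marginal Adaptation but does not state its formula. As a rough illustration only, the sketch below uses the commonly cited unigram-rescaling form of FMA, in which a background conditional probability is scaled by the ratio of in-domain to background unigram probabilities raised to a power beta, then renormalized over the vocabulary. The function name `fma_adapt`, the value of `beta`, and all toy probabilities are assumptions made for this example; they are not taken from the paper, and the paper's approximated LSTM setup may differ.

```python
from typing import Dict


def fma_adapt(
    p_bg_cond: Dict[str, float],
    p_bg_uni: Dict[str, float],
    p_in_uni: Dict[str, float],
    beta: float = 0.5,
) -> Dict[str, float]:
    """Adapt a background next-word distribution for one fixed history.

    p_bg_cond: background P(w | h) for every word w in the vocabulary.
    p_bg_uni:  background unigram P(w).
    p_in_uni:  in-domain unigram P(w).
    beta:      scaling exponent (hypothetical value; normally tuned).
    """
    # Scale each background probability by (P_in(w) / P_bg(w)) ** beta.
    scaled = {
        w: p * (p_in_uni[w] / p_bg_uni[w]) ** beta
        for w, p in p_bg_cond.items()
    }
    # Renormalize so the adapted probabilities sum to one for this history.
    z = sum(scaled.values())
    return {w: s / z for w, s in scaled.items()}


# Toy three-word vocabulary with invented probabilities (illustration only).
p_bg_cond = {"the": 0.5, "call": 0.2, "stock": 0.3}
p_bg_uni = {"the": 0.6, "call": 0.1, "stock": 0.3}
p_in_uni = {"the": 0.6, "call": 0.3, "stock": 0.1}

adapted = fma_adapt(p_bg_cond, p_bg_uni, p_in_uni, beta=0.5)
unchanged = fma_adapt(p_bg_cond, p_bg_uni, p_in_uni, beta=0.0)
```

In this toy case, "call" is more frequent in the in-domain unigram than in the background unigram, so its adapted probability rises, while "stock" moves the other way. Setting `beta` to 0 leaves the background distribution unchanged.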

Authors

Mittul Singh, Youssef Oualil, Dietrich Klakow

Topics

Machine Learning > Application Areas > Domain Adaptation
Speech & Audio > Recognition > Automatic Speech Recognition

Keywords

domain adaptation; automatic speech recognition; long short-term memory; language model; first-pass decoding
