RETURNN as a Generic Flexible Neural Toolkit with Application to Translation and Speech Recognition

Albert Zeyer; Tamer Alkhouli; Hermann Ney

ACL 2018

Abstract

We compare the fast training and decoding speed of RETURNN for attention models in translation, which is due to fast CUDA LSTM kernels and a fast pure TensorFlow beam search decoder. We show that a layer-wise pretraining scheme for recurrent attention models gives an absolute improvement of over 1% BLEU and allows training deeper recurrent encoder networks. We present promising preliminary results on maximum expected BLEU training. We are able to train state-of-the-art models for translation and end-to-end models for speech recognition, and we show results on WMT 2017 and Switchboard. The flexibility of RETURNN allows a fast research feedback loop for experimenting with alternative architectures, and its generality allows it to be used in a wide range of applications.

Topics

Deep Learning > Architectures > Transformers
Deep Learning > Techniques > Pretraining
Natural Language Processing > Applications > Machine Translation
Speech & Audio > Recognition > Speech Recognition
Natural Language Processing > Generation > Machine Translation
Deep Learning > Models > Transformers
Deep Learning > Learning Types > Deep Learning

Keywords

machine translation, speech recognition, neural machine translation, recurrent neural network, beam search, attention model, layer-wise pretraining, beam search decoder
