CTC in the Context of Generalized Full-Sum HMM Training

Albert Zeyer; Eugen Beck; Ralf Schlüter; Hermann Ney

INTERSPEECH 2017

Abstract

We formulate a generalized hybrid HMM-NN training procedure using the full sum over the hidden state sequence and identify CTC as a special case of it. We analyze the alignment behavior of such a training procedure and explain why full-sum training strongly localizes the label outputs (also referred to as peaky or spiky behavior). We show how to avoid that behavior by using a state prior. We discuss the temporal decoupling between the output label position (time frame) and the corresponding evidence in the input observations when the model is trained with BLSTMs. We also show how to overcome this decoupling by jointly training a FFNN. We implemented the Baum-Welch alignment algorithm in CUDA to enable fast soft realignments on GPU. We have published this code along with some of our experiments as part of RETURNN, RWTH's extensible training framework for universal recurrent neural networks. We finish with an experimental validation of our study on WSJ and Switchboard.
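The full-sum and soft-alignment idea in the abstract can be illustrated with a small forward-backward pass. The following is a minimal NumPy sketch under stated assumptions, not the authors' CUDA implementation in RETURNN: it assumes the standard CTC label topology (blank-interleaved labels, optional blank skip), and the names `ctc_states`, `allowed_skip`, `soft_alignment`, `log_prior`, and `prior_scale` are hypothetical. With `log_prior=None` the full-sum score is the CTC likelihood; passing a log state prior divides the network posteriors by the scaled prior, in the spirit of the hybrid HMM-NN formulation.

```python
import numpy as np


def ctc_states(labels, blank=0):
    """Interleave blanks: [a, b] -> [blank, a, blank, b, blank]."""
    states = [blank]
    for label in labels:
        states += [label, blank]
    return states


def allowed_skip(states, s, blank):
    """True if state s may be entered directly from state s - 2."""
    return s >= 2 and states[s] != blank and states[s] != states[s - 2]


def soft_alignment(log_probs, labels, blank=0, log_prior=None, prior_scale=1.0):
    """Forward-backward over the CTC topology (illustrative sketch).

    log_probs: (T, V) array of log p(label | x_t) from the network.
    Returns (gamma, log_z): gamma is the (T, S) soft alignment, log_z the
    log of the full sum over all allowed state sequences.
    """
    states = ctc_states(labels, blank)
    T, S = log_probs.shape[0], len(states)
    scores = log_probs[:, states]  # (T, S) per-state emission scores
    if log_prior is not None:
        # Assumption: posterior / prior^scale, as in hybrid HMM-NN models.
        scores = scores - prior_scale * log_prior[states]

    # Forward pass: alpha[t, s] includes the emission score at time t.
    alpha = np.full((T, S), -np.inf)
    alpha[0, :2] = scores[0, :2]
    for t in range(1, T):
        for s in range(S):
            prev = [alpha[t - 1, s]]
            if s >= 1:
                prev.append(alpha[t - 1, s - 1])
            if allowed_skip(states, s, blank):
                prev.append(alpha[t - 1, s - 2])
            alpha[t, s] = np.logaddexp.reduce(prev) + scores[t, s]

    # Backward pass: beta[t, s] excludes the emission score at time t.
    beta = np.full((T, S), -np.inf)
    beta[T - 1, max(S - 2, 0):] = 0.0
    for t in range(T - 2, -1, -1):
        for s in range(S):
            nxt = [beta[t + 1, s] + scores[t + 1, s]]
            if s + 1 < S:
                nxt.append(beta[t + 1, s + 1] + scores[t + 1, s + 1])
            if s + 2 < S and allowed_skip(states, s + 2, blank):
                nxt.append(beta[t + 1, s + 2] + scores[t + 1, s + 2])
            beta[t, s] = np.logaddexp.reduce(nxt)

    log_z = np.logaddexp.reduce(alpha[T - 1, max(S - 2, 0):])
    gamma = np.exp(alpha + beta - log_z)  # state posteriors per frame
    return gamma, log_z


# Toy usage: 4 frames, 3 output classes (0 = blank), target labels [1, 2].
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 3))
log_probs = logits - np.logaddexp.reduce(logits, axis=1, keepdims=True)
gamma, log_z = soft_alignment(log_probs, labels=[1, 2])
```

Each row of `gamma` is a distribution over the allowed states at that frame, which is the soft alignment used as the training target. The nested Python loops are chosen for readability only; a practical implementation would vectorize over states or run on GPU.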



Keywords

neural network optimization, connectionist temporal classification, hidden Markov model, bidirectional LSTM, Baum-Welch algorithm, sequence training, Baum-Welch alignment, alignment algorithm, neural network
