End-to-end Speech Recognition Using Lattice-free MMI

Hossein Hadian; Hossein Sameti; Daniel Povey; Sanjeev Khudanpur

2018 INTERSPEECH INTERSPEECH 2018

End-to-end Speech Recognition Using Lattice-free MMI

Abstract

We present our work on end-to-end training of acoustic models using the lattice-free maximum mutual information (LF-MMI) objective function in the context of hidden Markov models. By end-to-end training, we mean flat-start training of a single DNN in one stage without using any previously trained models, forced alignments, or building state-tying decision trees. We use full biphones to enable context-dependent modeling without trees and show that our end-to-end LF-MMI approach can achieve comparable results to regular LF-MMI on well-known large vocabulary tasks. We also compare with other end-to-end methods such as CTC in character-based and lexicon-free settings and show 5 to 25 percent relative reduction in word error rates on different large vocabulary tasks while using significantly smaller models.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning

🧭 Keyword Pioneer — lattice-free maximum mutual information

🐣 Hot Topic Early Bird — end-to-end speech recognition

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Hossein Hadian , Hossein Sameti , Daniel Povey , Sanjeev Khudanpur

Topics

Artificial Intelligence > Core AI > Foundation Models Machine Learning > Core Methods > Representation Learning

Keywords

acoustic modeling connectionist temporal classification hidden markov model end-to-end speech recognition lattice-free maximum mutual information

Download PDF

Related papers

HoloCompanion: An MR Friend for EveryOne 2018

Estimation of the Vocal Tract Length of Vowel Sounds Based on the Frequency of the Significant Spectral Valley 2018

Deep Learning Techniques for Koala Activity Detection 2018

An Exploration of Local Speaking Rate Variations in Mandarin Read Speech 2018

Acoustic Analysis of Whispery Voice Disguise in Mandarin Chinese 2018