2018 INTERSPEECH INTERSPEECH 2018

Segmental Encoder-Decoder Models for Large Vocabulary Automatic Speech Recognition

Abstract

It has been known for a long time that the classic Hidden-Markov-Model (HMM) derivation for speech recognition contains assumptions such as independence of observation vectors and weak duration modeling that are practical but unrealistic. When using the hybrid approach this is amplified by trying to fit a discriminative model into a generative one. Hidden Conditional Random Fields (CRFs) and segmental models (e.g. Semi-Markov CRFs / Segmental CRFs) have been proposed as an alternative, but for a long time have failed to get traction until recently. In this paper we explore different length modeling approaches for segmental models, their relation to attention-based systems. Furthermore we show experimental results on a handwriting recognition task and to the best of our knowledge the first reported results on the Switchboard 300h speech recognition corpus using this approach.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Speech & Audio
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio