INTERSPEECH 2019

End-to-End Articulatory Attribute Modeling for Low-Resource Multilingual Speech Recognition

Abstract

The end-to-end (E2E) model allows automatic speech recognition (ASR) systems to be trained without hand-designed, language-specific pronunciation lexicons. However, constructing a multilingual low-resource E2E ASR system remains challenging because of the vast number of output symbols (e.g., words and characters). In this paper, we investigate an efficient method of encoding multilingual transcriptions for training E2E ASR systems. We directly encode the symbols of multilingual writing systems into universal articulatory representations, which are much more compact than characters and words. In contrast to traditional multilingual modeling methods, we directly build a single acoustic-articulatory model within a recent transformer-based E2E framework for ASR tasks. In our speech recognition experiments, the proposed method significantly outperforms conventional word-based and character-based E2E models.
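As a rough illustration of the encoding idea, the sketch below maps a few consonant symbols to sequences of articulatory attributes (voicing, place, manner). The table `ATTRIBUTES` and the helpers `encode` and `attribute_vocab` are hypothetical names chosen for this example; they are not the attribute inventory or the encoding scheme used in the paper, which is not specified in the abstract. The point is only that many symbols can share a small, fixed set of attribute tokens.

```python
# Toy attribute table (hypothetical; NOT the paper's inventory).
# Each symbol is described by standard phonetic features: voicing, place, manner.
ATTRIBUTES = {
    "p": ("voiceless", "bilabial", "plosive"),
    "b": ("voiced", "bilabial", "plosive"),
    "m": ("voiced", "bilabial", "nasal"),
    "t": ("voiceless", "alveolar", "plosive"),
    "d": ("voiced", "alveolar", "plosive"),
    "n": ("voiced", "alveolar", "nasal"),
    "s": ("voiceless", "alveolar", "fricative"),
    "z": ("voiced", "alveolar", "fricative"),
    "k": ("voiceless", "velar", "plosive"),
    "g": ("voiced", "velar", "plosive"),
}


def encode(symbols):
    """Replace each symbol with its attribute tokens.

    Raises KeyError for symbols missing from the toy table.
    """
    tokens = []
    for sym in symbols:
        tokens.extend(ATTRIBUTES[sym])
    return tokens


def attribute_vocab():
    """Return the sorted set of distinct attribute tokens (the output vocabulary)."""
    return sorted({attr for attrs in ATTRIBUTES.values() for attr in attrs})


if __name__ == "__main__":
    print(encode("pm"))
    print(len(ATTRIBUTES), "symbols share", len(attribute_vocab()), "attribute tokens")
```

In this toy table the attribute vocabulary is already smaller than the symbol set, and adding further symbols that reuse the same features would not enlarge it. The trade-off is a longer target sequence per symbol.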
