End-to-End Articulatory Attribute Modeling for Low-Resource Multilingual Speech Recognition

Sheng Li; Chenchen Ding; Xugang Lu; Peng Shen; Tatsuya Kawahara; Hisashi Kawai

INTERSPEECH 2019

Abstract

The end-to-end (E2E) model allows automatic speech recognition (ASR) systems to be trained without hand-designed, language-specific pronunciation lexicons. However, constructing a multilingual low-resource E2E ASR system is still challenging due to the vast number of symbols (e.g., words and characters). In this paper, we investigate an efficient method of encoding multilingual transcriptions for training E2E ASR systems. We directly encode the symbols of multilingual writing systems into universal articulatory representations, which are much more compact than characters and words. In contrast to traditional multilingual modeling methods, we directly build a single acoustic-articulatory model within a recent transformer-based E2E framework for ASR tasks. In speech recognition experiments, our proposed method significantly outperforms the conventional word-based and character-based E2E models.
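
As a rough illustration of why an articulatory encoding can be more compact than a symbol inventory, the sketch below maps a handful of consonant symbols to place, manner, and voicing attributes. The table `PHONE_TO_ATTRIBUTES` and the helpers `encode` and `attribute_vocabulary` are hypothetical names introduced only for this example; the attribute set is a toy one based on standard phonetic descriptions and is not the inventory or encoding scheme used in the paper.

```python
# Toy sketch only: a hypothetical attribute table, NOT the paper's inventory.
# Each symbol is described by (place, manner, voicing).
PHONE_TO_ATTRIBUTES = {
    "p": ("bilabial", "plosive", "voiceless"),
    "b": ("bilabial", "plosive", "voiced"),
    "m": ("bilabial", "nasal", "voiced"),
    "t": ("alveolar", "plosive", "voiceless"),
    "d": ("alveolar", "plosive", "voiced"),
    "n": ("alveolar", "nasal", "voiced"),
    "s": ("alveolar", "fricative", "voiceless"),
    "z": ("alveolar", "fricative", "voiced"),
    "k": ("velar", "plosive", "voiceless"),
    "g": ("velar", "plosive", "voiced"),
}


def encode(symbols):
    """Flatten a symbol sequence into a sequence of attribute tokens."""
    tokens = []
    for symbol in symbols:
        tokens.extend(PHONE_TO_ATTRIBUTES[symbol])
    return tokens


def attribute_vocabulary():
    """Return the sorted set of distinct attribute tokens in the table."""
    return sorted({a for attrs in PHONE_TO_ATTRIBUTES.values() for a in attrs})


if __name__ == "__main__":
    print(encode(["b", "n"]))
    print(len(PHONE_TO_ATTRIBUTES), "symbols vs", len(attribute_vocabulary()), "attributes")
```

In this toy table the output vocabulary is smaller than the symbol set because attributes are shared across symbols. Output sequences become longer in exchange, and how the paper handles that trade-off is not stated in the abstract.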

Topics

Deep Learning > Architectures > Transformers

Speech & Audio > Recognition > Automatic Speech Recognition

Speech & Audio > Recognition > Speech Recognition

Keywords

transformer-based model; multilingual speech recognition; end-to-end ASR; transformer-based ASR; articulatory attribute; low-resource ASR; universal articulatory representation; end-to-end articulatory modeling; low-resource ASR system

Related papers

Using Real-Time Visual Biofeedback for Second Language Instruction 2019

VAE-Based Regularization for Deep Speaker Embedding 2019

End-to-End SpeakerBeam for Single Channel Target Speech Recognition 2019

Attention-Enhanced Connectionist Temporal Classification for Discrete Speech Emotion Recognition 2019

Attentive to Individual: A Multimodal Emotion Recognition Network with Personalized Attention Profile 2019