Unified Verbalization for Speech Recognition & Synthesis Across Languages

Sandy Ritchie; Richard Sproat; Kyle Gorman; Daan van Esch; Christian Schallhart; Nikos Bampounis; Benoît Brard; Jonas Fromseier Mortensen; Millie Holt; Eoin Mahon

INTERSPEECH 2019

Abstract

We describe a new approach to converting written tokens to their spoken form, which can be shared by automatic speech recognition (ASR) and text-to-speech synthesis (TTS) systems. Both ASR and TTS need to map from the written to the spoken domain, and we present an approach that enables us to share verbalization grammars between the two systems while exploiting linguistic commonalities to provide simple default verbalizations. We also describe improvements to an induction system for number names grammars. Between these shared ASR/TTS verbalizers and the improved induction system for number names grammars, we achieve significant gains in development time and scalability across languages.
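As a rough illustration of what sharing one verbalizer between two consumers might look like, the following is a minimal sketch and not the system described in the paper. The number namer is a hand-written English toy covering 0 to 999 (the paper induces such grammars rather than writing them by hand), and the names `number_name`, `verbalize`, `tts_front_end`, and `asr_lm_text` are hypothetical, introduced here only for the example.

```python
# Toy sketch only: hand-written English rules, not an induced grammar.
# All function names below are hypothetical.

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = [
    "", "", "twenty", "thirty", "forty", "fifty",
    "sixty", "seventy", "eighty", "ninety",
]


def number_name(n: int) -> str:
    """Spell out an integer in the range 0-999."""
    if not 0 <= n <= 999:
        raise ValueError("toy grammar covers 0-999 only")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"
    hundreds, rest = divmod(n, 100)
    head = f"{ONES[hundreds]} hundred"
    return head if rest == 0 else f"{head} {number_name(rest)}"


def verbalize(text: str) -> str:
    """Map written tokens to spoken form; tokens the toy cannot handle pass through."""
    out = []
    for tok in text.split():
        if tok.isascii() and tok.isdigit() and len(tok) <= 3:
            out.append(number_name(int(tok)))
        else:
            out.append(tok)
    return " ".join(out)


def tts_front_end(text: str) -> str:
    """Hypothetical TTS consumer: spoken-form text to hand to a synthesizer."""
    return verbalize(text)


def asr_lm_text(text: str) -> str:
    """Hypothetical ASR consumer: lowercased spoken-form text for language model data."""
    return verbalize(text).lower()


if __name__ == "__main__":
    print(tts_front_end("Room 42 has 305 seats"))
    print(asr_lm_text("Room 42 has 305 seats"))
```

Both consumers call the same `verbalize` function, so a fix to the number rules reaches both at once; that is the kind of sharing the abstract refers to, shown here under the simplifying assumptions stated above.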

Keywords

automatic speech recognition, text-to-speech synthesis, multilingual system, verbalization grammar, spoken form conversion, language verbalization
