Articulation-to-Speech Synthesis Using Articulatory Flesh Point Sensors’ Orientation Information

Beiming Cao; Myungjong Kim; Jun R. Wang; Jan van Santen; Ted Mau; Jun Wang

INTERSPEECH 2018

Abstract

Articulation-to-speech (ATS) synthesis generates an audio waveform directly from articulatory information. Existing ATS work has used only articulatory movement information (spatial coordinates). The orientation of articulatory flesh points has rarely been used, although some devices (e.g., electromagnetic articulography) provide it. Previous work indicated that orientation carries significant information about speech production. In this paper, we explored how orientation information of flesh points on the articulators (i.e., tongue, lips, and jaw) affects ATS performance. Experiments using the articulators' movement information with and without orientation information were conducted using standard deep neural networks (DNNs) and long short-term memory recurrent neural networks (LSTM-RNNs). Both objective and subjective evaluations indicated that adding orientation information to movement information produced higher-quality speech output than movement information alone.
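As a rough illustration of the two input conditions compared in the abstract (movement only versus movement plus orientation), the sketch below builds per-frame input vectors by concatenation. The abstract does not give the feature layout, so the helper name `build_input_frames`, the number of sensors, and the feature dimensions are all assumptions made for illustration, not the authors' setup.

```python
from typing import List, Optional, Sequence

Frame = List[float]


def build_input_frames(
    movement: Sequence[Sequence[float]],
    orientation: Optional[Sequence[Sequence[float]]] = None,
) -> List[Frame]:
    """Return one input vector per frame.

    movement: per-frame spatial coordinates of the flesh-point sensors.
    orientation: optional per-frame orientation features for the same frames.
    Hypothetical helper; the layout is an assumption, not taken from the paper.
    """
    if orientation is None:
        # Movement-only condition: copy the frames unchanged.
        return [list(m) for m in movement]
    if len(movement) != len(orientation):
        raise ValueError("movement and orientation must have the same number of frames")
    # Movement + orientation condition: append orientation to each frame.
    return [list(m) + list(o) for m, o in zip(movement, orientation)]


# Toy data (assumed): 2 frames, 2 sensors with 3 coordinates each -> 6 movement dims.
movement = [
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    [0.2, 0.3, 0.4, 0.5, 0.6, 0.7],
]
# Toy data (assumed): 2 orientation values per sensor -> 4 orientation dims.
orientation = [
    [10.0, 20.0, 30.0, 40.0],
    [11.0, 21.0, 31.0, 41.0],
]

movement_only = build_input_frames(movement)
with_orientation = build_input_frames(movement, orientation)
```

Either list of frames could then be fed to a frame-level regressor such as a DNN or an LSTM-RNN; the only difference between the two conditions in this sketch is the width of the input vector.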

Topics

Deep Learning > Architectures > Neural Networks
Speech & Audio > Synthesis
Speech & Audio > Synthesis > Speech Enhancement
Machine Learning > Learning Types > Multi-Task Learning
Machine Learning > Core Methods > Multi-Task Learning
Artificial Intelligence > Learning Paradigms > Multi-Task Learning

Keywords

speech synthesis; long short-term memory; recurrent neural network; articulatory movement; silent speech interface; speech production; articulation-to-speech synthesis; articulatory flesh point; orientation information; articulatory-to-speech synthesis; flesh point sensor
