INTERSPEECH 2019

Perceptual Optimization of an Enhanced Geometric Vocal Fold Model for Articulatory Speech Synthesis

Abstract

We present a geometric vocal fold model that describes the glottal area between the lower and upper vocal fold edges as a function of time. It is based on a glottis model by Titze [J. Acoust. Soc. Am., 75(2), 570–580 (1984)] and has been enhanced to allow the generation of skewed (asymmetric) glottal area waveforms and diplophonic double pulsing. Embedded in the articulatory speech synthesizer VocalTractLab, the model was used for the synthesis of German words with a range of settings for the vocal fold model parameters to generate different male and female voices. A perception experiment was conducted to determine the parameter settings that generate the most natural-sounding voices. The most natural-sounding male voice was generated with a slightly divergent prephonatory glottal shape, with a phase delay of 70° between the lower and upper vocal fold edges, symmetric glottal area pulses, and a little shimmer (double pulsing). The most natural-sounding female voice was generated with a straight prephonatory glottal channel, with a phase delay of 50° between the vocal fold edges, slightly asymmetric glottal area pulses, and a little shimmer.
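To illustrate the idea of a glottal area governed by two vocal fold edges that oscillate with a phase delay, the following is a minimal sketch and not the model described in the paper. It assumes that each edge moves sinusoidally around a rest displacement, that the upper edge lags the lower edge by a fixed phase, and that the effective opening is the smaller of the two edge displacements, clipped at zero. The function name, the parameter names, and all default values are hypothetical and chosen only for illustration. Skewed pulses and double pulsing are not modeled here.

```python
import math


def glottal_area(t, f0=120.0, rest_lower=0.0, rest_upper=0.0,
                 amplitude=0.001, phase_lag_deg=70.0, length=0.013):
    """Simplified two-edge glottal area in m^2 at time t (seconds).

    Assumptions (illustrative only, not taken from the paper):
    - each edge displacement is a sinusoid around its rest value (m),
    - the upper edge lags the lower edge by phase_lag_deg,
    - the area is 2 * length * the smaller displacement, clipped at 0.
    """
    phase = 2.0 * math.pi * f0 * t
    lag = math.radians(phase_lag_deg)
    x_lower = rest_lower + amplitude * math.sin(phase)
    x_upper = rest_upper + amplitude * math.sin(phase - lag)
    # The narrowest point of the channel determines the opening.
    opening = max(0.0, min(x_lower, x_upper))
    return 2.0 * length * opening


def peak_area(phase_lag_deg, f0=120.0, samples=2000):
    """Largest sampled area over one period for a given phase lag."""
    period = 1.0 / f0
    return max(
        glottal_area(k * period / samples, f0=f0, phase_lag_deg=phase_lag_deg)
        for k in range(samples)
    )
```

In this sketch, a rest_upper slightly larger than rest_lower would correspond to a divergent prephonatory shape, and equal values to a straight channel. Increasing the phase lag shortens the interval during which both edges are open at once, which lowers the peak area and narrows the pulse.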
