Perceptual Optimization of an Enhanced Geometric Vocal Fold Model for Articulatory Speech Synthesis

Peter Birkholz; Susanne Drechsel; Simon Stone

INTERSPEECH 2019

Abstract

We present a geometric vocal fold model that describes the glottal area between the lower and upper vocal fold edges as a function of time. It is based on a glottis model by Titze [J. Acoust. Soc. Am., 75(2), 570–580 (1984)] and has been enhanced to allow the generation of skewed (asymmetric) glottal area waveforms and diplophonic double pulsing. Embedded in the articulatory speech synthesizer VocalTractLab, the model was used for the synthesis of German words with a range of settings for the vocal fold model parameters to generate different male and female voices. A perception experiment was conducted to determine the parameter settings that generate the most natural-sounding voices. The most natural-sounding male voice was generated with a slightly divergent prephonatory glottal shape, with a phase delay of 70° between the lower and upper vocal fold edges, symmetric glottal area pulses, and a little shimmer (double pulsing). The most natural-sounding female voice was generated with a straight prephonatory glottal channel, with a phase delay of 50° between the vocal fold edges, slightly asymmetric glottal area pulses, and a little shimmer.
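To make the idea of a glottal area governed by two phase-shifted edges concrete, the following is a minimal sketch and not the model from the paper. It assumes that each vocal fold edge moves sinusoidally, that the upper edge lags the lower edge by a fixed phase delay, and that the glottal area is a rectangle whose width is set by the narrower of the two edges. The function names, the default values for fold length, amplitude, and rest displacements, and the rectangular area formula are all illustrative assumptions; only the 70° phase delay and the slightly divergent rest shape are taken from the abstract. Pulse skewing and double pulsing are not modeled here.

```python
import math


def glottal_area(t, f0=120.0, length_cm=1.3, amplitude_cm=0.1,
                 rest_lower_cm=0.0, rest_upper_cm=0.01, phase_lag_deg=70.0):
    """Simplified glottal area in cm^2 at time t (seconds).

    All default values are hypothetical. rest_upper_cm > rest_lower_cm
    stands in for a slightly divergent prephonatory glottal shape.
    """
    phase = 2.0 * math.pi * f0 * t
    lag = math.radians(phase_lag_deg)
    # Half-widths of the glottis at the lower and upper fold edges;
    # the upper edge follows the lower edge with the given phase delay.
    lower = rest_lower_cm + amplitude_cm * math.sin(phase)
    upper = rest_upper_cm + amplitude_cm * math.sin(phase - lag)
    # The narrower edge limits the opening; negative values mean closure.
    half_width = max(0.0, min(lower, upper))
    # Assumption: rectangular glottis, two folds, constant width along the length.
    return 2.0 * length_cm * half_width


def area_waveform(f0=120.0, sample_rate=44100, n_periods=1, **kwargs):
    """Sample the simplified area function over n_periods oscillation periods."""
    n = int(round(n_periods * sample_rate / f0))
    return [glottal_area(i / sample_rate, f0=f0, **kwargs) for i in range(n)]


if __name__ == "__main__":
    wave = area_waveform(phase_lag_deg=70.0)
    open_fraction = sum(1 for a in wave if a > 0.0) / len(wave)
    print(f"peak area: {max(wave):.4f} cm^2, open fraction: {open_fraction:.2f}")
```

In this simplified picture, a larger phase delay shortens the portion of each cycle in which both edges are open at once, which is one way to see why the phase delay is a perceptually relevant parameter.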

Authors

Peter Birkholz, Susanne Drechsel, Simon Stone

Topics

Speech & Audio > Synthesis > Text-to-Speech
Interdisciplinary > Linguistics > Phonetics

Keywords

perceptual optimization, speech perception, articulatory synthesis, voice quality, vocal fold model, glottal area, diplophonic double pulsing, articulatory speech synthesis, geometric vocal fold model
