2023
INTERSPEECH
Controlling formant frequencies with neural text-to-speech for the manipulation of perceived speaker age
Abstract
In this paper, we present a framework for formant-controllable neural text-to-speech. We train a model that predicts formant frequencies which then condition melspectrogram generation. We apply this to manipulate perceived speaker age in an indirect fashion, by modifying the predicted formants in a manner that affects perceived vocal tract length. Our ultimate goal is to allow for the control of perceived ageing in children's text-to-speech voices, since ageing in natural child speech is strongly linked to the growth of a child's vocal tract. However, our experiments indicate that our method shows strong age control capabilities for adult speech as well.
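The abstract does not spell out how the predicted formants are modified, so the following is only a minimal sketch of the underlying acoustic intuition, not the authors' method. It assumes the textbook uniform-tube approximation of the vocal tract (closed at the glottis, open at the lips), under which resonances are inversely proportional to tract length. The function names `tube_formants` and `scale_formants`, the `length_ratio` parameter, and the speed-of-sound constant are all illustrative assumptions.

```python
import numpy as np

# Assumed approximate speed of sound in warm, humid air, in cm/s.
SPEED_OF_SOUND_CM_S = 35000.0


def tube_formants(length_cm, n_formants=3):
    """Resonances of a uniform tube closed at one end.

    Uses the quarter-wavelength formula F_n = (2n - 1) * c / (4 * L).
    """
    n = np.arange(1, n_formants + 1)
    return (2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * length_cm)


def scale_formants(formants_hz, length_ratio):
    """Rescale formant values as if tract length were multiplied by length_ratio.

    A ratio above 1 (longer tract) lowers all formants; a ratio below 1
    (shorter tract, as in a younger child) raises them.
    """
    if length_ratio <= 0:
        raise ValueError("length_ratio must be positive")
    return np.asarray(formants_hz, dtype=float) / length_ratio


# Hypothetical usage: a 17.5 cm tube, then the same tube lengthened by 25%.
base = tube_formants(17.5)
longer = scale_formants(base, 1.25)
```

In a formant-conditioned system like the one described, a rescaling of this kind would be applied to the predicted formant tracks before they condition spectrogram generation. Real vocal tracts are not uniform tubes, so a single global ratio is a simplification.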