NUS Speak-to-Sing: A Web Platform for Personalized Speech-to-Singing Conversion
Abstract
Singing like a professional singer is extremely appealing to the general public. However, many people cannot sing like a singer who has received several years of formal training. We develop a web platform on which users can perform personalized singing synthesis. A user reads and records the lyrics of a song on the platform and then obtains good-quality singing vocals synthesized in his/her own voice. At the backend, we perform template-based speech-to-singing voice conversion. It takes the prosody characteristics of the song from good-quality singing by a trained singer and retains the speaker characteristics of the respective user. We use an improved temporal alignment scheme between the speech and singing signals based on tandem features, and employ a deep spectral map to incorporate singing spectral characteristics into the user’s voice. Finally, the singing vocals are synthesized by a vocoder. With this web platform, we advocate that ‘everyone can sing as they desire’.
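The abstract does not spell out how the speech and singing signals are aligned in time. A common way to align two frame-level feature sequences is dynamic time warping (DTW). The sketch below shows basic DTW under the assumption that "tandem features" means two per-frame feature streams (for example, spectral features and phonetic posteriors) concatenated frame by frame. The function names `tandem_features` and `dtw_align` are hypothetical and do not come from the platform's code. This is an illustration of the general technique, not the authors' improved scheme.

```python
import numpy as np


def tandem_features(stream_a: np.ndarray, stream_b: np.ndarray) -> np.ndarray:
    """Hypothetical tandem feature: concatenate two (frames x dims) streams per frame."""
    # Assumption: both streams share the same frame rate and number of frames.
    return np.concatenate([stream_a, stream_b], axis=1)


def dtw_align(speech: np.ndarray, singing: np.ndarray) -> list[tuple[int, int]]:
    """Align two (frames x dims) feature sequences with basic DTW.

    Returns the warping path as (speech_frame, singing_frame) index pairs.
    """
    n, m = len(speech), len(singing)
    # Pairwise Euclidean distance between every speech frame and every singing frame.
    dist = np.linalg.norm(speech[:, None, :] - singing[None, :, :], axis=-1)

    # Accumulated cost matrix with an extra row/column of infinities as the border.
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j - 1],  # diagonal step: advance both sequences
                cost[i - 1, j],      # advance speech only
                cost[i, j - 1],      # advance singing only
            )

    # Backtrack from the last frame pair to (0, 0) along the cheapest predecessors.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]
        k = int(np.argmin(steps))
        if k == 0:
            i, j = i - 1, j - 1
        elif k == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]


if __name__ == "__main__":
    # Toy 1-D "features": the singing template holds each value for longer.
    speech = np.array([[0.0], [1.0], [2.0]])
    singing = np.array([[0.0], [0.0], [1.0], [1.0], [2.0]])
    print(dtw_align(speech, singing))  # [(0, 0), (0, 1), (1, 2), (1, 3), (2, 4)]
```

In a template-based setting, a path like this could be used to stretch each speech frame to the timing of the matching singing-template frames. The quadratic-time double loop is fine for short phrases. A deployed system would likely use a constrained or faster DTW variant.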