NUS Speak-to-Sing: A Web Platform for Personalized Speech-to-Singing Conversion

Chitralekha Gupta; Karthika Vijayan; Bidisha Sharma; Xiaoxue Gao; Haizhou Li

INTERSPEECH 2019

Abstract

Singing like a professional is appealing to many people, yet most cannot match a singer who has had years of formal training. We develop a web platform for personalized singing synthesis. A user reads and records the lyrics of a song on the platform and receives good-quality singing vocals synthesized in their own voice. The backend of the web interface performs template-based speech-to-singing voice conversion: it takes the prosody characteristics of the song from good-quality singing by a trained singer and retains the speaker characteristics of the user. We use an improved temporal alignment scheme between speech and singing signals based on tandem features, and employ a deep spectral map to incorporate singing spectral characteristics into the user’s voice. The singing vocals are then synthesized by a vocoder. With this web platform, we advocate that ‘everyone can sing as they desire’.
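To make the temporal alignment step concrete, the following is a minimal sketch of dynamic time warping (DTW), a standard way to align two feature sequences of different lengths. It is an illustration only: the abstract does not say which alignment algorithm the platform uses, and the paper's tandem features are replaced here by generic per-frame feature vectors. The function name `dtw_path` and the toy inputs are hypothetical.

```python
import numpy as np


def dtw_path(speech_feats, singing_feats):
    """Align two feature sequences with plain DTW.

    Both inputs are arrays of shape (frames, feature_dim). These are generic
    features, assumed for illustration; they are not the paper's tandem features.
    Returns a list of (speech_frame, singing_frame) index pairs.
    """
    n, m = len(speech_feats), len(singing_feats)

    # Euclidean distance between every speech frame and every singing frame.
    dist = np.linalg.norm(
        speech_feats[:, None, :] - singing_feats[None, :, :], axis=2
    )

    # cost[i, j] is the cheapest way to align the first i speech frames
    # with the first j singing frames.
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j - 1],  # advance both sequences
                cost[i - 1, j],      # advance speech only
                cost[i, j - 1],      # advance singing only
            )

    # Backtrack from the end to recover the alignment path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return path


# Toy example: the "singing" holds each "speech" frame twice as long.
speech = np.array([[0.0], [1.0], [2.0]])
singing = np.array([[0.0], [0.0], [1.0], [1.0], [2.0], [2.0]])
alignment = dtw_path(speech, singing)
```

In a template-based conversion of the kind the abstract describes, a path like `alignment` would indicate which frames of the user's speech to stretch or compress so that they follow the timing of the trained singer's template.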

Topics

Machine Learning > Core Methods > Representation Learning
Machine Learning > Application Areas > Domain Adaptation
Speech & Audio > Synthesis > Speech Synthesis

Keywords

voice conversion, temporal alignment, spectral mapping, personalized synthesis, speech to singing conversion, prosody characteristics
