SERAPHIM: A Wavetable Synthesis System with 3D Lip Animation for Real-Time Speech and Singing Applications on Mobile Platforms

Paul Yaozhu Chan; Minghui Dong; Grace Xue Hui Ho; Haizhou Li

INTERSPEECH 2016

Abstract

Singing synthesis is a rising musical art form gaining popularity among composers and end-listeners alike. To date, this art form has been largely confined to the offline setting of the music studio, whereas a large part of music is about live performance. This calls for a real-time synthesis system readily deployable for on-stage applications. SERAPHIM is a lightweight wavetable synthesis system deployable on mobile platforms. Apart from conventional offline studio applications, SERAPHIM also supports real-time synthesis, accepting live control inputs for on-stage performances. It also provides easy lip animation control. SERAPHIM will be made available as a toolbox on Unity 3D for easy adoption into game development across multiple platforms. A precompiled version will also be deployed as a VST studio plugin, directly addressing end users. It currently supports Japanese (singing only) and Mandarin (speech and singing). This paper describes our work on SERAPHIM and discusses its capabilities and applications.

Authors

Paul Yaozhu Chan, Minghui Dong, Grace Xue Hui Ho, Haizhou Li

Keywords

speech synthesis, mobile deployment, singing synthesis, wavetable synthesis, lip animation, real-time synthesis, mobile platform, 3D lip animation
