Time-Domain Envelope Modulating the Noise Component of Excitation in a Continuous Residual-Based Vocoder for Statistical Parametric Speech Synthesis

Mohammed Salah Al-Radhi; Tamás Gábor Csapó; Géza Németh

2017 INTERSPEECH INTERSPEECH 2017

Time-Domain Envelope Modulating the Noise Component of Excitation in a Continuous Residual-Based Vocoder for Statistical Parametric Speech Synthesis

Abstract

In this paper, we present an extension of a novel continuous residual-based vocoder for statistical parametric speech synthesis. Previous work has shown the advantages of adding envelope modulated noise to the voiced excitation, but this has not been investigated yet in the context of continuous vocoders, i.e. of which all parameters are continuous. The noise component is often not accurately modeled in modern vocoders (e.g. STRAIGHT). For more natural sounding speech synthesis, four time-domain envelopes (Amplitude, Hilbert, Triangular and True) are investigated and enhanced, and then applied to the noise component of the excitation in our continuous vocoder. The performance evaluation is based on the study of time envelopes. In an objective experiment, we investigated the Phase Distortion Deviation of vocoded samples. A MUSHRA type subjective listening test was also conducted comparing natural and vocoded speech samples. Both experiments have shown that the proposed framework using Hilbert and True envelopes provides high-quality vocoding while outperforming the two other envelopes.

🧭 Keyword Pioneer — excitation modeling

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Mohammed Salah Al-Radhi , Tamás Gábor Csapó , Géza Németh

Topics

Speech & Audio > Synthesis > Speech Enhancement

Keywords

speech synthesis speech enhancement excitation modeling time-domain envelope

Download PDF

Related papers

Description of the Munich-Passau Snore Sound Corpus (MPSSC) 2017

A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification 2017

Binaural Reverberant Speech Separation Based on Deep Neural Networks 2017

Building Audio-Visual Phonetically Annotated Arabic Corpus for Expressive Text to Speech 2017

A Comparison of Danish Listeners’ Processing Cost in Judging the Truth Value of Norwegian, Swedish, and English Sentences 2017