Dynamic Soft Windowing and Language Dependent Style Token for Code-Switching End-to-End Speech Synthesis

Ruibo Fu; Jianhua Tao; Zhengqi Wen; Jiangyan Yi; Chunyu Qiang; Tao Wang

2020 INTERSPEECH INTERSPEECH 2020

Dynamic Soft Windowing and Language Dependent Style Token for Code-Switching End-to-End Speech Synthesis

Abstract

Most of current end-to-end speech synthesis assumes the input text is in a single language situation. However, code-switching in speech occurs frequently in routine life, in which speakers switch between languages in the same utterance. And building a large mixed-language speech database is difficult and uneconomical. In this paper, both windowing technique and style token modeling are designed for the code-switching end-to-end speech synthesis. To improve the consistency of speaking style in bilingual situation, compared with the conventional windowing techniques that used fixed constraints, the dynamic attention reweighting soft windowing mechanism is proposed to ensure the smooth transition of code-switching. To compensate the shortage of mixed-language training data, the language dependent style token is designed for the cross-language multi-speaker acoustic modeling, where both the Mandarin and English monolingual data are the extended training data set. The attention gating is proposed to adjust style token dynamically based on the language and the attended context information. Experimental results show that proposed methods lead to an improvement on intelligibility, naturalness and similarity.

🧭 Keyword Pioneer — attention gating

🐝 Cross-Pollinator — Artificial Intelligence, Computer Vision, Deep Learning, Machine Learning, Natural Language Processing, Speech & Audio

Authors

Ruibo Fu , Jianhua Tao , Zhengqi Wen , Jiangyan Yi , Chunyu Qiang , Tao Wang

Topics

Speech & Audio > Synthesis > Text-to-Speech

Keywords

end-to-end speech synthesis attention gating style token code-switching speech synthesis dynamic soft windowing cross-language acoustic modeling

Download PDF

Related papers

Memory Controlled Sequential Self Attention for Sound Recognition 2020

Dual Attention in Time and Frequency Domain for Voice Activity Detection 2020

Automatic Prediction of Speech Intelligibility Based on X-Vectors in the Context of Head and Neck Cancer 2020

A Noise Robust Technique for Detecting Vowels in Speech Signals 2020

Joint Detection of Sentence Stress and Phrase Boundary for Prosody 2020