Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis

Yuxuan Wang; Daisy Stanton; Yu Zhang; RJ-Skerry Ryan; Eric Battenberg; Joel Shor; Ying Xiao; Ye Jia; Fei Ren; Rif A. Saurous

2018 ICML ICML 2018

Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis

Abstract

In this work, we propose “global style tokens” (GSTs), a bank of embeddings that are jointly trained within Tacotron, a state-of-the-art end-to-end speech synthesis system. The embeddings are trained with no explicit labels, yet learn to model a large range of acoustic expressiveness. GSTs lead to a rich set of significant results. The soft interpretable “labels” they generate can be used to control synthesis in novel ways, such as varying speed and speaking style – independently of the text content. They can also be used for style transfer, replicating the speaking style of a single audio clip across an entire long-form text corpus. When trained on noisy, unlabeled found data, GSTs learn to factorize noise and speaker identity, providing a path towards highly scalable but robust speech synthesis.

🌉 Interdisciplinary Bridge — Machine Learning and Speech & Audio

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Yuxuan Wang , Daisy Stanton , Yu Zhang , RJ-Skerry Ryan , Eric Battenberg , Joel Shor , Ying Xiao , Ye Jia , Fei Ren , Rif A. Saurous

Topics

Machine Learning > Core Methods > Embedding Learning Machine Learning > Learning Types > Self-Supervised Learning Speech & Audio > Synthesis > Text-to-Speech

Keywords

style transfer embedding learning self-supervised learning speech synthesis

Download PDF

Related papers

Rectify Heterogeneous Models with Semantic Mapping 2018

Bayesian Optimization of Combinatorial Structures 2018

The Well-Tempered Lasso 2018

Approximation Algorithms for Cascading Prediction Models 2018

Classification from Pairwise Similarity and Unlabeled Data 2018