fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit

Changhan Wang; Wei-Ning Hsu; Yossi Adi; Adam Polyak; Ann Lee; Peng-Jen Chen; Jiatao Gu; Juan Pino

EMNLP 2021

Abstract

This paper presents fairseq S^2, a fairseq extension for speech synthesis. We implement a number of autoregressive (AR) and non-AR text-to-speech models, and their multi-speaker variants. To enable training speech synthesis models with less curated data, a number of preprocessing tools are built and their importance is shown empirically. To facilitate faster iteration of development and analysis, a suite of automatic metrics is included. Apart from the features added specifically for this extension, fairseq S^2 also benefits from the scalability offered by fairseq and can be easily integrated with other state-of-the-art systems provided in this framework. The code, documentation, and pre-trained models will be made available at https://github.com/pytorch/fairseq/tree/master/examples/speech_synthesis.

Topics

Speech & Audio > Synthesis > Text-to-Speech

Deep Learning > Models > Transformers

Deep Learning > Learning Types > Generative Models

Speech & Audio > Synthesis > Speech Synthesis

Keywords

speech synthesis, autoregressive model, pretrained model, multi-speaker model, neural network
