Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and ACE-KiSing

Jiatong Shi; Yueqian Lin; Xinyi Bai; Keyi Zhang; Yuning Wu; Yuxun Tang; Yifeng Yu; Qin Jin; Shinji Watanabe

2024 INTERSPEECH INTERSPEECH 2024

Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and ACE-KiSing

Abstract

In singing voice synthesis (SVS), generating singing voices from musical scores faces challenges due to limited data availability. This study proposes a unique strategy to address the data scarcity in SVS. We employ an existing singing voice synthesizer for data augmentation, complemented by detailed manual tuning, an approach not previously explored in data curation, to reduce instances of unnatural voice synthesis. This innovative method has led to the creation of two expansive singing voice datasets, ACE-Opencpop and ACE-KiSing, which are instrumental for large-scale, multi-singer voice synthesis. Through thorough experimentation, we establish that these datasets not only serve as new benchmarks for SVS but also enhance SVS performance on other singing voice datasets when used as supplementary resources. The corpora, pre-trained models, and their related training recipes are publicly available at ESPnet-Muskits (https://github.com/espnet/espnet).

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Jiatong Shi , Yueqian Lin , Xinyi Bai , Keyi Zhang , Yuning Wu , Yuxun Tang , Yifeng Yu , Qin Jin , Shinji Watanabe

Topics

Machine Learning > Application Areas > Data Augmentation

Keywords

data augmentation singing voice synthesis music processing

Download PDF

Related papers

Reshape Dimensions Network for Speaker Recognition 2024

RevRIR: Joint Reverberant Speech and Room Impulse Response Embedding using Contrastive Learning with Application to Room Shape Classification 2024

Mixed Children/Adult/Childrenized Fine-Tuning for Children’s ASR: How to Reduce Age Mismatch and Speaking Style Mismatch 2024

Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions 2024

K-means and hierarchical clustering of f0 contours 2024