M4Singer: A Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus

Lichao Zhang; Ruiqi Li; Shoutong Wang; Liqun Deng; Jinglin Liu; Yi Ren; Jinzheng He; Rongjie Huang; Jieming Zhu; Xiao Chen; Zhou Zhao

NeurIPS 2022

Abstract

The lack of publicly available, high-quality, and accurately labeled datasets has long been a major bottleneck for singing voice synthesis (SVS). To tackle this problem, we present M4Singer, a free-to-use Multi-style, Multi-singer Mandarin singing collection with elaborately annotated Musical scores, together with its benchmarks. Specifically, 1) we construct and release a large, high-quality Chinese singing voice corpus, recorded by 20 professional singers and covering 700 Chinese pop songs as well as all four SATB voice types (i.e., soprano, alto, tenor, and bass); 2) we make extensive efforts to manually compose the musical scores for each recorded song, which are necessary for studying prosody modeling in SVS; and 3) to facilitate the use and demonstrate the quality of M4Singer, we conduct four benchmark experiments: score-based SVS, controllable singing voice (CSV), singing voice conversion (SVC), and automatic music transcription (AMT).

Topics

Interdisciplinary > Science > Digital Humanities
Deep Learning > Learning Types > Multimodal Learning
Speech & Audio > Synthesis > Speech Synthesis

Keywords

voice conversion, singing voice synthesis, music corpus, automatic music transcription, score-based singing
