INTERSPEECH 2024

Zero-shot Out-of-domain is No Joke: Lessons Learned in the VoiceMOS 2023 MOS Prediction Challenge

Abstract

This paper describes our team’s experiences in the VoiceMOS Challenge 2023, a challenge centered on evaluating the quality of synthetic or noisy speech. Inspired by our success with an ensemble approach in the first VoiceMOS Challenge in 2022, we submitted an ensemble of four models this time, based on wav2vec 2.0, QuartzNet, CNN-RNN, and LDNet. This was enough to win one of the two tracks we participated in (Track 1b). However, post-challenge analysis shows that only two of the models offer a meaningful contribution in any of the VoiceMOS 2023 tracks, while the other two only degrade the ensemble’s overall performance. On the other hand, post-challenge results on Track 2 (singing voice conversion data) surpassed all our expectations. In the paper, we explain how we tried to deal with the new zero-shot out-of-domain scenarios, analyze the results, and discuss the lessons learned.
