INTERSPEECH 2019

Influence of Speaker-Specific Parameters on Speech Separation Systems

Abstract

Recent studies have shown that deep-learning-based single-channel speech separation systems perform worse for same-gender mixtures than for different-gender mixtures. In this work, we provide a more detailed analysis of the respective impact of the fundamental frequency and the vocal tract length on system performance. While both parameters are correlated with gender, the vocal tract length is a fixed speaker-specific parameter, whereas the fundamental frequency can vary with speaking style. We show that the difference between the fundamental frequency medians of the two speakers in a mixture is highly correlated with the SDR performance, whereas the difference between their vocal tract lengths is not. Our analysis allows us to predict performance for given speakers from measurements of their fundamental frequency. Furthermore, we conclude that current systems separate (short-term) speaking styles rather than (long-term) speaker characteristics.
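The quantity at the center of this analysis, the difference between the two speakers' fundamental frequency medians, and its correlation with SDR can be sketched in a few lines. The abstract does not state which correlation measure or F0 tracker was used, so the sketch below assumes Pearson correlation and per-frame F0 tracks in Hz with unvoiced frames marked as 0. The function names and all numbers are hypothetical toy values for illustration, not results from the paper.

```python
from math import sqrt
from statistics import mean, median


def f0_median_difference(f0_a, f0_b):
    """Absolute difference of the median F0 (Hz) of two speakers.

    Assumption: unvoiced frames are marked with 0 and are ignored.
    """
    voiced_a = [f for f in f0_a if f > 0]
    voiced_b = [f for f in f0_b if f > 0]
    return abs(median(voiced_a) - median(voiced_b))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy mixtures: (F0 track speaker A, F0 track speaker B, SDR in dB).
# All values are invented placeholders, not measurements.
toy_mixtures = [
    ([100, 0, 110, 120], [200, 210, 0, 220], 12.0),
    ([105, 115, 0, 125], [150, 0, 160, 170], 8.0),
    ([120, 0, 130, 140], [125, 135, 145, 0], 3.0),
]

diffs = [f0_median_difference(a, b) for a, b, _ in toy_mixtures]
sdrs = [sdr for _, _, sdr in toy_mixtures]
print("F0 median differences (Hz):", diffs)
print("Pearson r between F0 difference and SDR:", pearson(diffs, sdrs))
```

With real data, the F0 tracks would come from a pitch tracker applied to each clean source signal, and the SDR values from evaluating the separation system on the corresponding mixtures.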
