Influence of Speaker-Specific Parameters on Speech Separation Systems

David Ditter; Timo Gerkmann

INTERSPEECH 2019

Abstract

Recent studies have shown that deep-learning-based single-channel speech separation systems perform worse for same-gender mixtures than for different-gender mixtures. In this work, we provide a more detailed analysis of the respective impact of the fundamental frequency and the vocal tract length on system performance. While both parameters are correlated with gender, the vocal tract length is a fixed speaker-specific parameter, whereas the fundamental frequency can vary with speaking style. We show that the difference between the fundamental frequency medians of the two speakers in a mixture is highly correlated with SDR performance, while the difference between their vocal tract lengths is not. Our analysis allows us to predict performance for given speakers based on measurements of their fundamental frequency. Furthermore, we conclude that current systems separate (short-term) speaking styles rather than (long-term) speaker characteristics.
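The quantity the abstract relates to SDR can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, unvoiced frames are assumed to be marked with 0 Hz, the difference is assumed to be taken as an absolute value, and Pearson correlation is an assumed choice of measure, since the abstract does not name one.

```python
from math import sqrt
from statistics import median


def median_f0(f0_track):
    """Median F0 in Hz over voiced frames (assumption: unvoiced frames are 0)."""
    voiced = [f for f in f0_track if f > 0]
    if not voiced:
        raise ValueError("F0 track contains no voiced frames")
    return median(voiced)


def f0_median_difference(f0_track_a, f0_track_b):
    """Absolute difference of the two speakers' F0 medians in Hz."""
    return abs(median_f0(f0_track_a) - median_f0(f0_track_b))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

Given one F0 median difference and one measured SDR value per mixture, `pearson(differences, sdr_values)` would summarize how strongly the two are related across a test set. The F0 tracks and SDR values themselves would have to come from a pitch tracker and a separation evaluation, neither of which is shown here.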


Topics

Deep Learning > Architectures > Neural Networks

Keywords

speech separation, deep learning, fundamental frequency, speaker-specific parameter
