INTERSPEECH 2021

Automatic Severity Classification of Korean Dysarthric Speech Using Phoneme-Level Pronunciation Features

Abstract

This paper proposes an automatic severity classification method for Korean dysarthric speech that uses two types of phoneme-level pronunciation features. The first type is the percentage of correct phonemes, which comprises the percentage of correct consonants, the percentage of correct vowels, and the percentage of total correct phonemes. The second type captures the degree of vowel distortion and comprises vowel space area, formant centralized ratio, vowel articulatory index, and F2-ratio. The baseline experiments use features from our previous study, consisting of MFCCs, voice quality features, and prosody features. Compared to the baseline, experiments that include phoneme-level pronunciation features achieve relative increases in F1-score of 32.55% and 33.84% for support vector machine and feed-forward neural network classifiers, respectively. Our best performance reaches an F1-score of 77.38%, a relative increase of 10.39% over the best previous results reported on the Korean dysarthric QoLT corpus. Furthermore, when feature selection is applied, both recursive feature elimination and extra trees classifier feature selection choose all seven phoneme-level pronunciation features, which account for the highest percentage of the selected feature set. The results indicate that phoneme-level pronunciation features are useful for improving automatic severity classification of dysarthric speech.
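As a rough illustration of the seven features named above, the sketch below computes the four vowel distortion measures from mean F1/F2 values of the corner vowels /a/, /i/, /u/, plus a generic percent-correct helper. It assumes the definitions commonly used in the dysarthria literature (triangular vowel space area, FCR = (F2u + F2a + F1i + F1u) / (F2i + F1a), VAI = 1/FCR, F2-ratio = F2i / F2u); the abstract does not state the exact formulas, the vowel set, or how phoneme correctness is judged, so the function names, the `Formants` type, and the example formant values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Formants:
    """Mean first and second formant of one vowel, in Hz (hypothetical container)."""
    f1: float
    f2: float


def vowel_distortion_features(a: Formants, i: Formants, u: Formants) -> dict:
    """Compute VSA, FCR, VAI, and F2-ratio from the corner vowels /a/, /i/, /u/.

    Assumes the commonly cited definitions; the paper may use variants.
    """
    # Triangular vowel space area in the F1-F2 plane (shoelace formula), in Hz^2.
    vsa = 0.5 * abs(
        i.f1 * (a.f2 - u.f2) + a.f1 * (u.f2 - i.f2) + u.f1 * (i.f2 - a.f2)
    )
    # Formant centralization ratio: grows as vowels centralize.
    fcr = (u.f2 + a.f2 + i.f1 + u.f1) / (i.f2 + a.f1)
    # Vowel articulation index: reciprocal of FCR.
    vai = 1.0 / fcr
    # F2-ratio: front-back contrast between /i/ and /u/.
    f2_ratio = i.f2 / u.f2
    return {"vsa": vsa, "fcr": fcr, "vai": vai, "f2_ratio": f2_ratio}


def percent_correct(n_correct: int, n_total: int) -> float:
    """Percentage of correctly produced units (consonants, vowels, or all phonemes)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_correct / n_total


# Made-up formant values, for illustration only.
features = vowel_distortion_features(
    a=Formants(f1=800.0, f2=1300.0),
    i=Formants(f1=300.0, f2=2300.0),
    u=Formants(f1=350.0, f2=900.0),
)
pcc = percent_correct(n_correct=45, n_total=60)  # e.g. 45 of 60 consonants correct
```

Under these definitions, a more centralized (distorted) vowel system would yield a smaller vowel space area, VAI, and F2-ratio and a larger FCR. The three percent-correct features would apply the same helper to consonant, vowel, and total phoneme counts.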
