Developing vocal system impaired patient-aimed voice quality assessment approach using ASR representation-included multiple features

Shaoxiang Dang; Tetsuya Matsumoto; Yoshinori Takeuchi; Takashi Tsuboi; Yasuhiro Tanaka; Daisuke Nakatsubo; Satoshi Maesawa; Ryuta Saito; Masahisa Katsuno; Hiroaki Kudo

INTERSPEECH 2024


Abstract

Deep learning holds considerable promise for clinical speech processing, but clinical data samples are limited and imbalanced. This paper addresses these challenges by using automatic speech recognition (ASR) and self-supervised learning (SSL) representations, pre-trained on extensive datasets of normal speech, to estimate the voice quality of patients with impaired vocal systems. Experiments are conducted on the PVQD dataset, which covers various causes of vocal system damage in English, and on a Japanese dataset focusing on patients with Parkinson's disease before and after subthalamic nucleus deep brain stimulation (STN-DBS) surgery. On PVQD, predictions of the Grade, Breathy, and Asthenic indicators show a strong correlation (Pearson correlation coefficient, PCC, above 0.8) and a low error (mean squared error, MSE, below 0.5). Progress is also reported in predicting the voice quality of patients in the context of STN-DBS.
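For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how PCC and MSE are commonly computed between predicted and reference ratings. It is not the authors' evaluation code; the function names `pcc` and `mse` and the toy rating values are hypothetical and chosen only for illustration.

```python
import math


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def pcc(pred, target):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(pred) != len(target) or len(pred) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel).
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        raise ValueError("PCC is undefined for constant input")
    return cov / math.sqrt(var_p * var_t)


if __name__ == "__main__":
    # Toy values, NOT taken from the paper: hypothetical reference ratings
    # and hypothetical model predictions for five recordings.
    reference = [0.0, 1.0, 2.0, 3.0, 1.5]
    predicted = [0.3, 0.8, 2.4, 2.7, 1.6]
    print("PCC:", pcc(predicted, reference))
    print("MSE:", mse(predicted, reference))
```

A high PCC indicates that predictions rank and scale with the reference ratings, while a low MSE indicates that they are also close in absolute value. The two are reported together because a model can correlate well and still be offset from the true scale.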



Topics

Machine Learning > Learning Types > Self-Supervised Learning
Deep Learning > Models > Generative Models
Speech & Audio > Analysis > Clinical Speech Analysis

Keywords

self-supervised learning, speech representation, voice quality assessment, vocal impairment, clinical speech processing

