Phoneme-Discriminative Features for Dysarthric Speech Conversion

Ryo Aihara; Tetsuya Takiguchi; Yasuo Ariki

INTERSPEECH 2017

Abstract

This paper presents a Voice Conversion (VC) method for a person with dysarthria resulting from athetoid cerebral palsy. VC is widely researched in speech processing, driven by growing interest in applications such as personalized Text-To-Speech systems. Gaussian Mixture Model (GMM)-based VC has been widely studied, and Partial Least Squares (PLS)-based VC has been proposed to avoid the over-fitting problems associated with the GMM-based method. In this paper, we propose phoneme-discriminative features for use with PLS-based VC. Conventional VC methods do not consider the phonetic structure of spectral features, although phonetic structure is important for speech analysis. In dysarthric speech in particular, phonetic structure is difficult to discriminate, so discriminative learning is expected to improve conversion accuracy. We employ discriminative manifold learning: spectral features are projected into a subspace in which neighboring points with the same phoneme label are drawn close together and neighboring points with different phoneme labels are pushed apart. The proposed method was evaluated on a dysarthric speaker conversion task, in which dysarthric speech is converted into non-dysarthric speech.
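The neighborhood criterion described in the abstract can be illustrated with a small Python sketch. The abstract does not specify the graph construction, the neighborhood size, or how the projection is optimized, so everything here is an assumption made for illustration: the helper names (`knn_graphs`, `project`, `objective`), the choice of k = 3, the toy 2-D features, and the simplification of scoring candidate 1-D directions instead of solving for an optimal subspace. It is not the authors' implementation.

```python
import math

def distance(p, q):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_graphs(points, labels, k=3):
    """Split each point's k nearest neighbours into same-label and different-label edges."""
    within, between = [], []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: distance(p, points[j]))
        for j in others[:k]:
            if labels[i] == labels[j]:
                within.append((i, j))
            else:
                between.append((i, j))
    return within, between

def project(points, direction):
    """Project each point onto a direction normalised to unit length (a 1-D subspace)."""
    norm = math.sqrt(sum(d * d for d in direction))
    return [sum(x * d for x, d in zip(p, direction)) / norm for p in points]

def objective(points, labels, direction, k=3):
    """Spread of different-label neighbours minus spread of same-label neighbours.

    Higher is better: neighbours with different phoneme labels end up far apart
    after projection, while neighbours sharing a label stay close.
    """
    within, between = knn_graphs(points, labels, k)
    z = project(points, direction)

    def spread(edges):
        return sum((z[i] - z[j]) ** 2 for i, j in edges)

    return spread(between) - spread(within)

# Hypothetical toy "spectral features": two phoneme classes separated along axis 0.
points = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
labels = ["a", "a", "a", "i", "i", "i"]

score_axis0 = objective(points, labels, (1.0, 0.0))  # direction that separates the labels
score_axis1 = objective(points, labels, (0.0, 1.0))  # direction that mixes the labels
print(score_axis0 > score_axis1)
```

On this toy data the direction along axis 0 scores higher, because it collapses same-label neighbors onto one value while keeping different-label neighbors apart. A full discriminative manifold learning method would typically solve for the projection directly (for example through an eigenvalue problem) instead of comparing hand-picked directions.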

Authors

Ryo Aihara; Tetsuya Takiguchi; Yasuo Ariki

Topics

Machine Learning > Core Methods > Classification
Speech & Audio > Synthesis > Text-to-Speech

Keywords

manifold learning, voice conversion, gaussian mixture model, partial least square, dysarthric speech, phoneme discriminative feature
