Combining Audio and Brain Activity for Predicting Speech Quality

Ivan Halim Parmonangan; Hiroki Tanaka; Sakriani Sakti; Satoshi Nakamura

INTERSPEECH 2020

Abstract

Because the perceived audio quality of synthesized speech may determine a system's market success, quality evaluation is critical. Audio quality is usually evaluated either subjectively or objectively. Because subjective approaches are costly and time-consuming, they have generally been replaced by faster, more cost-efficient objective approaches. The primary downside of objective approaches is that they lack the human influence factors that are crucial for deriving the subjective perception of quality. These factors cannot be observed directly, but they are manifested in individual brain activity. We therefore combined predictions from single-subject electroencephalography (EEG) information and audio features to improve predictions of the overall quality of synthesized speech. Our results show that, by combining the outputs of the audio and EEG models, a very simple neural network can surpass the performance of single-modal approaches.
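
The abstract describes late fusion: an audio model and an EEG model each produce a quality prediction, and a small neural network combines the two. The abstract does not specify the features, model architecture, or data, so the sketch below is only an illustration of that idea under stated assumptions. The class `LateFusionMLP`, the helper `make_toy_data`, the one-hidden-layer tanh architecture, the hyperparameters, and the noise levels are all hypothetical, and the data are synthetic stand-ins rather than anything from the paper.

```python
import math
import random


class LateFusionMLP:
    """Tiny MLP mapping [audio_score, eeg_score] to a fused quality score.

    Hypothetical architecture (an assumption, not the paper's): one tanh
    hidden layer and a linear output, trained with plain SGD.
    """

    def __init__(self, hidden=4, lr=0.05, seed=0):
        rng = random.Random(seed)
        self.lr = lr
        # Input -> hidden weights (hidden x 2) and biases.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        # Hidden -> output weights and bias.
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        return h, y

    def predict(self, audio_score, eeg_score):
        return self._forward([audio_score, eeg_score])[1]

    def train_step(self, x, target):
        """One SGD step on the squared error 0.5 * (y - target) ** 2."""
        h, y = self._forward(x)
        err = y - target  # dLoss/dy
        for j, hj in enumerate(h):
            # Backpropagate through tanh (d tanh(a)/da = 1 - tanh(a)^2),
            # using the output weight before it is updated.
            grad_a = err * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= self.lr * err * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= self.lr * grad_a * xi
            self.b1[j] -= self.lr * grad_a
        self.b2 -= self.lr * err


def make_toy_data(n, seed):
    """Synthetic stand-in data, NOT from the paper: each unimodal model is
    assumed to output the true (normalized) quality plus independent noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        q = rng.uniform(-1.0, 1.0)
        audio_pred = q + rng.gauss(0.0, 0.3)
        eeg_pred = q + rng.gauss(0.0, 0.3)
        rows.append((audio_pred, eeg_pred, q))
    return rows


def mse(pairs):
    """Mean squared error over (prediction, target) pairs."""
    return sum((p - t) ** 2 for p, t in pairs) / len(pairs)


train, test = make_toy_data(200, seed=1), make_toy_data(200, seed=2)
model = LateFusionMLP()
for _ in range(100):  # epochs
    for a, e, q in train:
        model.train_step([a, e], q)

# Compare each unimodal prediction with the fused prediction on held-out data.
audio_mse = mse([(a, q) for a, e, q in test])
eeg_mse = mse([(e, q) for a, e, q in test])
fused_mse = mse([(model.predict(a, e), q) for a, e, q in test])
print(f"audio-only {audio_mse:.3f}  eeg-only {eeg_mse:.3f}  fused {fused_mse:.3f}")
```

In this toy setting the two unimodal errors are independent, so a learned combination can average part of the noise away; that is why fusion helps here. Whether and how much it helps on real audio and EEG predictions depends on how correlated their errors are, which this sketch does not model.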

Keywords

speech synthesis, multimodal learning, speech quality, audio feature