INTERSPEECH 2020

Combining Audio and Brain Activity for Predicting Speech Quality

Abstract

Because the perceived audio quality of synthesized speech may determine a system’s market success, quality evaluations are critical. Audio quality is usually evaluated either subjectively or objectively. Because subjective approaches are costly and time-consuming, they have generally been replaced by faster, more cost-efficient objective approaches. The primary downside of objective approaches is that they lack the human influence factors that are crucial for deriving the subjective perception of quality. These factors cannot be observed directly, but they are manifested in individual brain activity. Thus, we combined predictions from single-subject electroencephalography (EEG) information and audio features to improve predictions of the overall quality of synthesized speech. Our results show that, by combining the outputs of the audio and EEG models, a very simple neural network can surpass the performance of the single-modality approaches.
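The fusion idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the network, so a single linear unit fitted by least squares stands in for the "very simple neural network", and the data are synthetic placeholders rather than real audio or EEG model outputs. All names (`fit_fusion`, `fuse`, `rmse`, `audio_pred`, `eeg_pred`, `quality`) are hypothetical.

```python
import numpy as np


def fit_fusion(audio_pred, eeg_pred, target):
    """Fit fused = w_audio * audio_pred + w_eeg * eeg_pred + bias by least squares.

    Assumption: a single linear unit stands in for the paper's fusion network.
    """
    features = np.column_stack([audio_pred, eeg_pred, np.ones(len(target))])
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    return coef[:2], coef[2]  # (weights, bias)


def fuse(audio_pred, eeg_pred, weights, bias):
    """Combine the two single-modality predictions into one quality score."""
    return np.column_stack([audio_pred, eeg_pred]) @ weights + bias


def rmse(pred, target):
    """Root-mean-square error between predictions and targets."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))


# Synthetic stand-in data (assumption): each modality's model predicts the
# true quality score plus independent noise. These are NOT results from the paper.
rng = np.random.default_rng(0)
quality = rng.uniform(1.0, 5.0, size=400)
audio_pred = quality + rng.normal(0.0, 0.6, size=400)
eeg_pred = quality + rng.normal(0.0, 0.8, size=400)

# Fit the fusion on the first 300 items, evaluate on the remaining 100.
train, test = slice(0, 300), slice(300, 400)
weights, bias = fit_fusion(audio_pred[train], eeg_pred[train], quality[train])
fused_pred = fuse(audio_pred[test], eeg_pred[test], weights, bias)

print("audio only:", rmse(audio_pred[test], quality[test]))
print("EEG only:  ", rmse(eeg_pred[test], quality[test]))
print("fused:     ", rmse(fused_pred, quality[test]))
```

In this toy setting the fused estimate benefits because the two noise sources are independent; whether that holds for real audio and EEG predictions is an empirical question that the paper itself addresses.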
