INTERSPEECH 2021

Perception of Standard Arabic Synthetic Speech Rate

Abstract

This experiment investigated how Arabic speakers perceive the rate of synthetic Standard Arabic speech produced by Google TTS at normal vs. accelerated rates. Twenty syntactically identical Standard Arabic sentences of similar length (M = 22 syllables per sentence, SD = 1) were presented auditorily in a female voice to thirty female participants, who were instructed to rate the tempo of the normal (M ≈ 4.5 syllables per second) and accelerated (by 10%, 20%, and 30%) stimuli on a 1–7 Likert scale (1 = extremely slow, 4 = normal, 7 = extremely fast). The results show that differences among the four synthetic speech rate conditions were reflected in the participants' ratings: the more the speech was accelerated, the higher the rating it received. More importantly, the findings support the observation that the current normal speech rate of Google TTS synthetic speech is perceived by Arabic speakers not as normal but as slow. This may reduce the likelihood that users are comfortable using this technology. Hence, the outcome of this study not only calls for further investigation into Standard Arabic synthetic speech rates, but also reveals the need to define a baseline for a natural speech rate in Arabic.
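
As a rough illustration of the four rate conditions described above, the following sketch computes the implied syllable rate and the approximate duration of a 22-syllable sentence in each condition. It assumes that "accelerated by 10%" means the syllable rate is multiplied by 1.10 (and likewise for 20% and 30%); the abstract does not say whether acceleration was applied to rate or to duration, so this is an assumption. The function and constant names are hypothetical, and the numbers are derived only from the approximate means reported in the abstract, not from the study's data.

```python
# Approximate means taken from the abstract.
BASELINE_RATE = 4.5            # syllables per second, normal condition
SYLLABLES_PER_SENTENCE = 22    # mean sentence length in syllables
ACCELERATIONS = [0.0, 0.10, 0.20, 0.30]  # normal, +10%, +20%, +30%


def condition_table(baseline=BASELINE_RATE,
                    syllables=SYLLABLES_PER_SENTENCE,
                    accelerations=ACCELERATIONS):
    """Return (acceleration, rate, duration) for each condition.

    Assumption: acceleration scales the syllable rate multiplicatively,
    so sentence duration shrinks by a factor of 1 / (1 + acceleration).
    """
    rows = []
    for acc in accelerations:
        rate = baseline * (1.0 + acc)   # syllables per second
        duration = syllables / rate     # seconds per sentence
        rows.append((acc, rate, duration))
    return rows


if __name__ == "__main__":
    for acc, rate, duration in condition_table():
        print(f"+{acc:.0%}: {rate:.2f} syll/s, {duration:.2f} s per sentence")
```

Under this assumption, the fastest condition corresponds to roughly 5.85 syllables per second, so a typical sentence would shorten from just under 5 seconds to just under 4 seconds.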
