INTERSPEECH 2024

A data-driven model of acoustic speech intelligibility for optimization-based models of speech production

Abstract

This paper presents a data-driven model of intelligibility intended for use in an optimization-based model of speech production. The model is a BiLSTM trained as a phoneme classifier: it takes a sequence of real articulatory trajectories as input and returns phoneme probabilities over time. The optimization minimizes a cost function defined as the weighted sum of two conflicting demands, maximizing intelligibility and minimizing articulatory effort, with the intelligibility score computed by the data-driven model presented here. Simulations support Lindblom's hypo- and hyper-articulation theory of speech: the degree of hyper-articulation can be tuned along a continuum by changing the relative weight given to intelligibility and to least articulatory effort.
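The trade-off described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the use of mean negative log-probability as the intelligibility term, the squared frame-to-frame displacement as a proxy for articulatory effort, the single `weight` parameter in [0, 1], and the toy numbers. In particular, the phoneme probabilities are hand-written stand-ins for what a trained BiLSTM classifier would return.

```python
import math

def intelligibility_cost(phoneme_probs, target_ids):
    """Mean negative log-probability of the intended phoneme at each frame.

    phoneme_probs stands in for the classifier output (one distribution per
    frame); this particular cost is an assumption, not the paper's definition.
    """
    eps = 1e-12  # guards against log(0)
    logs = [math.log(max(frame[t], eps)) for frame, t in zip(phoneme_probs, target_ids)]
    return -sum(logs) / len(logs)

def effort_cost(trajectory):
    """Mean squared frame-to-frame displacement, a hypothetical effort proxy."""
    steps = list(zip(trajectory, trajectory[1:]))
    total = sum((c - p) ** 2 for prev, cur in steps for p, c in zip(prev, cur))
    return total / max(len(steps), 1)

def total_cost(trajectory, phoneme_probs, target_ids, weight):
    """Weighted sum: weight=1 considers only intelligibility, weight=0 only effort."""
    return (weight * intelligibility_cost(phoneme_probs, target_ids)
            + (1.0 - weight) * effort_cost(trajectory))

# Toy data: one articulator, three frames, two phoneme classes.
TARGETS = [0, 1, 0]
CANDIDATES = {
    # Large movements, confident (made-up) classifier output.
    "hyper": {"trajectory": [[0.0], [1.0], [0.0]],
              "probs": [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1]]},
    # Small movements, less confident (made-up) classifier output.
    "hypo": {"trajectory": [[0.0], [0.4], [0.0]],
             "probs": [[0.6, 0.4], [0.4, 0.6], [0.6, 0.4]]},
}

def pick(weight):
    """Return the name of the candidate with the lowest total cost."""
    costs = {name: total_cost(c["trajectory"], c["probs"], TARGETS, weight)
             for name, c in CANDIDATES.items()}
    return min(costs, key=costs.get)

for w in (0.9, 0.3):
    print(w, pick(w))
```

With these toy numbers, a high weight on intelligibility selects the larger, clearer gesture and a low weight selects the reduced one, which mirrors the continuum between hyper- and hypo-articulation. A real optimization would search over trajectories rather than choose between two fixed candidates.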
