A data-driven model of acoustic speech intelligibility for optimization-based models of speech production

Benjamin Elie; Juraj Simko; Alice Turk

INTERSPEECH 2024

Abstract

This paper presents a data-driven model of intelligibility intended for use in an optimization-based model of speech production. The BiLSTM-based model is trained as a phoneme classifier: it takes a sequence of real articulatory trajectories as input and returns phoneme probabilities over time. The optimization minimizes a cost function defined as the weighted sum of two conflicting demands, intelligibility and least articulatory effort, with the intelligibility score computed by the data-driven model presented here. Simulations support Lindblom's hypo- and hyper-articulation theory of speech: the degree of hyper-articulation can be tuned along a continuum by balancing the importance given to the two requirements.
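The weighted-sum trade-off described in the abstract can be illustrated with a small sketch. The abstract does not give the actual cost terms, so everything below is an assumption made for illustration: the intelligibility cost is taken to be the mean negative log-probability of the target phoneme per frame, the effort cost is taken to be the sum of squared frame-to-frame displacements of a single articulator, and the names `intelligibility_cost`, `effort_cost`, `total_cost`, and `preferred` are hypothetical. The phoneme posteriors are hand-written toy values standing in for the classifier output.

```python
import math

def intelligibility_cost(posteriors, targets):
    """Mean negative log-probability of the target phoneme per frame.
    Assumed form; the paper's actual intelligibility score may differ."""
    eps = 1e-12  # guards against log(0)
    total = sum(-math.log(frame[t] + eps) for frame, t in zip(posteriors, targets))
    return total / len(targets)

def effort_cost(trajectory):
    """Sum of squared frame-to-frame displacements (hypothetical effort proxy)."""
    return sum((b - a) ** 2 for a, b in zip(trajectory, trajectory[1:]))

def total_cost(posteriors, targets, trajectory, w_intel, w_effort):
    """Weighted sum of the two conflicting demands."""
    return (w_intel * intelligibility_cost(posteriors, targets)
            + w_effort * effort_cost(trajectory))

# Toy data: two phoneme classes, four frames, one articulator.
targets = [0, 0, 1, 1]
candidates = {
    # Large movement, confident (toy) phoneme posteriors.
    "hyper": {"posteriors": [[0.95, 0.05]] * 2 + [[0.05, 0.95]] * 2,
              "trajectory": [0.0, 0.0, 1.0, 1.0]},
    # Reduced movement, less confident (toy) phoneme posteriors.
    "hypo": {"posteriors": [[0.7, 0.3]] * 2 + [[0.3, 0.7]] * 2,
             "trajectory": [0.0, 0.0, 0.4, 0.4]},
}

def preferred(w_intel, w_effort):
    """Return the name of the candidate with the lowest total cost."""
    costs = {name: total_cost(c["posteriors"], targets, c["trajectory"],
                              w_intel, w_effort)
             for name, c in candidates.items()}
    return min(costs, key=costs.get)
```

With these toy values, weighting intelligibility heavily (for example `preferred(1.0, 0.1)`) selects the larger movement, while weighting effort heavily (`preferred(0.1, 1.0)`) selects the reduced one. A real optimization would search over continuous trajectories rather than pick between two fixed candidates.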

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Core Methods > Representation Learning
Machine Learning > Optimization & Theory > Optimization

Keywords

phoneme classification, speech intelligibility, optimization algorithm, long short-term memory, speech production, articulatory trajectory
