Variability of speech timing features across repeated recordings: a comparison of open-source extraction techniques

Judith Dineley; Ewan Carr; Lauren L. White; Catriona Lucas; Zahia Rahman; Tian Pan; Faith Matcham; Johnny Downs; Richard J. Dobson; Thomas F. Quatieri; Nicholas Cummins

INTERSPEECH 2024

Abstract

Variations in speech timing features have been reliably linked to symptoms of various health conditions, demonstrating clinical potential. However, replication challenges hinder their translation; extracted speech features are susceptible to methodological variations in the recording and processing pipeline. Investigating this, we compared exemplar timing features extracted via three different techniques from recordings of healthy speech. Our results show that features extracted via an intensity-based method differ from those produced by forced alignment. Different extraction methods also led to differing estimates of within-speaker feature variability over time in an analysis of recordings repeated systematically over three sessions in one day (n=26) and in one week (n=28). Our findings highlight the importance of feature extraction in study design and interpretation, and the need for consistent, accurate extraction techniques for clinical research.

Topics

Machine Learning > Core Methods > Regression; Machine Learning > Application Areas > Risk Management

Keywords

feature extraction; speech analysis; forced alignment; speaker variability; speech timing; clinical research; within-speaker variability; speech timing feature
