Dirichlet process mixture model based on topologically augmented signal representation for clustering infant vocalizations

Guillem Bonafos; Clara Bourot; Pierre Pudlo; Jean-Marc Freyermuth; Laurence Reboul; Samuel Tronçon; Arnaud Rey

2024 INTERSPEECH INTERSPEECH 2024

Dirichlet process mixture model based on topologically augmented signal representation for clustering infant vocalizations

Abstract

Based on audio recordings made once a month during the first 12 months of a child's life, we propose a new method for clustering this set of vocalizations. We use a topologically augmented representation of the vocalizations, employing two persistence diagrams for each vocalization: one computed on the surface of its spectrogram and one on the Takens' embeddings of the vocalization. A synthetic persistent variable is derived for each diagram and added to the MFCCs (Mel-frequency cepstral coefficients). Using this representation, we fit a non-parametric Bayesian mixture model with a Dirichlet process prior to model the number of components. This procedure leads to a novel data-driven categorization of vocal productions. Our findings reveal the presence of 8 clusters of vocalizations, allowing us to compare their temporal distribution and acoustic profiles in the first 12 months of life.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning

🧭 Keyword Pioneer — vocalization clustering

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Guillem Bonafos , Clara Bourot , Pierre Pudlo , Jean-Marc Freyermuth , Laurence Reboul , Samuel Tronçon , Arnaud Rey

Topics

Artificial Intelligence > Bayesian & Probabilistic > Probabilistic Modeling Machine Learning > Core Methods > Clustering Machine Learning > Optimization & Theory > Bayesian Inference

Keywords

dirichlet process topological data analysis persistent homology mixture model mel-frequency cepstral coefficient vocalization clustering

Download PDF

Related papers

Reshape Dimensions Network for Speaker Recognition 2024

RevRIR: Joint Reverberant Speech and Room Impulse Response Embedding using Contrastive Learning with Application to Room Shape Classification 2024

Mixed Children/Adult/Childrenized Fine-Tuning for Children’s ASR: How to Reduce Age Mismatch and Speaking Style Mismatch 2024

Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions 2024

K-means and hierarchical clustering of f0 contours 2024