Explainable by-design Audio Segmentation through Non-Negative Matrix Factorization and Probing

Martin Lebourdais; Théo Mariotte; Antonio Almudévar; Marie Tahon; Alfonso Ortega

INTERSPEECH 2024

Abstract

Audio segmentation is a key task for many speech technologies, most of which rely on neural networks that perform well but are usually considered black boxes. However, in many domains, such as health or forensics, good performance is not enough: explanations of the output decision are also needed. To be interpretable, explanations derived directly from latent representations must satisfy "good" properties such as informativeness, compactness, or modularity. In this article, we propose an explainable-by-design audio segmentation model based on non-negative matrix factorization (NMF), which is a good candidate for the design of interpretable representations. This paper shows that our model reaches good segmentation performance and presents in-depth analyses of the latent representation extracted from the non-negative matrix. The proposed approach opens new perspectives towards the evaluation of interpretable representations according to "good" properties.
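For readers unfamiliar with NMF, the factorization the abstract refers to can be sketched with the classical multiplicative-update rules for the Frobenius objective. This is only an illustration of plain NMF on a synthetic non-negative matrix, not the authors' segmentation model; the function name `nmf`, the rank, the iteration count, and the toy data are all assumptions made for this example.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Approximate a non-negative matrix V (rows x cols) as W @ H, with W, H >= 0.

    Uses the classical multiplicative updates that minimize ||V - W H||_F.
    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    # Strictly positive random initialization keeps the updates well defined.
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so signs never flip.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy stand-in for a magnitude spectrogram: an exactly rank-4 non-negative matrix.
data_rng = np.random.default_rng(1)
V = data_rng.random((20, 4)) @ data_rng.random((4, 50))

W, H = nmf(V, rank=4)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_error:.4f}")
```

Because every entry of `W` and `H` stays non-negative, each column of `W` can be read as an additive part and each row of `H` as its activation over time, which is the property that makes NMF attractive for interpretable representations.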

Topics

Artificial Intelligence > Core AI > Interpretability
Machine Learning > Core Methods > Representation Learning
Deep Learning > Architectures > Autoencoders

Keywords

explainable AI, latent representation, non-negative matrix factorization, audio segmentation

Related papers

Reshape Dimensions Network for Speaker Recognition 2024

RevRIR: Joint Reverberant Speech and Room Impulse Response Embedding using Contrastive Learning with Application to Room Shape Classification 2024

Mixed Children/Adult/Childrenized Fine-Tuning for Children’s ASR: How to Reduce Age Mismatch and Speaking Style Mismatch 2024

Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions 2024

K-means and hierarchical clustering of f0 contours 2024