Investigating self-supervised speech models' ability to classify animal vocalizations: The case of gibbon's vocal signatures

Jules Cauzinille; Benoit Favre; Ricard Marxer; Dena Clink; Abdul Hamid Ahmad; Arnaud Rey

INTERSPEECH 2024

Abstract

With the advent of pre-trained self-supervised learning (SSL) models, speech processing research is showing increasing interest in disentanglement and explainability. Among other methods, probing speech classifiers has emerged as a promising approach to gain new insights into SSL models' out-of-domain performance. We explore the knowledge transfer capabilities of pre-trained speech models with vocalizations from the closest living relatives of humans: non-human primates. We focus on classifying the identity of northern grey gibbons (Hylobates funereus) from their calls, using probing and layer-wise analysis of state-of-the-art SSL speech models, which we compare to pre-trained bird species classifiers and audio taggers. By testing these models' reliance on background noise and timewise information, as well as performance variations across layers, we propose a new understanding of the mechanisms underlying speech models' efficacy as bioacoustic tools.
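The layer-wise probing idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the models, pooling strategy, or probe used, so everything here is an assumption: synthetic arrays stand in for per-layer SSL activations, frames are mean-pooled over time, and the probe is a closed-form ridge regression onto one-hot caller labels. All function names are hypothetical.

```python
import numpy as np


def mean_pool(frames):
    # (n_clips, n_frames, dim) -> (n_clips, dim); assumed pooling, not the paper's
    return frames.mean(axis=1)


def fit_linear_probe(x, y, n_classes, l2=1e-2):
    # Ridge regression onto one-hot targets, solved in closed form.
    x1 = np.hstack([x, np.ones((x.shape[0], 1))])  # append a bias column
    targets = np.eye(n_classes)[y]
    gram = x1.T @ x1 + l2 * np.eye(x1.shape[1])
    return np.linalg.solve(gram, x1.T @ targets)


def probe_accuracy(w, x, y):
    x1 = np.hstack([x, np.ones((x.shape[0], 1))])
    return float(((x1 @ w).argmax(axis=1) == y).mean())


def layerwise_probe(layer_feats, labels, n_classes, train_frac=0.7, seed=0):
    # Train one independent probe per layer on the same train/test split.
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(labels))
    n_train = int(train_frac * len(labels))
    tr, te = order[:n_train], order[n_train:]
    accs = []
    for frames in layer_feats:
        x = mean_pool(frames)
        w = fit_linear_probe(x[tr], labels[tr], n_classes)
        accs.append(probe_accuracy(w, x[te], labels[te]))
    return accs


def make_synthetic_layers(n_layers=3, n_clips=120, n_frames=20, dim=16,
                          n_classes=4, seed=0):
    # Synthetic stand-in for frozen model activations: the class signal
    # grows linearly from 0 (first layer) to 3 (last layer).
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, n_classes, size=n_clips)
    class_means = rng.normal(size=(n_classes, dim))
    layers = []
    for k in range(n_layers):
        strength = 3.0 * k / (n_layers - 1)
        noise = rng.normal(size=(n_clips, n_frames, dim))
        layers.append(noise + strength * class_means[labels][:, None, :])
    return layers, labels


if __name__ == "__main__":
    layers, labels = make_synthetic_layers()
    for k, acc in enumerate(layerwise_probe(layers, labels, n_classes=4)):
        print(f"layer {k}: probe accuracy = {acc:.2f}")
```

With real data, the synthetic arrays would be replaced by hidden states extracted from each layer of a frozen pre-trained model. Note that mean pooling discards frame order, so a probe built this way cannot by itself measure reliance on timewise information.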

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning
Machine Learning > Learning Types > Self-Supervised Learning
Speech & Audio > Analysis > Speech Analysis
Deep Learning > Techniques > Self-Supervised Learning
Deep Learning > Learning Types > Transfer Learning

Keywords

self-supervised learning; knowledge transfer; speech model; probing classifier; animal vocalization

Related papers

Reshape Dimensions Network for Speaker Recognition 2024

RevRIR: Joint Reverberant Speech and Room Impulse Response Embedding using Contrastive Learning with Application to Room Shape Classification 2024

Mixed Children/Adult/Childrenized Fine-Tuning for Children’s ASR: How to Reduce Age Mismatch and Speaking Style Mismatch 2024

Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions 2024

K-means and hierarchical clustering of f0 contours 2024