Challenging margin-based speaker embedding extractors by using the variational information bottleneck

Themos Stafylakis; Anna Silnova; Johan Rohdin; Oldřich Plchot; Lukas Burget

2024 INTERSPEECH INTERSPEECH 2024

Challenging margin-based speaker embedding extractors by using the variational information bottleneck

Abstract

Speaker embedding extractors are typically trained using a classification loss over the training speakers. During the last few years, the standard softmax/cross-entropy loss has been replaced by the margin-based losses, yielding significant improvements in speaker recognition accuracy. Motivated by the fact that the margin merely reduces the logit of the target speaker during training, we consider a probabilistic framework that has a similar effect. The variational information bottleneck provides a principled mechanism for making deterministic nodes stochastic, resulting in an implicit reduction of the posterior of the target speaker. We experiment with a wide range of speaker recognition benchmarks and scoring methods and report competitive results to those obtained with the state-of-the-art Additive Angular Margin loss.

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Security & Privacy, Speech & Audio

Authors

Themos Stafylakis , Anna Silnova , Johan Rohdin , Oldřich Plchot , Lukas Burget

Topics

Machine Learning > Core Methods > Representation Learning Machine Learning > Optimization & Theory > Bayesian Inference

Keywords

speaker embedding speaker recognition probabilistic framework variational information bottleneck margin loss

Download PDF

Related papers

Reshape Dimensions Network for Speaker Recognition 2024

RevRIR: Joint Reverberant Speech and Room Impulse Response Embedding using Contrastive Learning with Application to Room Shape Classification 2024

Mixed Children/Adult/Childrenized Fine-Tuning for Children’s ASR: How to Reduce Age Mismatch and Speaking Style Mismatch 2024

Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions 2024

K-means and hierarchical clustering of f0 contours 2024