Angular Softmax for Short-Duration Text-independent Speaker Verification

Zili Huang; Shuai Wang; Kai Yu

INTERSPEECH 2018

Abstract

Recently, researchers have proposed deep-learning-based end-to-end speaker verification (SV) systems that achieve results competitive with the standard i-vector approach. In addition to the network architecture, the optimization metric, such as softmax loss or triplet loss, is important for extracting speaker embeddings that are discriminative and generalize to unseen speakers. In this paper, angular softmax (A-softmax) loss is introduced to improve speaker embedding quality. It is investigated in two SV frameworks: a CNN-based end-to-end SV framework and an i-vector SV framework in which deep discriminant analysis is used for channel compensation. Experimental results on a short-duration text-independent speaker verification dataset generated from SRE show that A-softmax achieves significant performance improvements over other metrics in both frameworks.
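The abstract does not give the loss formula, so the following is only a minimal NumPy sketch of A-softmax as it is commonly formulated in the face-recognition literature (SphereFace), not the authors' implementation. It assumes unit-norm class weights, no bias, and an integer angular margin m; the function names, array shapes, and the default m=4 are illustrative assumptions. The target-class logit ||x|| cos(theta) is replaced by ||x|| psi(theta), where psi(theta) = (-1)^k cos(m theta) - 2k on the segment [k pi / m, (k+1) pi / m].

```python
import numpy as np


def psi(theta, m):
    """Monotonically decreasing extension of cos(m * theta) over [0, pi]."""
    k = np.floor(theta * m / np.pi)
    k = np.clip(k, 0, m - 1)  # theta == pi belongs to the last segment
    return (-1.0) ** k * np.cos(m * theta) - 2.0 * k


def a_softmax_loss(x, W, labels, m=4):
    """Mean A-softmax loss (sketch).

    x: (N, D) embeddings, W: (D, C) class weights, labels: (N,) class ids.
    The margin m is an assumed hyperparameter, not a value from the paper.
    """
    W_unit = W / np.linalg.norm(W, axis=0, keepdims=True)  # unit-norm columns
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)      # (N, 1)
    cos_theta = np.clip((x @ W_unit) / x_norm, -1.0, 1.0)  # (N, C)

    rows = np.arange(x.shape[0])
    theta_target = np.arccos(cos_theta[rows, labels])

    # Non-target logits keep ||x|| cos(theta); the target uses ||x|| psi(theta).
    logits = x_norm * cos_theta
    logits[rows, labels] = x_norm[:, 0] * psi(theta_target, m)

    # Numerically stable log-softmax followed by negative log-likelihood.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[rows, labels].mean()
```

With m=1 the sketch reduces to ordinary softmax loss with normalized weights; larger m lowers the target logit, which forces embeddings of the same speaker into a tighter angular region around their class weight.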

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Optimization & Theory > Loss Functions
Deep Learning > Architectures > Neural Networks
Speech & Audio > Recognition > Speaker Recognition
Speech & Audio > Analysis > Speaker Verification
Machine Learning > Learning Types > Deep Learning
Machine Learning > Learning Types > Classification
Deep Learning > Learning Types > Deep Learning
Deep Learning > Learning Types > Representation Learning

Keywords

speaker embedding, speaker verification, discriminant analysis, convolutional neural network, end-to-end learning, triplet loss, angular softmax, embedding quality, softmax loss, short-duration verification
