End-to-End Text-Independent Speaker Verification with Triplet Loss on Short Utterances

Chunlei Zhang; Kazuhito Koishida

2017 INTERSPEECH INTERSPEECH 2017

End-to-End Text-Independent Speaker Verification with Triplet Loss on Short Utterances

Abstract

Text-independent speaker verification against short utterances is still challenging despite of recent advances in the field of speaker recognition with i-vector framework. In general, to get a robust i-vector representation, a satisfying amount of data is needed in the MAP adaptation step, which is hard to meet under short duration constraint. To overcome this, we present an end-to-end system which directly learns a mapping from speech features to a compact fixed length speaker discriminative embedding where the Euclidean distance is employed for measuring similarity within trials. To learn the feature mapping, a modified Inception Net with residual block is proposed to optimize the triplet loss function. The input of our end-to-end system is a fixed length spectrogram converted from an arbitrary length utterance. Experiments show that our system consistently outperforms a conventional i-vector system on short duration speaker verification tasks. To test the limit under various duration conditions, we also demonstrate how our end-to-end system behaves with different duration from 2s–4s.

🌉 Interdisciplinary Bridge — Machine Learning and Speech & Audio

🐣 Hot Topic Early Bird — speaker embedding

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Chunlei Zhang , Kazuhito Koishida

Topics

Machine Learning > Core Methods > Embedding Learning Speech & Audio > Analysis > Speaker Verification

Keywords

deep learning speaker embedding speaker verification triplet loss text-independent verification short utterance

Download PDF

Related papers

Description of the Munich-Passau Snore Sound Corpus (MPSSC) 2017

A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification 2017

Binaural Reverberant Speech Separation Based on Deep Neural Networks 2017

Building Audio-Visual Phonetically Annotated Arabic Corpus for Expressive Text to Speech 2017

A Comparison of Danish Listeners’ Processing Cost in Judging the Truth Value of Norwegian, Swedish, and English Sentences 2017