Training Utterance-level Embedding Networks for Speaker Identification and Verification

Heewoong Park; Sukhyun Cho; Kyubyong Park; Namju Kim; Jonghun Park

INTERSPEECH 2018

Abstract

Encoding speaker-specific characteristics from speech signals into fixed-length vectors is a key component of speaker identification and verification systems. This paper presents a deep neural network architecture for speaker embedding models in which the similarity between embedded utterance vectors explicitly approximates the similarity between the vocal patterns of speakers. The proposed architecture contains an additional speaker embedding lookup table, which is used to compute a loss based on embedding similarities. Furthermore, we propose a new feature sampling method for data augmentation. Experiments on two databases show that our model is more effective at speaker identification and verification than a fully connected classifier and an end-to-end verification model.
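The abstract does not give the exact loss, so the following is only a minimal sketch of one way a speaker embedding lookup table could be used to compute a similarity-based loss. It assumes cosine similarity between each utterance embedding and every row of the table, followed by a softmax cross-entropy over speakers. The function names, the `scale` parameter, and the toy dimensions are all hypothetical and are not taken from the paper.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)

def similarity_loss(utt_emb, speaker_table, speaker_ids, scale=10.0):
    """Cross-entropy over scaled cosine similarities (an assumed loss form).

    utt_emb:       (batch, dim) utterance-level embeddings from the network
    speaker_table: (num_speakers, dim) trainable speaker embedding lookup table
    speaker_ids:   (batch,) integer index of the true speaker per utterance
    scale:         hypothetical temperature applied to the similarities
    """
    u = l2_normalize(utt_emb)
    s = l2_normalize(speaker_table)
    logits = scale * (u @ s.T)                      # (batch, num_speakers)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    rows = np.arange(len(speaker_ids))
    return -log_probs[rows, speaker_ids].mean()

# Toy check with 4 mutually orthogonal speaker embeddings in 8 dimensions.
speaker_table = np.eye(4, 8)
speaker_ids = np.array([0, 2, 3])

# Utterance embeddings that coincide with the true speaker's table row.
matched = speaker_table[speaker_ids]
# Utterance embeddings that coincide with a different speaker's row.
mismatched = speaker_table[(speaker_ids + 1) % 4]

loss_matched = similarity_loss(matched, speaker_table, speaker_ids)
loss_mismatched = similarity_loss(mismatched, speaker_table, speaker_ids)
print(loss_matched < loss_mismatched)
```

Under these assumptions the loss is small when an utterance embedding lies close to its own speaker's table entry and large when it lies close to another speaker's entry, which is the behaviour the abstract describes in general terms.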

Topics

Machine Learning > Core Methods > Metric Learning
Machine Learning > Core Methods > Embedding Learning
Deep Learning > Architectures > Neural Networks
Speech & Audio > Recognition > Speaker Recognition

Keywords

metric learning, embedding learning, data augmentation, speaker embedding, speaker verification, speaker identification, utterance-level embedding, neural network, utterance embedding
