Joint Training of Expanded End-to-End DNN for Text-Dependent Speaker Verification

Hee-soo Heo; Jee-weon Jung; IL-ho Yang; Sung-hyun Yoon; Ha-Jin Yu

INTERSPEECH 2017

Abstract

We propose an expanded end-to-end DNN architecture for speaker verification based on b-vectors as well as d-vectors. We embed the components of a speaker verification system, such as frame-level feature modeling, utterance-level feature extraction, dimensionality reduction of the utterance-level features, and trial-level scoring, in a single expanded end-to-end DNN architecture. The main contribution of this paper is that, instead of using DNNs as independently trained parts of the system, we pre-train each part and then train the whole system jointly with a fine-tuning cost. The experimental results show that the proposed system outperforms both the baseline d-vector system and the i-vector PLDA system.
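The abstract gives no layer sizes, cost function, or exact feature definitions, so the following is only a rough sketch of how the four stages could be chained into one function that a joint fine-tuning step would then optimize end to end. It shows the forward pass only, with random weights. All names and dimensions (`frame_net`, `reduce_net`, `score_net`, `FEAT`, `HID`, `EMB`) are hypothetical, the d-vector is taken as the mean of frame-level hidden activations, and the b-vector is assumed to be the concatenation of the element-wise sum, absolute difference, and product of two utterance vectors.

```python
import math
import random

random.seed(0)

def dense(n_in, n_out):
    """Random weights and zero bias for one fully connected layer (hypothetical sizes)."""
    w = [[random.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def forward(layer, x, relu=True):
    """Apply one fully connected layer, optionally followed by ReLU."""
    w, b = layer
    out = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
    return [max(0.0, v) for v in out] if relu else out

def d_vector(frames, frame_net):
    """Utterance-level feature: mean of the frame-level hidden activations."""
    hidden = [forward(frame_net, f) for f in frames]
    n = len(hidden)
    return [sum(col) / n for col in zip(*hidden)]

def b_vector(u, v):
    """Pair-level feature (assumed form): element-wise sum, |difference|, product."""
    return ([a + b for a, b in zip(u, v)]
            + [abs(a - b) for a, b in zip(u, v)]
            + [a * b for a, b in zip(u, v)])

def score_trial(enroll, test, frame_net, reduce_net, score_net):
    """Chain all four stages so that one cost could drive every layer."""
    e = forward(reduce_net, d_vector(enroll, frame_net))   # dimensionality reduction
    t = forward(reduce_net, d_vector(test, frame_net))
    logit = forward(score_net, b_vector(e, t), relu=False)[0]  # trial-level scoring
    return 1.0 / (1.0 + math.exp(-logit))                  # acceptance probability

FEAT, HID, EMB = 8, 16, 4                 # illustrative dimensions only
frame_net = dense(FEAT, HID)              # frame-level feature modeling
reduce_net = dense(HID, EMB)              # utterance-level reduction
score_net = dense(3 * EMB, 1)             # scoring on the b-vector

# Two toy "utterances" made of random frames, standing in for acoustic features.
enroll = [[random.random() for _ in range(FEAT)] for _ in range(20)]
test = [[random.random() for _ in range(FEAT)] for _ in range(15)]
score = score_trial(enroll, test, frame_net, reduce_net, score_net)
print(f"trial score: {score:.3f}")
```

In a real system each of the three networks would first be pre-trained on its own objective, and the composed `score_trial` function would then be fine-tuned with a single trial-level cost; the sketch omits training entirely.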

Topics

Machine Learning > Core Methods > Representation Learning
Machine Learning > Optimization & Theory > Neural Network Optimization

Keywords

speaker verification, deep neural network, end-to-end learning, joint training
