INTERSPEECH 2019

Optimization of False Acceptance/Rejection Rates and Decision Threshold for End-to-End Text-Dependent Speaker Verification Systems

Abstract

Currently, most Speaker Verification (SV) systems based on neural networks use Cross-Entropy and/or Triplet loss functions. Although these functions provide competitive results, they might not fully exploit the system's potential, because they are not designed to optimize the verification task with respect to its performance measures, e.g. the Detection Cost Function (DCF) or the Equal Error Rate (EER). This paper proposes a first approach to this issue through the optimization of a loss function based on the DCF. This mechanism allows the end-to-end system to directly manage the threshold used to compute the ratio between the False Rejection Rate (FRR) and the False Acceptance Rate (FAR), thereby connecting the system training directly to the operating point. Results in a text-dependent speaker verification framework, based on neural network super-vectors over the RSR2015 dataset, outperform reference systems using Cross-Entropy and Triplet loss, as well as our previous proposal based on an approximation of the Area Under the Curve (aAUC).
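
To make the idea of a DCF-based loss concrete, the following is a minimal sketch, not the formulation used in the paper. It assumes the standard detection cost, C_miss * P_target * FRR + C_fa * (1 - P_target) * FAR, and replaces the hard accept/reject counts with sigmoids so that the cost varies smoothly with both the scores and the threshold. The function name `soft_dcf`, the `sharpness` parameter, and the default cost values are hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(x):
    # Numerically stable logistic function (identical to 1 / (1 + exp(-x))).
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def soft_dcf(target_scores, nontarget_scores, threshold,
             c_miss=1.0, c_fa=1.0, p_target=0.5, sharpness=10.0):
    """Smooth approximation of the detection cost (illustrative assumption).

    A target trial scoring below the threshold counts as a false rejection;
    a non-target trial scoring above it counts as a false acceptance.
    The sigmoid turns each hard count into a value in (0, 1).
    """
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)

    # Soft False Rejection Rate: targets falling under the threshold.
    soft_frr = np.mean(sigmoid(sharpness * (threshold - target_scores)))
    # Soft False Acceptance Rate: non-targets rising over the threshold.
    soft_far = np.mean(sigmoid(sharpness * (nontarget_scores - threshold)))

    return c_miss * p_target * soft_frr + c_fa * (1.0 - p_target) * soft_far


if __name__ == "__main__":
    targets = [0.9, 0.8]
    nontargets = [0.1, 0.2]
    # A threshold between the two score groups gives a low cost.
    print(soft_dcf(targets, nontargets, threshold=0.5))
    # A threshold below every score accepts all trials, so the cost rises.
    print(soft_dcf(targets, nontargets, threshold=-5.0))
```

Because the threshold appears inside the sigmoids, an automatic-differentiation framework could treat it as a trainable parameter alongside the network weights, which is one way to read the abstract's claim that training is tied to the operating point.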
