Siamese Network with wav2vec Feature for Spoofing Speech Detection

Yang Xie; Zhenchuan Zhang; Yingchun Yang

INTERSPEECH 2021

Abstract

Automatic speaker verification is vulnerable to spoofing attacks with synthesized or converted speech. Although high-performance anti-spoofing countermeasures can achieve high accuracy when the training and testing spoofing attack examples are similarly distributed, their performance degrades significantly when confronted with out-of-distribution spoofing speech, which is created by increasingly advanced unseen speech synthesis and voice conversion methods. Since it is unrealistic to collect enough labeled data from each new spoofing attack method, we argue that addressing the problem of out-of-distribution generalization for spoofing speech detection is essential. In this work, we propose a two-phase representation learning system based on a Siamese network for spoofing speech detection tasks. During the representation learning phase, an embedding Siamese neural network is trained with the wav2vec features to distinguish whether the speech samples in a pair belong to the same category. The proposed system decreases the equal error rate from the state-of-the-art result of 4.07% to 1.15% on the ASVspoof 2019 evaluation set.
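The two ideas the abstract relies on, same-category pair labels for Siamese training and the equal error rate (EER) used for evaluation, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `make_pairs` and `equal_error_rate`, the exhaustive pairing strategy, and the convention that a higher score means "more likely bona fide" are all assumptions made for illustration. The abstract does not specify how pairs are sampled or how scores are produced.

```python
from itertools import combinations


def make_pairs(labels):
    """Return (i, j, same) for every unordered pair of utterance indices.

    `same` is 1 when both utterances share a category (both bona fide or
    both spoofed) and 0 otherwise. Exhaustive pairing is an assumption;
    a real system would likely sample pairs instead.
    """
    return [(i, j, int(labels[i] == labels[j]))
            for i, j in combinations(range(len(labels)), 2)]


def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    Assumed convention: a higher score means "more likely bona fide".
    The EER is taken at the threshold where the false acceptance rate
    and false rejection rate are closest, as the mean of the two.
    """
    best_gap, eer = None, None
    for t in sorted(set(bonafide_scores) | set(spoof_scores)):
        # False rejection: bona fide speech scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed speech scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Toy labels and scores, invented purely for illustration.
    print(make_pairs(["bonafide", "spoof", "bonafide"]))
    print(equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.1, 0.2, 0.4, 0.6]))
```

With the toy scores above, one of four bona fide utterances and one of four spoofed utterances fall on the wrong side of the best threshold, so the sketch reports an EER of 0.25. The figures reported in the abstract (4.07% and 1.15%) are the same metric computed on the ASVspoof 2019 evaluation set.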

Topics

Machine Learning > Core Methods > Representation Learning
Machine Learning > Learning Types > Self-Supervised Learning
Speech & Audio > Analysis > Speaker Verification
Speech & Audio > Analysis > Speech Analysis
Deep Learning > Learning Types > Representation Learning

Keywords

representation learning, out-of-distribution generalization, spoofing detection, speaker verification, siamese network, equal error rate, wav2vec feature, spoofing speech detection
