The CLIPS System for 2022 Spoofing-Aware Speaker Verification Challenge

Jucai Lin; Tingwei Chen; Jingbiao Huang; Ruidong Fang; Jun Yin; Yuanping Yin; Wei Shi; Weizhen Huang; Yapeng Mao

INTERSPEECH 2022

Abstract

In this paper, a spoofing-aware speaker verification (SASV) system that integrates an automatic speaker verification (ASV) system and a countermeasure (CM) system is developed. First, a modified re-parameterized VGG (ARepVGG) module is used to extract high-level representations from a multi-scale feature learned from the raw waveform through sinc filters, and a spectro-temporal graph attention network then learns the final decision of whether the audio is spoofed. Second, a new network inspired by Max-Feature-Map (MFM) layers is constructed to fine-tune the CM system while keeping the ASV system fixed. Our proposed SASV system significantly improves the SASV equal error rate (SASV-EER) from 6.73% to 1.36% on the evaluation dataset and from 4.85% to 0.98% on the development dataset of the 2022 Spoofing-Aware Speaker Verification Challenge (2022 SASV).
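The Max-Feature-Map operation that the fine-tuning network is said to be inspired by can be sketched as follows. This is a minimal, framework-free illustration of a standard MFM activation (split the channels into two halves and keep the element-wise maximum), assuming a channel-first layout `x[channel][frame]`; it is not the authors' fine-tuning network, whose exact structure the abstract does not give, and the function name `max_feature_map` is hypothetical.

```python
from typing import List, Sequence


def max_feature_map(x: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical sketch of a Max-Feature-Map (MFM) activation.

    Assumes a channel-first input x[c][t] with an even number of channels C.
    Channel c in the first half competes with channel c + C // 2 in the
    second half, and only the element-wise maximum survives, so the output
    has C // 2 channels with the same number of frames.
    """
    num_channels = len(x)
    if num_channels % 2 != 0:
        raise ValueError("MFM needs an even number of channels")
    half = num_channels // 2
    return [
        # Element-wise max between paired channels from the two halves.
        [max(a, b) for a, b in zip(x[c], x[c + half])]
        for c in range(half)
    ]


if __name__ == "__main__":
    # 4 channels x 3 frames -> 2 channels x 3 frames
    features = [
        [1.0, -2.0, 3.0],
        [0.5, 0.5, 0.5],
        [0.0, 4.0, -1.0],
        [2.0, 0.0, 0.1],
    ]
    print(max_feature_map(features))  # [[1.0, 4.0, 3.0], [2.0, 0.5, 0.5]]
```

Because MFM halves the channel count while acting as a competitive, non-saturating nonlinearity, it is a common choice in lightweight anti-spoofing networks; how the CLIPS system stacks such layers is not specified here.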

Keywords

spoofing detection, speaker verification, graph attention network, multi-scale feature, countermeasure system
