Adapter Learning from Pre-trained Model for Robust Spoof Speech Detection

Haochen Wu; Wu Guo; Shengyu Peng; Zhuhai Li; Jie Zhang

INTERSPEECH 2024

Abstract

Speech anti-spoofing models can be improved by using a large pre-trained model, e.g., Wav2vec2 or WavLM, as the front-end. However, apart from the heavy computational overhead, fine-tuning a pre-trained model is prone to over-fitting and catastrophic forgetting because of the limited training data. In this paper, we propose a novel adapter learning framework based on a pre-trained model for robust spoof speech detection. We consider two adapter cases, i.e., intra-block adapters and cross-block adapters, which are inserted into or appended to the backbone Wav2vec2. During training, the backbone is frozen and only the parameters of the adapters are updated. The proposed adapter learning captures local-global task-dependent information for spoof speech detection with only a marginal increase in parameters. Results on three benchmark datasets validate the superiority of the proposed method over the baseline and existing state-of-the-art (SOTA) systems.
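The core idea in the abstract, a frozen backbone with small trainable adapters, can be illustrated with a minimal sketch. It assumes a generic residual bottleneck adapter (down-projection, ReLU, up-projection, skip connection), a common adapter design that is not necessarily the intra-block or cross-block adapter proposed in the paper. The names `BottleneckAdapter`, `adapter_param_count` and `frozen_backbone_budget` are hypothetical, and the sizes used below (24 layers, hidden size 1024, bottleneck size 64, about 300M frozen backbone parameters) are illustrative assumptions rather than the authors' configuration. Pure Python is used only to keep the arithmetic easy to follow; a practical implementation would use a deep learning framework.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def adapter_param_count(dim: int, bottleneck: int) -> int:
    """Weights and biases of one down-projection plus one up-projection."""
    return 2 * dim * bottleneck + dim + bottleneck


class BottleneckAdapter:
    """Hypothetical residual bottleneck adapter (an assumption, not the paper's design):
    y = x + W_up @ relu(W_down @ x + b_down) + b_up
    """

    def __init__(self, dim: int, bottleneck: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Small random down-projection, shape (bottleneck x dim).
        self.w_down = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                       for _ in range(bottleneck)]
        self.b_down = [0.0] * bottleneck
        # Zero-initialised up-projection, shape (dim x bottleneck), so the
        # adapter starts as an identity map on the frozen backbone's features.
        self.w_up = [[0.0] * bottleneck for _ in range(dim)]
        self.b_up = [0.0] * dim

    def __call__(self, x: Vector) -> Vector:
        # Down-project and apply ReLU.
        h = [max(0.0, v + b) for v, b in zip(matvec(self.w_down, x), self.b_down)]
        # Up-project back to the hidden size and add the residual connection.
        delta = [v + b for v, b in zip(matvec(self.w_up, h), self.b_up)]
        return [xi + di for xi, di in zip(x, delta)]

    def num_params(self) -> int:
        """Count the (trainable) adapter parameters."""
        return (sum(len(row) for row in self.w_down) + len(self.b_down)
                + sum(len(row) for row in self.w_up) + len(self.b_up))


def frozen_backbone_budget(backbone_params: int, num_layers: int,
                           dim: int, bottleneck: int) -> Tuple[int, float]:
    """Trainable parameters (and their ratio to the frozen backbone)
    when one adapter per layer is the only thing being updated."""
    trainable = num_layers * adapter_param_count(dim, bottleneck)
    return trainable, trainable / backbone_params


if __name__ == "__main__":
    # Assumed sizes: 24 layers, hidden size 1024, bottleneck 64, ~300M backbone weights.
    trainable, fraction = frozen_backbone_budget(300_000_000, 24, 1024, 64)
    print(f"trainable adapter parameters: {trainable:,} ({fraction:.2%} of backbone)")
```

Zero-initialising the up-projection is one common choice: at the start of training each adapter passes its input through unchanged, so the pre-trained representations are preserved and the adapters only learn a residual correction on top of them. Under the assumed sizes, 24 such adapters add 3,171,840 trainable parameters, roughly 1% of the frozen backbone, which illustrates why updating only adapters adds a marginal number of parameters compared with full fine-tuning.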

Keywords

catastrophic forgetting, speaker verification, pre-trained model, adapter learning, spoof speech detection