INTERSPEECH 2023

Group GMM-ResNet for Detection of Synthetic Speech Attacks

Abstract

CNN-based models have achieved remarkable success in speaker recognition and spoofed speech detection. We propose the group GMM-ResNet for synthetic speech detection. The grouping technique improves classification accuracy by exposing the group cardinality while reducing both the number of parameters and the training time, and it allows the model to jointly attend to information from different representation subspaces. We propose two grouping methods, both based on the Gaussian components of the GMM, which is trained using the binary splitting method. On the ASVspoof 2021 LA task, the group GMM-ResNet achieves a minimum t-DCF of 0.2450 and an EER of 2.53%, relative reductions of 28.9% and 72.7%, respectively, compared with the LFCC-LCNN baseline. On the ASVspoof 2021 DF task, it achieves an EER of 15.96%, a relative reduction of 28.7% compared with the RawNet2 baseline.
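
As a rough illustration of why grouping can reduce the number of parameters, the sketch below counts the weights of a 1-D convolution with and without channel groups. It is not taken from the paper: the helper `conv1d_params` and the layer sizes (512 channels, kernel size 3, 8 groups) are hypothetical, and the actual group GMM-ResNet layers may be configured differently.

```python
def conv1d_params(in_channels: int, out_channels: int,
                  kernel_size: int, groups: int = 1) -> int:
    """Weight count (bias excluded) of a 1-D convolution split into `groups` groups."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel is connected to only in_channels // groups inputs.
    return (in_channels // groups) * out_channels * kernel_size


# Hypothetical layer sizes, chosen for illustration only (not from the paper).
dense = conv1d_params(512, 512, 3, groups=1)
grouped = conv1d_params(512, 512, 3, groups=8)

print(f"ungrouped weights: {dense}")
print(f"grouped weights:   {grouped}")
print(f"reduction factor:  {dense // grouped}")
```

Under these assumed sizes, splitting the channels into `g` groups divides the weight count of the layer by `g`, since each filter sees only its own group's input channels.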
