INTERSPEECH 2021

The Effect of Silence and Dual-Band Fusion in Anti-Spoofing System

Abstract

Current neural-network-based anti-spoofing systems lack robustness, and their performance degrades further once voice activity detection (VAD) is applied, which makes them difficult to deploy in practice. This work investigates the effect of the silence at the beginning and end of an utterance and finds that differences in this silence form part of the basis on which countermeasures make their decisions. We also explore why VAD degrades performance. The experimental results show that once VAD removes the silent segments, the neural network loses the information they carried, which can lead to more severe overfitting. To address this overfitting, we further analyze its causes across different frequency sub-bands. We find that the high-frequency part of the feature is the main cause of overfitting, while the low-frequency part is more robust but less accurate against known attacks. We therefore propose a dual-band fusion anti-spoofing algorithm that requires only two sub-systems yet outperforms all but one of the primary systems submitted to the logical access (LA) condition of the ASVspoof 2019 challenge. Our system achieves an equal error rate (EER) of 3.50% even after VAD is applied, making it suitable for practical application.
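The pipeline described above can be illustrated with a minimal sketch. The abstract does not specify the VAD algorithm, the feature type, the frequency at which the two bands are split, or the fusion rule, so everything here is an assumption for illustration only. A crude energy-threshold trimmer stands in for VAD, the split is a hypothetical bin index `split_bin` on a (frequency bins × frames) feature matrix, and the two sub-system scores are combined by equal-weight linear score fusion. The EER routine follows the standard definition: the operating point at which the false rejection rate of bona fide speech equals the false acceptance rate of spoofed speech. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def trim_silence(samples: Sequence[float], frame_len: int = 160,
                 threshold: float = 1e-4) -> List[float]:
    """Hypothetical stand-in for VAD: drop leading and trailing frames whose
    mean energy is below `threshold` (not the paper's VAD)."""
    n_frames = len(samples) // frame_len
    energies = [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]
    voiced = [i for i, e in enumerate(energies) if e >= threshold]
    if not voiced:
        return []  # the whole signal counts as silence
    start, end = voiced[0] * frame_len, (voiced[-1] + 1) * frame_len
    return list(samples[start:end])


def split_bands(features: Sequence[Sequence[float]],
                split_bin: int) -> Tuple[List[List[float]], List[List[float]]]:
    """Split a (freq_bins x frames) feature matrix into low- and high-frequency
    sub-bands; `split_bin` is an assumed, not paper-specified, boundary."""
    low = [list(row) for row in features[:split_bin]]
    high = [list(row) for row in features[split_bin:]]
    return low, high


def fuse_scores(low_scores: Sequence[float], high_scores: Sequence[float],
                w: float = 0.5) -> List[float]:
    """Assumed linear score fusion of the two sub-systems (weight w on low band)."""
    return [w * a + (1.0 - w) * b for a, b in zip(low_scores, high_scores)]


def compute_eer(bona_fide: Sequence[float], spoof: Sequence[float]) -> float:
    """EER by sweeping every observed score as a threshold.
    Convention: a higher score means 'more likely bona fide'."""
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(bona_fide) | set(spoof)):
        frr = sum(s < t for s in bona_fide) / len(bona_fide)  # bona fide rejected
        far = sum(s >= t for s in spoof) / len(spoof)          # spoof accepted
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer


if __name__ == "__main__":
    # Toy usage: 2 silent frames, 1 voiced frame, 2 silent frames.
    audio = [0.0] * 320 + [0.5] * 160 + [0.0] * 320
    print(len(trim_silence(audio)))  # voiced frame only: 160 samples
    fused = fuse_scores([0.9, 0.2], [0.7, 0.4])
    print(compute_eer([fused[0]], [fused[1]]))
```

Sweeping only the observed scores gives an approximate EER on small sets; in practice, tooling for the ASVspoof challenges computes EER from the full detection error trade-off curve.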
