Long Range Acoustic Features for Spoofed Speech Detection
Abstract
Speaker verification systems in practice are vulnerable to spoofing attacks. The high quality recording and playback devices make replay attack a real threat to speaker verification. Additionally, the furtherance in voice conversion and speech synthesis has produced perceptually natural sounding speech. The ASVspoof 2019 challenge is organized to study the robustness of countermeasures against such attacks, which cover two common modes of attacks, logical and physical access. The former deals with synthetic attacks arising from voice conversion and text-to-speech techniques, whereas the latter deals with replay attacks. In this work, we explore several novel countermeasures based on long range acoustic features that are found to be effective for spoofing attack detection. The long range features capture different aspects of long range information as they are computed from subbands and octave power spectrum in contrast to the conventional way from linear power spectrum. These novel features are combined with the other known features for improved detection of spoofing attacks. We obtain a tandem detection cost function of 0.1264 and 0.1381 (equal error rate 4.13% and 5.95%) for logical and physical access on the best combined system submitted to the challenge.