INTERSPEECH 2020

Improving Replay Detection System with Channel Consistency DenseNeXt for the ASVspoof 2019 Challenge

Abstract

In this paper, we describe a novel replay detection system for the ASVspoof 2019 challenge. The objective of the challenge is to classify arbitrary audio files as either bona fide speech or spoofing attacks, where spoofing attacks include replay attacks, text-to-speech synthesis, and voice conversion. Our replay detection system is a pipeline with three components: feature engineering, DNN models, and score fusion. First, following previous work, log-spectrogram (logspec) features are extracted as input, and spectrum augmentation is applied during training to boost performance with limited training data. Second, the DNN component comprises three models, SENet, DenseNet, and our proposed channel consistency DenseNeXt, all trained with binary cross-entropy loss and center loss as objectives. Finally, the scores of all three DNN models are fused to obtain the primary system. Experimental results show that our best single system, channel consistency DenseNeXt, achieves a t-DCF of 0.0137 and an EER of 0.46% on the physical access (PA) evaluation set. The primary system achieves a t-DCF of 0.00785 and an EER of 0.282%, a 96.8% improvement over the CQCC-GMM baseline system, and reaches state-of-the-art performance on the PA task of the challenge.
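The abstract states that spectrum augmentation is applied to the logspec input during training but does not give its parameters. As a minimal sketch, assuming SpecAugment-style masking of one frequency band and one time span per utterance, with hypothetical mask widths and a mean-value fill (none of which come from the paper), the augmentation step could look like this:

```python
import random
from typing import List, Optional

def spec_augment(logspec: List[List[float]],
                 max_freq_width: int = 8,
                 max_time_width: int = 20,
                 rng: Optional[random.Random] = None) -> List[List[float]]:
    """Return a masked copy of `logspec` (frames x frequency bins).

    One frequency band and one time span are overwritten with the
    spectrogram mean. Mask widths are illustrative assumptions,
    not the paper's settings.
    """
    rng = rng or random.Random()
    n_frames, n_bins = len(logspec), len(logspec[0])
    out = [row[:] for row in logspec]  # copy so the input is left untouched
    fill = sum(map(sum, logspec)) / (n_frames * n_bins)

    # Frequency mask: width f in [0, max_freq_width], starting at bin f0.
    f = rng.randint(0, min(max_freq_width, n_bins))
    f0 = rng.randint(0, n_bins - f)
    for row in out:
        row[f0:f0 + f] = [fill] * f

    # Time mask: width t in [0, max_time_width], starting at frame t0.
    t = rng.randint(0, min(max_time_width, n_frames))
    t0 = rng.randint(0, n_frames - t)
    for i in range(t0, t0 + t):
        out[i] = [fill] * n_bins
    return out
```

Because a fresh mask is drawn on every call, each training epoch sees a different corrupted view of the same utterance, which is how this kind of augmentation stretches limited training data. Setting both maximum widths to zero disables masking.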
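The abstract names binary cross-entropy and center loss as the training objectives without saying how they are combined. A minimal sketch, assuming the common weighted sum $L = L_{\mathrm{BCE}} + \lambda L_C$ with a hypothetical weight $\lambda$ and one embedding center $c_y$ per class (bona fide or spoof), is shown below; the label convention (1 = bona fide) and the fixed centers are also assumptions.

```python
import math
from typing import Dict, List, Sequence

def bce_loss(probs: Sequence[float], labels: Sequence[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over a batch (assumed: label 1 = bona fide, 0 = spoof)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)

def center_loss(embeddings: Sequence[Sequence[float]], labels: Sequence[int],
                centers: Dict[int, List[float]]) -> float:
    """Mean over the batch of 0.5 * ||x_i - c_{y_i}||^2."""
    total = 0.0
    for x, y in zip(embeddings, labels):
        total += 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, centers[y]))
    return total / len(embeddings)

def joint_loss(probs, embeddings, labels, centers, lam: float = 0.01) -> float:
    """L = L_BCE + lam * L_C, where lam is a hypothetical weighting factor."""
    return bce_loss(probs, labels) + lam * center_loss(embeddings, labels, centers)
```

The BCE term separates the two classes at the output, while the center term pulls each class's embeddings toward its own center, making the learned representation more compact. In actual center-loss training the centers are updated every mini-batch; they are held fixed here only to keep the sketch short.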
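The abstract reports that the scores of the three DNN models are fused and that results are measured by EER. The fusion method is not specified, so the sketch below assumes a simple weighted average of per-utterance scores (equal weights by default) and computes an approximate EER by sweeping thresholds over the observed scores, assuming higher scores mean "more likely bona fide". It does not compute the t-DCF, which additionally requires the ASV system's error rates and the challenge's cost parameters.

```python
from typing import List, Optional, Sequence, Tuple

def fuse_scores(system_scores: Sequence[Sequence[float]],
                weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted average of per-utterance scores from several systems."""
    n_sys = len(system_scores)
    weights = weights or [1.0 / n_sys] * n_sys  # equal weights as an assumption
    return [sum(w * s[i] for w, s in zip(weights, system_scores))
            for i in range(len(system_scores[0]))]

def compute_eer(bonafide: Sequence[float], spoof: Sequence[float]) -> Tuple[float, float]:
    """Approximate EER and its threshold by sweeping over all observed scores."""
    best = None
    for t in sorted(set(bonafide) | set(spoof)):
        frr = sum(s < t for s in bonafide) / len(bonafide)  # bona fide rejected
        far = sum(s >= t for s in spoof) / len(spoof)       # spoofs accepted
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]
```

For example, `fuse_scores([senet_scores, densenet_scores, densenext_scores])` (hypothetical variable names) yields one fused score per utterance, which can then be split by ground-truth label and passed to `compute_eer`. The quadratic sweep is fine for illustration; official evaluation tooling uses sorted cumulative counts instead.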
