A Deep Identity Representation for Noise Robust Spoofing Detection

Alejandro Gómez Alanís; Antonio M. Peinado; Jose A. Gonzalez; Angel Gomez

INTERSPEECH 2018


Abstract

Spoofing attacks against automatic speaker verification (ASV) systems have recently received increased attention, and a number of countermeasures have been developed to detect high-technology attacks such as speech synthesis and voice conversion. However, the performance of anti-spoofing systems degrades significantly in noisy conditions. To address this issue, we propose a deep learning framework for extracting spoofing identity vectors, together with the use of soft missing-data masks. The proposed feature extraction combines a convolutional neural network (CNN) with a recurrent neural network (RNN) to provide a single deep feature vector per utterance. The CNN is treated as a convolutional feature extractor that operates at the frame level. On top of the CNN outputs, the RNN is employed to obtain a single spoofing identity representation of the whole utterance. Experimental evaluation is carried out on both a clean and a noisy version of the ASVspoof 2015 corpus. The experimental results show that our proposals clearly outperform other recently proposed methods, such as the popular CQCC+GMM system and other similar deep feature systems, in both seen and unseen noisy conditions.
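The two-stage idea in the abstract (a frame-level convolutional extractor followed by a recurrent layer that summarizes the whole utterance) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' model: the weights are random and untrained, the recurrence is a plain tanh (Elman-style) update standing in for whatever RNN the paper uses, the soft missing-data masks are omitted, and all names (`conv_frame_features`, `rnn_utterance_vector`, `build_extractor`) are hypothetical. The sketch only shows how utterances of different lengths map to a vector of fixed size.

```python
import math
import random


def conv_frame_features(spec, kernels, context=1):
    """Frame-level convolutional stage (assumed form): each kernel covers
    (2*context + 1) neighbouring frames x all frequency bins, followed by ReLU."""
    n_bins = len(spec[0])
    zero = [0.0] * n_bins
    padded = [zero] * context + spec + [zero] * context  # zero-pad in time
    width = 2 * context + 1
    feats = []
    for t in range(len(spec)):
        window = padded[t:t + width]
        feats.append([
            max(0.0, sum(k[i][j] * window[i][j]
                         for i in range(width) for j in range(n_bins)))
            for k in kernels
        ])
    return feats  # one feature vector per frame


def rnn_utterance_vector(frame_feats, w_in, w_rec):
    """Plain tanh recurrence over the frame features; the final hidden
    state serves as the single utterance-level vector."""
    size = len(w_rec)
    hidden = [0.0] * size
    for x in frame_feats:
        hidden = [
            math.tanh(
                sum(w_in[h][i] * x[i] for i in range(len(x)))
                + sum(w_rec[h][j] * hidden[j] for j in range(size))
            )
            for h in range(size)
        ]
    return hidden


def random_matrix(rng, rows, cols, scale=0.1):
    """Small random weights; a trained system would learn these."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def build_extractor(n_bins, n_kernels=4, hidden_size=8, context=1, seed=0):
    """Return a function mapping a spectrogram (list of frames) to one vector."""
    rng = random.Random(seed)
    width = 2 * context + 1
    kernels = [random_matrix(rng, width, n_bins) for _ in range(n_kernels)]
    w_in = random_matrix(rng, hidden_size, n_kernels)
    w_rec = random_matrix(rng, hidden_size, hidden_size)

    def extract(spec):
        frames = conv_frame_features(spec, kernels, context)
        return rnn_utterance_vector(frames, w_in, w_rec)

    return extract


if __name__ == "__main__":
    rng = random.Random(1)
    extract = build_extractor(n_bins=5)
    short_utt = [[rng.random() for _ in range(5)] for _ in range(10)]
    long_utt = [[rng.random() for _ in range(5)] for _ in range(30)]
    # Both utterances yield a vector of the same size (hidden_size = 8).
    print(len(extract(short_utt)), len(extract(long_utt)))  # prints: 8 8
```

In a setup like this, the utterance-level vector would typically be passed to a separate classifier that decides between genuine and spoofed speech; that stage is not shown here.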



Topics

Deep Learning > Architectures > Neural Networks
Computer Vision > Analysis > Biometrics
Speech & Audio > Recognition > Speaker Recognition
Speech & Audio > Analysis > Speech Analysis
Deep Learning > Learning Types > Self-Supervised Learning

Keywords

feature extraction, deep learning, spoofing detection, speaker verification, noise robustness, convolutional neural network, recurrent neural network, soft missing-data mask
