Independent Modelling of High and Low Energy Speech Frames for Spoofing Detection

Gajan Suthokumar; Kaavya Sriskandaraja; Vidhyasaharan Sethu; Chamith Wijenayake; Eliathamby Ambikairajah

INTERSPEECH 2017

Abstract

Spoofing detection systems for automatic speaker verification have moved from only modelling voiced frames to modelling all speech frames. Unvoiced speech has been shown to carry information about spoofing attacks, and anti-spoofing systems may further benefit by treating voiced and unvoiced speech differently. In this paper, we separate speech into low and high energy frames and independently model the distributions of both to form two spoofing detection systems that are then fused at the score level. Experiments conducted on the ASVspoof 2015, BTAS 2016 and Spoofing and Anti-Spoofing (SAS) corpora demonstrate that the proposed approach of fusing two independent high and low energy spoofing detection systems consistently outperforms the standard approach that does not distinguish between high and low energy frames.
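The pipeline the abstract describes (split frames by energy, model each group independently, fuse the two scores) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the per-utterance median log-energy threshold, the single diagonal Gaussian standing in for whatever distribution model the authors use, the log-likelihood-ratio score, and the equal-weight linear fusion. The names `split_by_energy`, `DiagGaussian`, `llr_score` and `fuse` are hypothetical.

```python
import numpy as np


def split_by_energy(frames, threshold=None):
    """Return boolean masks (low, high) over frames of shape (n_frames, frame_len).

    Assumption: the threshold defaults to the median log-energy of the
    utterance; the paper's actual criterion is not stated in the abstract.
    """
    log_energy = np.log(np.sum(frames ** 2, axis=1) + 1e-12)
    if threshold is None:
        threshold = np.median(log_energy)
    high = log_energy > threshold
    return ~high, high


class DiagGaussian:
    """Single diagonal-covariance Gaussian (a simple stand-in model)."""

    def fit(self, feats):
        self.mean = feats.mean(axis=0)
        self.var = feats.var(axis=0) + 1e-6  # variance floor for stability
        return self

    def avg_loglik(self, feats):
        # Per-dimension log-density, summed over dimensions, averaged over frames.
        ll = -0.5 * (np.log(2 * np.pi * self.var)
                     + (feats - self.mean) ** 2 / self.var)
        return ll.sum(axis=1).mean()


def llr_score(genuine_model, spoof_model, feats):
    """Log-likelihood ratio: positive values favour genuine speech."""
    return genuine_model.avg_loglik(feats) - spoof_model.avg_loglik(feats)


def fuse(score_low, score_high, weight=0.5):
    """Score-level fusion as a weighted sum (the weight is an assumption)."""
    return weight * score_low + (1.0 - weight) * score_high
```

In use, one genuine/spoofed model pair would be trained on features from low energy frames and another pair on features from high energy frames; at test time each pair scores its own subset of the utterance and `fuse` combines the two scores into a single decision score.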

Topics

Speech & Audio > Analysis > Speaker Verification

Keywords

spoofing detection; speaker verification; model fusion; energy frame modeling; voiced and unvoiced speech
