Half-Truth: A Partially Fake Audio Detection Dataset

Jiangyan Yi; Ye Bai; Jianhua Tao; Haoxin Ma; Zhengkun Tian; Chenglong Wang; Tao Wang; Ruibo Fu

INTERSPEECH 2021

Abstract

Diverse promising datasets, such as the ASVspoof databases, have been designed to further the development of fake audio detection. However, previous datasets ignore an attack scenario in which the attacker hides small fake clips in real speech audio. This poses a serious threat, since it is difficult to distinguish a small fake clip within a whole speech utterance. Therefore, this paper develops such a dataset for half-truth audio detection (HAD). Partially fake audio in the HAD dataset involves changing only a few words in an utterance. The audio of these words is generated with the latest state-of-the-art speech synthesis technology. Using this dataset, we can not only detect fake utterances but also localize the manipulated regions in a speech recording. Benchmark results on this dataset are presented. The results show that partially fake audio is much more challenging than fully fake audio for fake audio detection.
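The abstract describes partially fake utterances built by replacing a few words of real speech with synthesized audio, so that a dataset can support both utterance-level detection and localization of the manipulated regions. The Python sketch below illustrates that labeling idea only, under stated assumptions: the word boundaries and the synthesized segment are given as inputs, frames are non-overlapping, and label 1 marks fake content. The function names (`splice_fake_word`, `frame_labels`, `utterance_label`, `fake_regions`) are hypothetical and do not describe the authors' actual generation pipeline.

```python
import math
from typing import List, Tuple


def splice_fake_word(real: List[float], word_start: int, word_end: int,
                     fake_word: List[float]) -> Tuple[List[float], Tuple[int, int]]:
    """Replace real[word_start:word_end] with a synthesized segment (hypothetical helper).

    Returns the spliced waveform and the [start, end) sample span of the fake part.
    """
    if not 0 <= word_start <= word_end <= len(real):
        raise ValueError("word boundaries must lie inside the utterance")
    if not fake_word:
        raise ValueError("fake segment must be non-empty")
    spliced = real[:word_start] + fake_word + real[word_end:]
    return spliced, (word_start, word_start + len(fake_word))


def frame_labels(num_samples: int, fake_span: Tuple[int, int], frame_len: int) -> List[int]:
    """Label each non-overlapping frame 1 if it overlaps the fake span, else 0 (assumed convention)."""
    start, end = fake_span
    labels = []
    for i in range(math.ceil(num_samples / frame_len)):
        f_start = i * frame_len
        f_end = min(f_start + frame_len, num_samples)
        labels.append(1 if f_start < end and start < f_end else 0)
    return labels


def utterance_label(labels: List[int]) -> int:
    """An utterance counts as fake if any of its frames is fake."""
    return int(any(labels))


def fake_regions(labels: List[int], frame_len: int,
                 sample_rate: int) -> List[Tuple[float, float]]:
    """Merge consecutive fake frames into (start_s, end_s) regions at frame resolution."""
    regions, run_start = [], None
    for i, lab in enumerate(labels + [0]):  # trailing 0 closes a run that reaches the end
        if lab == 1 and run_start is None:
            run_start = i
        elif lab == 0 and run_start is not None:
            regions.append((run_start * frame_len / sample_rate,
                            i * frame_len / sample_rate))
            run_start = None
    return regions
```

For example, replacing samples 4000 to 8000 of a one-second, 16 kHz utterance with a 4800-sample synthesized word, then labeling 25 ms (400-sample) frames, yields one fake region of roughly 0.25 s to 0.55 s and an utterance-level label of 1. Deriving the utterance label from the frame labels keeps the detection and localization targets consistent with each other.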

Topics

Speech & Audio > Analysis > Speaker Verification

Keywords

audio deepfake detection, fake audio detection, partially fake audio, speech manipulation detection, audio localization