INTERSPEECH 2024

HarmoNet: Partial DeepFake Detection Network based on Multi-scale HarmoF0 Feature Fusion

Abstract

Audio DeepFake detection (ADD) has become increasingly challenging with the rise of spoofing attacks that use artificially generated audio. Track 2 of ADD 2023 requires not only detecting DeepFake audio but also locating the manipulated regions. To tackle this challenge, we propose HarmoNet, a framework that leverages multi-scale harmonic F0 and Wav2Vec features with an attention mechanism. This allows the model to effectively capture changes in each region of the utterance. Furthermore, we introduce a new loss function, Partial Loss, which places more emphasis on the boundaries between real and fake regions. Additionally, we design a post-processor to refine the output of the model. Our framework scored 70.61% in Track 2 of ADD 2023, an improvement of 67.12% over the baseline, achieving the best performance. Moreover, HarmoNet shows competitive performance on other DeepFake datasets.
