Vaccinating SER to Neutralize Adversarial Attacks with Self-Supervised Augmentation Strategy

Bo-Hao Su; Chi-Chun Lee

INTERSPEECH 2022

Abstract

Speech emotion recognition (SER) is being actively developed for multiple real-world application scenarios, and users tend to become intimately connected to these services. However, most existing SER models are vulnerable to a growing and diverse set of adversarial attacks, and the resulting performance degradation can severely harm the user experience. In this work, we propose a self-supervised augmentation defense (SSAD) strategy that learns a single purification network, which acts as a general front-end to neutralize adversarial distortions without knowing the type of attack beforehand. We show that our approach robustly defends against two different gradient-based attacks at various intensities on the well-known IEMOCAP corpus. Further, by examining metrics of protection efficacy and recovery rate, we show that our approach exhibits consistent protection behavior that prevents adverse outcomes and is capable of recovering samples that were wrongly predicted before purification.
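As a rough illustration of the ideas named in the abstract, the following minimal sketch shows a one-step sign-gradient perturbation (FGSM-style) against a toy logistic-regression classifier, plus one plausible reading of a recovery-rate metric. Everything here is an assumption made for illustration: the abstract does not name the two gradient-based attacks or give metric formulas, and the classifier, weights, and function names (`fgsm`, `recovery_rate`) are hypothetical stand-ins rather than the authors' SER model or SSAD implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability of class 1 under a toy logistic-regression classifier."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(w, b, x, y, eps):
    """One-step sign-gradient perturbation of input x with intensity eps.

    For logistic regression with cross-entropy loss, dL/dx = (p - y) * w.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    # Step in the direction that increases the loss.
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

def recovery_rate(before, after, labels):
    """Assumed definition: share of samples that were wrong before
    purification and are predicted correctly after it."""
    wrong = [i for i, (p, y) in enumerate(zip(before, labels)) if p != y]
    if not wrong:
        return 0.0
    return sum(after[i] == labels[i] for i in wrong) / len(wrong)

# Hypothetical toy values, not taken from the paper.
w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.2], 1
x_adv = fgsm(w, b, x, y, eps=0.5)
clean_pred = int(predict_proba(w, b, x) >= 0.5)
adv_pred = int(predict_proba(w, b, x_adv) >= 0.5)
```

In this toy setting the perturbation flips the prediction from the correct class to the wrong one. A purification front-end in the sense of the abstract would be applied to the input before the classifier, and `recovery_rate` would then compare predictions with and without that front-end.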

Authors

Bo-Hao Su, Chi-Chun Lee

Topics

Machine Learning > Learning Types > Adversarial Learning
Machine Learning > Learning Types > Self-Supervised Learning
Machine Learning > Application Areas > Domain Adaptation
Speech & Audio > Synthesis > Speech Enhancement
Deep Learning > Learning Types > Self-Supervised Learning
Deep Learning > Learning Types > Adversarial Learning

Keywords

self-supervised learning, adversarial attack, gradient-based attack, speech emotion recognition, robustness defense, sample purification, speech purification

Related papers

Example-based Explanations with Adversarial Attacks for Respiratory Sound Analysis 2022

Which Model is Best: Comparing Methods and Metrics for Automatic Laughter Detection in a Naturalistic Conversational Dataset 2022

Evidence of Onset and Sustained Neural Responses to Isolated Phonemes from Intracranial Recordings in a Voice-based Cursor Control Task 2022

Pre-trained Speech Representations as Feature Extractors for Speech Quality Assessment in Online Conferencing Applications 2022

Exploring the influence of fine-tuning data on wav2vec 2.0 model for blind speech quality prediction 2022