INTERSPEECH 2022

An Attention-Based Method for Guiding Attribute-Aligned Speech Representation Learning

Abstract

The rich personal information contained in speech signals can lead to privacy leakage and unfair predictions in speech-based technology. In this work, we propose a feature-scoring variational autoencoder (FS-VAE) that addresses these issues by performing attribute alignment during speech representation learning. FS-VAE aligns attributes using attention-based scoring machines guided by two additional penalty terms. Once the attribute-aligned representation is obtained, we can select and mask the nodes that contain a specific attribute of interest, according to the requirements of the downstream task. We evaluate our method on two tasks: PP-SER (identity-free emotion recognition) and PP-SV (emotion-less speaker verification). Compared to the most recent attribute-aligned representation learning method, our proposed method achieves better utility maintenance and competitive privacy protection.
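The node selection and masking step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mask_attribute_nodes`, the per-node score vector, the 0.5 threshold, and the choice to zero out masked nodes are all assumptions made for illustration, since the abstract does not specify how scores are thresholded or how nodes are masked.

```python
import numpy as np

def mask_attribute_nodes(z, scores, threshold=0.5):
    """Zero out latent nodes whose attribute score reaches the threshold.

    z         : (batch, dim) latent representation (hypothetical shape)
    scores    : (dim,) per-node scores in [0, 1] for the attribute to hide
    threshold : assumed cut-off; the abstract does not give one
    """
    z = np.asarray(z, dtype=float)
    scores = np.asarray(scores, dtype=float)
    keep = scores < threshold      # True for nodes that are kept
    return z * keep, keep          # broadcasting applies the mask per row

# Toy example: hide speaker identity before emotion recognition.
z = np.array([[0.2, -1.0, 0.7, 1.5],
              [1.1,  0.3, -0.4, 0.9]])
speaker_scores = np.array([0.9, 0.1, 0.8, 0.2])  # made-up scores

masked, keep = mask_attribute_nodes(z, speaker_scores)
print(keep)
print(masked)
```

In this toy setting, nodes 0 and 2 carry most of the speaker information under the assumed scores, so they are zeroed and only nodes 1 and 3 are passed to the downstream emotion recognizer. Swapping in emotion scores would give the corresponding mask for the speaker verification task.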
