Quantifying Unintended Memorization in BEST-RQ ASR Encoders

Virat Shejwalkar; Om Thakkar; Arun Narayanan

INTERSPEECH 2024

Abstract

Self-supervised ASR encoders are increasingly adopted in real-world applications because they enable strong performance on downstream ASR tasks. This raises privacy concerns about the data used to train such encoders, especially since neural networks are known to unintentionally memorize rare or unique samples from their training data. To address this, we perform the first systematic audit of unintended memorization in ASR encoders. Specifically, we focus on a state-of-the-art Conformer-based ASR encoder pre-trained using the BEST-RQ technique, which forms the foundation of many real-world ASR applications. We propose a novel auditing method that successfully demonstrates such memorization in ASR encoders, even for samples that occur just once in the training data. Finally, we show that pre-training with per-sample gradient clipping is a promising way to mitigate such memorization without significantly impacting downstream model quality.
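For readers unfamiliar with per-sample gradient clipping, the following is a minimal, generic sketch of the idea: each example's gradient is rescaled so that its L2 norm does not exceed a threshold, and only then are the gradients averaged. It is not the authors' implementation. The function names, the `clip_norm` parameter, and the toy linear model are assumptions chosen for illustration, and the abstract does not state the clipping threshold or optimizer settings used in the paper.

```python
import numpy as np


def per_sample_clipped_gradient(per_sample_grads, clip_norm):
    """Clip each example's gradient to L2 norm <= clip_norm, then average.

    per_sample_grads: array of shape (batch_size, num_params).
    clip_norm: hypothetical threshold; the paper's value is not given here.
    """
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    # Scale is 1 for gradients already inside the ball, clip_norm / norm otherwise.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_sample_grads * scale
    return clipped.mean(axis=0)


def linear_per_sample_grads(w, X, y):
    """Per-example gradients of 0.5 * (x @ w - y)**2 for a toy linear model."""
    residual = X @ w - y
    return residual[:, None] * X


if __name__ == "__main__":
    # One large gradient (norm 5) and one small gradient (norm 0.5).
    grads = np.array([[3.0, 4.0], [0.3, 0.4]])
    print(per_sample_clipped_gradient(grads, clip_norm=1.0))
```

Because every example contributes at most `clip_norm` to the update, a single rare or unique sample cannot dominate a training step, which is the intuition behind using clipping to limit memorization.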

Authors

Virat Shejwalkar, Om Thakkar, Arun Narayanan

Topics

Machine Learning > Optimization & Theory > Stochastic Processes
Machine Learning > Application Areas > Privacy
Deep Learning > Techniques > Pretraining

Keywords

self-supervised learning, speech recognition, gradient clipping, privacy auditing, unintended memorization, conformer encoder

Related papers

Reshape Dimensions Network for Speaker Recognition 2024

RevRIR: Joint Reverberant Speech and Room Impulse Response Embedding using Contrastive Learning with Application to Room Shape Classification 2024

Mixed Children/Adult/Childrenized Fine-Tuning for Children’s ASR: How to Reduce Age Mismatch and Speaking Style Mismatch 2024

Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions 2024

K-means and hierarchical clustering of f0 contours 2024