INTERSPEECH 2024

A Cluster-based Personalized Federated Learning Strategy for End-to-End ASR of Dementia Patients

Abstract

Automatic speech recognition (ASR) should serve all users, but adapting it to speakers with Alzheimer’s disease (AD) is challenging because of irregular speech patterns and privacy concerns. Federated learning (FL), a privacy-preserving training approach, offers a solution. However, FL-based ASR suffers from acoustic and textual heterogeneity. While advanced model-based and cluster-based FL methods aim to address this issue, they lack a direct mechanism for handling either the high intra-speaker heterogeneity exhibited by individuals with AD or ASR-related properties. This study presents cluster-based personalized federated learning (CPFL), a strategy that mitigates heterogeneity by clustering ASR output tokens using the proposed CharDiv, a metric that captures pause and word usage distributions. Evaluation on the ADReSS challenge dataset shows a 3.6% improvement in word error rate (WER). Analysis of per-cluster WER improvements and CharDiv distributions indicates reduced heterogeneity and points to pause usage as a potential key factor in AD-oriented ASR.
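For readers unfamiliar with the reported metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the ASR hypothesis, divided by the number of reference words. The sketch below illustrates that standard definition only. The function name `word_error_rate` and the example sentences are hypothetical, and it is not the evaluation code used in this study. The abstract does not state whether the 3.6% figure is an absolute or a relative change, so the sketch makes no assumption about that.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / number of reference words.

    Illustrative helper (hypothetical name), not the paper's evaluation script.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many insertions.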
