INTERSPEECH 2023

Unsupervised Learning of Discrete Latent Representations with Data-Adaptive Dimensionality from Continuous Speech Streams

Abstract

This work presents a novel deep generative model for unsupervised learning of sparse binary feature representations with data-adaptive dimensionality directly from continuous speech streams. While it shares the critical assumption of unbounded latent dimensionality with previously proposed Bayesian non-parametric approaches, our model can also capture much richer, non-Markovian dependencies between its latent representations. The present work investigates the model's ability to learn linguistically meaningful representations under challenging, realistic scenarios. We train the model on highly speaker-imbalanced datasets and evaluate it on the ABX phone discriminability test. The model achieves promising performance that is competitive with the state-of-the-art model despite a substantial disadvantage: limited or no access to speaker information during training.
