Full Bayesian Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery

Thomas Glarner; Patrick Hanebrink; Janek Ebbers; Reinhold Haeb-Umbach

INTERSPEECH 2018

Abstract

The invention of the Variational Autoencoder enables the application of neural networks to a wide range of unsupervised learning tasks, including Acoustic Unit Discovery (AUD). The recently proposed Hidden Markov Model Variational Autoencoder (HMMVAE) allows joint training of a neural-network-based feature extractor and a structured prior for the latent space given by a Hidden Markov Model. It has been shown that the HMMVAE significantly outperforms pure GMM-HMM-based systems on the AUD task. However, the HMMVAE cannot autonomously infer the number of acoustic units and thus relies on the GMM-HMM system for initialization. This paper introduces the Bayesian Hidden Markov Model Variational Autoencoder (BHMMVAE), which solves these issues by embedding the HMMVAE in a Bayesian framework with a Dirichlet Process prior for the distribution of the acoustic units and diagonal- or full-covariance Gaussians as emission distributions. Experiments on TIMIT and Xitsonga show that the BHMMVAE is able to autonomously infer a reasonable number of acoustic units, can be initialized without the supervision of a GMM-HMM system, achieves computationally efficient stochastic variational inference by using natural gradient descent and, additionally, improves AUD performance over the HMMVAE.
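The abstract names two ingredients, a Dirichlet Process prior over acoustic units and natural-gradient stochastic variational inference, without giving formulas. As a minimal sketch, assuming a truncated stick-breaking representation of the DP with mean-field Beta posteriors over the sticks (a standard construction, not necessarily the paper's exact parameterization), one natural-gradient SVI step reduces to a convex combination of the current Beta parameters and an estimate computed from minibatch statistics. The function names, the truncation level and the per-unit expected `counts` (for example, state occupancies from an HMM forward-backward pass) are hypothetical.

```python
from typing import List, Tuple


def stick_breaking_svi_step(
    a: List[float],
    b: List[float],
    counts: List[float],
    alpha: float,
    rho: float,
    scale: float,
) -> Tuple[List[float], List[float]]:
    """One natural-gradient SVI step for truncated stick-breaking Beta posteriors.

    Hypothetical helper (assumed setup, not the paper's code):
    a[k], b[k] parameterize q(v_k) = Beta(a[k], b[k]) for the K-1 free sticks
    of a truncation-K DP; counts[k] is the expected number of frames assigned
    to unit k in the current minibatch; alpha is the DP concentration.
    """
    scaled = [scale * c for c in counts]  # rescale minibatch stats to dataset size
    tail = sum(scaled)
    new_a, new_b = [], []
    for k in range(len(counts) - 1):
        tail -= scaled[k]        # expected counts of all units after k
        a_hat = 1.0 + scaled[k]  # optimal a_k if the minibatch were the whole data
        b_hat = alpha + tail     # optimal b_k if the minibatch were the whole data
        # In a conjugate model the natural-gradient step is a convex combination
        # in natural-parameter space; for a Beta this equals mixing (a, b) directly.
        new_a.append((1.0 - rho) * a[k] + rho * a_hat)
        new_b.append((1.0 - rho) * b[k] + rho * b_hat)
    return new_a, new_b


def expected_weights(a: List[float], b: List[float]) -> List[float]:
    """E[pi_k] under the factorized q: E[v_k] * prod_{j<k} (1 - E[v_j])."""
    weights, remaining = [], 1.0
    for ak, bk in zip(a, b):
        v = ak / (ak + bk)            # mean of Beta(ak, bk)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)         # the last unit takes the leftover stick
    return weights


# Usage: three units with truncation K = 3 and a full step (rho = 1).
a, b = stick_breaking_svi_step(
    [1.0, 1.0], [1.0, 1.0], [10.0, 5.0, 0.0], alpha=1.0, rho=1.0, scale=1.0
)
print(a, b)  # [11.0, 6.0] [6.0, 1.0]
print(expected_weights(a, b))
```

With `rho = 1` the update jumps straight to the minibatch estimate; in SVI a decaying step size such as ρ_t = (t + τ)^(-κ) is the usual choice. Units that receive no expected counts keep little more than their prior mass, which is how a DP prior lets the effective number of units stay below the truncation level.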

Keywords

Dirichlet process, variational inference, Bayesian inference, stochastic variational inference, hidden Markov model, variational autoencoder, acoustic unit discovery
