Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine

George Dahl; Marc'aurelio Ranzato; Abdel-rahman Mohamed; Geoffrey E. Hinton

NeurIPS (NIPS) 2010

Abstract

Straightforward application of Deep Belief Nets (DBNs) to acoustic modeling produces a rich distributed representation of speech data that is useful for recognition and yields impressive results on the speaker-independent TIMIT phone recognition task. However, the first-layer Gaussian-Bernoulli Restricted Boltzmann Machine (GRBM) has an important limitation, shared with mixtures of diagonal-covariance Gaussians: GRBMs treat different components of the acoustic input vector as conditionally independent given the hidden state. The mean-covariance restricted Boltzmann machine (mcRBM), first introduced for modeling natural images, is a much more representationally efficient and powerful way of modeling the covariance structure of speech data. Every configuration of the precision units of the mcRBM specifies a different precision matrix for the conditional distribution over the acoustic space. In this work, we use the mcRBM to learn features of speech data that serve as input into a standard DBN. The mcRBM features combined with DBNs allow us to achieve a phone error rate of 20.5%, which is superior to all published results on speaker-independent TIMIT to date.
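To make the abstract's statement about precision units concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a commonly described mcRBM form in which the conditional precision over the visible vector is C diag(P h) C^T + I, with a filter matrix C, an entrywise non-negative pooling matrix P, and a binary precision-unit vector h. The function name, the sizes, and the random parameters are hypothetical, and sign conventions differ between presentations.

```python
import numpy as np

def mcrbm_precision(C, P, h):
    """Precision matrix selected by a binary configuration h of the precision units.

    Assumed form (an illustration, not taken from this page):
        C diag(P h) C^T + I, with P entrywise non-negative.
    """
    factor_weights = P @ h  # one non-negative weight per factor
    # Scaling the columns of C by factor_weights is equivalent to C @ diag(w).
    return (C * factor_weights) @ C.T + np.eye(C.shape[0])

# Hypothetical toy sizes and random parameters, for illustration only.
rng = np.random.default_rng(0)
n_visible, n_factors, n_hidden = 4, 6, 3
C = rng.normal(size=(n_visible, n_factors))          # visible-to-factor filters
P = np.abs(rng.normal(size=(n_factors, n_hidden)))   # non-negative pooling weights

h_a = np.array([1.0, 0.0, 0.0])  # one configuration of the precision units
h_b = np.array([0.0, 1.0, 1.0])  # a different configuration

prec_a = mcrbm_precision(C, P, h_a)
prec_b = mcrbm_precision(C, P, h_b)

# Each configuration yields its own full (non-diagonal) precision matrix,
# and hence its own conditional covariance over the input.
cov_a = np.linalg.inv(prec_a)
cov_b = np.linalg.inv(prec_b)
```

Because the off-diagonal entries of these matrices are generally nonzero, the components of the input are not conditionally independent given the hidden state, which is the contrast with the diagonal-covariance GRBM that the abstract draws.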

Topics

Deep Learning > Architectures > Neural Networks

Speech & Audio > Recognition > Speech Recognition

Deep Learning > Learning Types > Deep Learning

Keywords

speech recognition; speech processing; acoustic modeling; phone recognition; restricted Boltzmann machine; deep belief network
