INTERSPEECH 2019

Extending an Acoustic Data-Driven Phone Set for Spontaneous Speech Recognition

Abstract

In this paper, we propose a method for extending a phone set using a large amount of Korean broadcast data, with the aim of improving spontaneous speech recognition. The proposed method first extracts variable-length phoneme-level segments from the broadcast data and converts them into fixed-length latent vectors using an LSTM architecture. We then use the k-means algorithm to cluster acoustically similar latent vectors and build a new phone set from the resulting clusters. To update the lexicon of the speech recognizer, we choose, for each word, the pronunciation sequence with the highest conditional probability. To verify the proposed unit, we visualize the spectral patterns and segment durations of the new phone set. In both spontaneous and read speech recognition tasks, the proposed unit outperforms phoneme-based and grapheme-based units.
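
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. As a loudly stated assumption, the LSTM encoder is replaced here by simple mean pooling over frames, so that the example stays dependency-free. The function names (`embed_segment`, `kmeans`, `best_pronunciation`), the toy data, and the unit labels such as `u3` are all hypothetical.

```python
import random


def embed_segment(frames):
    """Map a variable-length segment to one fixed-length vector.

    ASSUMPTION: mean pooling stands in for the LSTM encoder named in the
    abstract; it only illustrates the variable-to-fixed-length step.
    """
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain k-means: returns one cluster label per vector and the centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def best_pronunciation(candidates):
    """Pick the pronunciation with the highest conditional probability.

    `candidates` maps a pronunciation (tuple of unit labels) to a
    hypothetical estimate of P(pronunciation | word).
    """
    return max(candidates, key=candidates.get)


# Toy data: two acoustically distinct groups of segments with varying lengths.
_rng = random.Random(1)


def toy_segment(center, length, dim=3):
    return [[center + _rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(length)]


segments = [toy_segment(0.0, n) for n in (4, 7, 5)]
segments += [toy_segment(5.0, n) for n in (6, 3, 8)]

vectors = [embed_segment(s) for s in segments]   # fixed-length vectors
labels, centroids = kmeans(vectors, k=2)         # each cluster acts as one new unit

# Hypothetical pronunciation candidates for a single word.
chosen = best_pronunciation({("u3", "u7"): 0.6, ("u3", "u9"): 0.4})
```

In this sketch each cluster index plays the role of one unit in the extended phone set, and `best_pronunciation` mirrors the lexicon update by keeping only the highest-probability sequence per word. A real system would cluster learned LSTM embeddings rather than pooled frames and would estimate the conditional probabilities from data.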
