Leveraging Text Data for Word Segmentation for Underresourced Languages

Thomas Glarner; Benedikt Boenninghoff; Oliver Walter; Reinhold Haeb-Umbach

INTERSPEECH 2017

Abstract

In this contribution, we show how text data can be exploited to support word discovery from audio input in an underresourced target language. Given audio data, part of which is transcribed at the word level, and additional unrelated text data, the approach learns a probabilistic mapping from acoustic units to characters and uses it to segment the audio into words without requiring a pronunciation dictionary. The system consists of three components: an unsupervised acoustic unit discovery system, an acoustic unit-to-grapheme converter trained in a supervised manner, and a word discovery system that is initialized with a language model trained on the text data. Experiments on multiple setups show that initializing the language model with text data improves word segmentation performance by a large margin.
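The role of a text-trained language model in segmentation can be illustrated with a small sketch. This is not the authors' system. It assumes the acoustic unit-to-grapheme step has already produced a single character string, and it stands in for the paper's word discovery component with a simple technique: a smoothed word unigram language model estimated from text, combined with a Viterbi-style dynamic program that picks the highest-scoring split into words. The names `train_unigram_lm`, `segment`, `unk_mass`, `unk_char_logprob`, and `max_word_len` are hypothetical.

```python
import math
from collections import Counter


def train_unigram_lm(sentences, unk_mass=0.01, unk_char_logprob=math.log(1 / 30)):
    """Build a crude smoothed word unigram LM from whitespace-tokenized text.

    Words seen in the text get (1 - unk_mass) times their relative frequency.
    Unseen words get unk_mass times a per-character penalty, so longer
    unknown strings are less likely. Returns a function word -> log-probability.
    """
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    known = {w: math.log((1 - unk_mass) * c / total) for w, c in counts.items()}

    def logprob(word):
        if word in known:
            return known[word]
        # Hypothetical back-off for out-of-vocabulary strings.
        return math.log(unk_mass) + len(word) * unk_char_logprob

    return logprob


def segment(chars, logprob, max_word_len=12):
    """Split a character string into words maximizing total LM log-probability."""
    n = len(chars)
    best = [-math.inf] * (n + 1)  # best[i]: best score for chars[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            score = best[start] + logprob(chars[start:end])
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Trace back the best path to recover the word sequence.
    words, end = [], n
    while end > 0:
        start = back[end]
        words.append(chars[start:end])
        end = start
    return words[::-1]


lm = train_unigram_lm(["the cat sat on the mat", "a cat and a dog"])
print(segment("thecatsat", lm))  # ['the', 'cat', 'sat']
```

Without the text data, every candidate word would fall back to the same length-based penalty and the segmenter would have no lexical preference; the text-trained model supplies that preference. A real system would search over a lattice of grapheme hypotheses produced by the probabilistic mapping rather than a single string.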

Keywords

word segmentation, language model, acoustic unit discovery, word discovery, underresourced language