Improving Speech Recognizers by Refining Broadcast Data with Inaccurate Subtitle Timestamps

Jeong-Uk Bang; Mu-Yeol Choi; Sang-Hun Kim; Oh-Wook Kwon

INTERSPEECH 2017

Abstract

This paper proposes an automatic method to refine broadcast data collected every week for efficient acoustic model training. For training acoustic models, we use only the audio signals, subtitle texts, and subtitle timestamps that accompany recorded broadcast programs. However, the subtitle timestamps are often inaccurate due to inherent characteristics of closed captioning. In the proposed method, we remove subtitle texts with a low subtitle quality index, concatenate adjacent subtitle texts into a merged subtitle text, and correct the timestamp of the merged subtitle text by adding a margin. Then, a speech recognizer is used to obtain a hypothesis text from the speech segment corresponding to the merged subtitle text. Finally, the refined speech segments to be used for acoustic model training are generated by selecting the subparts of the merged subtitle text that match the hypothesis text. We show that acoustic models trained on refined broadcast data give significantly higher speech recognition accuracy than those trained on raw broadcast data. Consequently, the proposed method can efficiently refine a large amount of broadcast data with inaccurate timestamps, taking about half the time required by previous approaches.
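The merging, margin, and matching steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Subtitle` type, the function names, and the thresholds (`max_gap`, `margin`, `min_words`) are all hypothetical, and word-level matching via `difflib.SequenceMatcher` stands in for whatever alignment the paper actually uses. The subtitle quality index is not defined in the abstract, so the filtering step is omitted, and the recognizer output is passed in as a plain string.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Subtitle:
    start: float  # seconds from the beginning of the recording
    end: float    # seconds from the beginning of the recording
    text: str


def merge_adjacent(subs, max_gap=1.0):
    """Concatenate subtitles separated by at most max_gap seconds (assumed threshold)."""
    merged = []
    for sub in sorted(subs, key=lambda s: s.start):
        if merged and sub.start - merged[-1].end <= max_gap:
            last = merged[-1]
            merged[-1] = Subtitle(last.start, max(last.end, sub.end),
                                  last.text + " " + sub.text)
        else:
            merged.append(Subtitle(sub.start, sub.end, sub.text))
    return merged


def add_margin(sub, margin=2.0, audio_end=float("inf")):
    """Widen the time span on both sides to absorb timestamp inaccuracy."""
    return Subtitle(max(0.0, sub.start - margin),
                    min(audio_end, sub.end + margin),
                    sub.text)


def matching_subparts(subtitle_text, hypothesis_text, min_words=3):
    """Return runs of subtitle words that also appear, in order, in the hypothesis."""
    ref = subtitle_text.split()
    hyp = hypothesis_text.split()
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    return [" ".join(ref[m.a:m.a + m.size])
            for m in matcher.get_matching_blocks()
            if m.size >= min_words]


subs = [
    Subtitle(10.0, 12.0, "good evening this is"),
    Subtitle(12.5, 15.0, "the nine o'clock news"),
    Subtitle(30.0, 32.0, "weather is next"),
]
merged = merge_adjacent(subs)
widened = add_margin(merged[0])
# Hypothetical recognizer output for the widened segment.
hypothesis = "uh good evening this is a nine o'clock news tonight"
kept = matching_subparts(widened.text, hypothesis)
```

In a real pipeline, the time boundaries of each kept subpart would come from the recognizer's word-level timing for the hypothesis; this sketch returns only the matched text.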

Topics

Machine Learning > Application Areas > Data Augmentation
Speech & Audio > Recognition > Automatic Speech Recognition

Keywords

speech recognition, acoustic model training, speech segment, subtitle timestamp, broadcast data
