KT-Speech-Crawler: Automatic Dataset Construction for Speech Recognition from YouTube Videos

Egor Lakomkin; Sven Magg; Cornelius Weber; Stefan Wermter

EMNLP 2018

Abstract

We describe KT-Speech-Crawler: an approach for automatic dataset construction for speech recognition by crawling YouTube videos. We outline several filtering and post-processing steps, which extract samples that can be used for training end-to-end neural speech recognition systems. In our experiments, we demonstrate that a single-core version of the crawler can obtain around 150 hours of transcribed speech within a day, with an estimated 3.5% word error rate in the transcriptions. Automatically collected samples contain read and spontaneous speech recorded in various conditions, including background noise and music, distant microphone recordings, reverberation, and a variety of accents. When training a deep neural network for speech recognition, we observed around 40% word error rate reduction on the Wall Street Journal dataset by integrating 200 hours of the collected samples into the training set.
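
The abstract reports quality in terms of word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, insertions, deletions) between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from the paper, and the authors' own scoring setup may differ (for example in text normalization).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (not from the paper); tokenizes on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up sentences: one substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Under this definition, a transcript set with 3.5% WER would have roughly 3.5 word errors per 100 reference words.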

Topics

Machine Learning > Learning Types > Semi-Supervised Learning
Machine Learning > Application Areas > Data Augmentation
Speech & Audio > Recognition > Speech Recognition

Keywords

data augmentation, speech recognition, speech corpus, dataset construction, YouTube video
