VCTUBE : A Library for Automatic Speech Data Annotation

Seong Choi; Seunghoon Jeong; Jeewoo Yoon; Migyeong Yang; Minsam Ko; Eunil Park; Jinyoung Han; Munyoung Lee; Seonghee Lee

INTERSPEECH 2020

Abstract

We introduce VCTUBE, an open-source Python library that automatically generates <audio, text> pairs of speech data from a given YouTube URL. We believe VCTUBE makes it easier to collect, process, and annotate speech data for developing speech synthesis systems.
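To make the idea of an <audio, text> pair concrete, here is a minimal sketch of the pairing step. It is not VCTUBE's API. It assumes the video comes with timestamped captions and that the audio has already been downloaded as a single waveform. The names `Caption`, `Pair`, and `build_pairs`, the default sample rate, and the file-naming scheme are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Caption:
    """One timestamped caption line (assumed input format)."""
    start: float     # seconds from the beginning of the video
    duration: float  # seconds
    text: str


@dataclass
class Pair:
    """One <audio, text> pair, described by a sample range in the full waveform."""
    audio_file: str
    start_sample: int
    end_sample: int
    text: str


def build_pairs(captions: List[Caption], sample_rate: int = 22050,
                prefix: str = "clip") -> List[Pair]:
    """Map each non-empty caption to the audio sample range it covers."""
    pairs: List[Pair] = []
    for cap in captions:
        text = " ".join(cap.text.split())  # collapse stray whitespace
        if not text or cap.duration <= 0:
            continue  # skip captions with no usable text or no audio span
        start = int(round(cap.start * sample_rate))
        end = int(round((cap.start + cap.duration) * sample_rate))
        name = f"{prefix}_{len(pairs):04d}.wav"  # hypothetical naming scheme
        pairs.append(Pair(name, start, end, text))
    return pairs
```

Each resulting `Pair` records which slice of the waveform to cut and which transcript belongs to it. Writing the slices to disk and downloading the audio are left out, since they depend on the tools chosen.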

Topics

Speech & Audio > Recognition > Speech Recognition
Speech & Audio > Synthesis > Text-to-Speech
Computer Science > Applications > Software Engineering

Keywords

data annotation, automatic speech recognition, text-to-speech synthesis, python library, speech datum, speech data annotation, youtube audio extraction
