All Together Now: The Living Audio Dataset

David A. Braude; Matthew P. Aylett; Caoimhín Laoide-Kemp; Simone Ashby; Kristen M. Scott; Brian Ó Raghallaigh; Anna Braudo; Alex Brouwer; Adriana Stan

INTERSPEECH 2019

Abstract

The ongoing focus in speech technology research on machine-learning-based approaches leaves the community hungry for data. However, datasets tend to be recorded once and then released, sometimes behind registration requirements or paywalls. In this paper we describe our Living Audio Dataset. The aim is to provide audio data that is in the public domain, multilingual, and expandable by communities. We discuss the role of linguistic resources, given the success of systems such as Tacotron, which use direct text-to-speech mappings, and consider how data provenance could be built into such resources. So far the data has been collected for TTS purposes; however, it is also suitable for ASR. At the time of publication, audio resources already exist for Dutch, R.P. English, Irish, and Russian.

Topics

Speech & Audio > Recognition > Automatic Speech Recognition
Speech & Audio > Synthesis > Text-to-Speech

Keywords

automatic speech recognition, multilingual dataset, public domain
