Golos: Russian Dataset for Speech Research

Nikolay Karpov; Alexander Denisenko; Fedor Minkin

INTERSPEECH 2021

Abstract

This paper introduces Golos, a new large Russian speech corpus suitable for speech research. The dataset consists mainly of recorded audio files that were manually annotated on a crowd-sourcing platform. The total duration of the audio is about 1240 hours. We have made the corpus freely available for download, along with an acoustic model trained on it with CTC loss. Transfer learning was additionally applied to improve the performance of the acoustic model. To evaluate the quality of the dataset with beam-search decoding, we built a 3-gram language model on the open Common Crawl dataset. The resulting word error rates (WER) were about 3.3% and 11.5%.
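The WER figures quoted above follow the standard definition: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The following is a minimal sketch of that metric, not the authors' evaluation code. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch only: tokenization is a plain whitespace split,
    which is an assumption and not necessarily what the paper used.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

For example, with the reference "a b c d" and the hypothesis "a x c", there is one substitution and one deletion across four reference words, so the function returns 0.5. Corpus-level WER is usually computed by summing edit distances and reference lengths over all utterances rather than averaging per-utterance rates.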

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning
Computer Vision > Domain-Specific > Medical Imaging
Speech & Audio > Recognition > Automatic Speech Recognition

Keywords

transfer learning, speech recognition, acoustic model, language model, speech dataset, Russian language, CTC loss
