The Zero Resource Speech Challenge 2021: Spoken Language Modelling

Ewan Dunbar; Mathieu Bernard; Nicolas Hamilakis; Tu Anh Nguyen; Maureen de Seyssel; Patricia Rozé; Morgane Riviere; Eugene Kharitonov; Emmanuel Dupoux

INTERSPEECH 2021

Abstract

We present the Zero Resource Speech Challenge 2021, which asks participants to learn a language model directly from audio, without any text or labels. The challenge is based on the Libri-light dataset, which provides up to 60k hours of audio from English audiobooks without any associated text. We provide a baseline pipeline consisting of an encoder based on contrastive predictive coding (CPC), a quantizer (k-means) and a standard language model (BERT or LSTM). The metrics evaluate the learned representations at the acoustic (ABX discrimination), lexical (spot-the-word), syntactic (acceptability judgment) and semantic (similarity judgment) levels. We present an overview of the eight submitted systems from four groups and discuss the main results.
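The encoder, quantizer and language model stages described in the abstract, together with the spot-the-word metric, can be illustrated with a toy sketch. This is not the challenge baseline: the feature frames stand in for CPC encoder outputs, the centroids are assumed to be given rather than fitted with k-means, and a smoothed bigram model replaces the BERT or LSTM language model. All names (`quantize`, `BigramLM`, `spot_the_word_accuracy`) are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter


def quantize(frames, centroids):
    """Map each feature frame to the index of its nearest centroid.

    Assumption: centroids are already available (in the abstract's pipeline
    they would come from k-means fitted on encoder features).
    """
    units = []
    for frame in frames:
        dists = [
            sum((f - c) ** 2 for f, c in zip(frame, centroid))
            for centroid in centroids
        ]
        units.append(dists.index(min(dists)))
    return units


class BigramLM:
    """Add-alpha smoothed bigram model over discrete units.

    Assumption: a stand-in for the BERT/LSTM language model, kept tiny
    so the example is self-contained.
    """

    def __init__(self, n_units, alpha=1.0):
        self.n_units = n_units
        self.alpha = alpha
        self.bigrams = Counter()
        self.contexts = Counter()

    def fit(self, sequences):
        for seq in sequences:
            for prev, cur in zip(seq, seq[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1
        return self

    def log_prob(self, seq):
        total = 0.0
        for prev, cur in zip(seq, seq[1:]):
            num = self.bigrams[(prev, cur)] + self.alpha
            den = self.contexts[prev] + self.alpha * self.n_units
            total += math.log(num / den)
        return total


def spot_the_word_accuracy(lm, pairs):
    """Fraction of (word, nonword) pairs where the word scores higher."""
    correct = sum(lm.log_prob(w) > lm.log_prob(nw) for w, nw in pairs)
    return correct / len(pairs)


# Toy data: 2-dimensional "frames" and three centroids.
centroids = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
frames = [[0.1, -0.1], [0.9, 1.2], [0.1, 0.8]]
units = quantize(frames, centroids)

lm = BigramLM(n_units=3).fit([[0, 1, 2], [0, 1, 2], [0, 1, 1]])
accuracy = spot_the_word_accuracy(lm, [([0, 1, 2], [2, 1, 0])])
```

In this sketch the unit sequence seen during training receives a higher log-probability than its reversal, so the single toy pair counts as correct. The actual metric is computed over spoken word and nonword pairs, which this example does not attempt to reproduce.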

Topics

Machine Learning > Core Methods > Representation Learning
Machine Learning > Learning Types > Contrastive Learning
Machine Learning > Learning Types > Self-Supervised Learning
Speech & Audio > Analysis > Speech Analysis
Machine Learning > Learning Paradigms > Zero-Shot Learning
Deep Learning > Learning Types > Self-Supervised Learning

Keywords

speaker verification; acoustic representation; unsupervised representation learning; language model; acoustic feature; spoken language modeling; speech representation learning; contrastive predictive coding; zero-resource speech
