XTREME-S: Evaluating Cross-lingual Speech Representations

Alexis Conneau; Ankur Bapna; Yu Zhang; Min Ma; Patrick von Platen; Anton Lozhkov; Colin Cherry; Ye Jia; Clara Rivera; Mihir Kale; Daan van Esch; Vera Axelrod; Simran Khanuja; Jonathan Clark; Orhan Firat; Michael Auli; Sebastian Ruder; Jason Riesa; Melvin Johnson

INTERSPEECH 2022

Abstract

We introduce XTREME-S, a new benchmark for evaluating universal cross-lingual speech representations across many languages. XTREME-S covers four task families: speech recognition, classification, speech-to-text translation, and retrieval. Spanning 102 languages from 10+ language families and 3 different domains, it aims to simplify the evaluation of multilingual speech representations and to catalyze research in "universal" speech representation learning. This paper describes the benchmark and establishes the first speech-only and speech-text baselines using XLS-R and mSLAM on all downstream tasks. We motivate the design choices and detail how to use the benchmark. Datasets and fine-tuning scripts are available through the HuggingFace platform (https://hf.co/datasets/google/xtreme_s).

Topics

Machine Learning > Core Methods > Representation Learning
Speech & Audio > Recognition > Speech Recognition

Keywords

representation learning, benchmark evaluation, speech recognition, multilingual speech, speech-to-text translation, cross-lingual speech
