ML-SUPERB: Multilingual Speech Universal PERformance Benchmark

Jiatong Shi; Dan Berrebbi; William Chen; En-Pei Hu; Wei-Ping Huang; Ho-Lam Chung; Xuankai Chang; Shang-Wen Li; Abdelrahman Mohamed; Hung-yi Lee; Shinji Watanabe

INTERSPEECH 2023

Abstract

Speech processing Universal PERformance Benchmark (SUPERB) is a leaderboard to benchmark the performance of Self-Supervised Learning (SSL) models on various speech processing tasks. However, SUPERB largely considers English speech in its evaluation. This paper presents multilingual SUPERB (ML-SUPERB), covering 143 languages (ranging from high-resource to endangered), and considering both automatic speech recognition and language identification. Following the concept of SUPERB, ML-SUPERB utilizes frozen SSL features and employs a simple framework for multilingual tasks by learning a shallow downstream model. Similar to the SUPERB benchmark, we find speech SSL models can significantly improve performance compared to FBANK features. Furthermore, we find that multilingual models do not always perform better than their monolingual counterparts. We will release ML-SUPERB as a challenge with organized datasets and reproducible training scripts for future multilingual representation research.
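The frozen-feature setup described above can be illustrated with a small sketch. It assumes the layer-weighting convention of the original SUPERB: hidden states from every SSL layer are kept frozen and combined through softmax-normalized learnable weights. The mean-pooling plus linear head for language identification, the function names (`weighted_layer_sum`, `predict_language`), the toy inputs and the labels `"eng"`/`"jpn"` are all hypothetical simplifications. They are not ML-SUPERB's actual downstream architecture, and the ASR branch is not shown. Only the layer logits and the head would be trained; the SSL features would never be updated.

```python
import math
from typing import List

Matrix = List[List[float]]  # frames x feature_dim


def softmax(xs: List[float]) -> List[float]:
    # Numerically stable softmax over the per-layer logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_sum(layers: List[Matrix], layer_logits: List[float]) -> Matrix:
    """Combine frozen hidden states from every SSL layer with softmax weights.

    Assumption (SUPERB-style): only `layer_logits` is learnable; `layers` stay frozen.
    """
    weights = softmax(layer_logits)
    num_frames = len(layers[0])
    dim = len(layers[0][0])
    return [
        [sum(w * layer[t][d] for w, layer in zip(weights, layers)) for d in range(dim)]
        for t in range(num_frames)
    ]


def mean_pool(frames: Matrix) -> List[float]:
    # Average over time to obtain one utterance-level vector.
    n = len(frames)
    return [sum(f[d] for f in frames) / n for d in range(len(frames[0]))]


def linear(x: List[float], weight: Matrix, bias: List[float]) -> List[float]:
    # Hypothetical shallow head: weight has shape num_classes x dim.
    return [sum(wi * xi for wi, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def predict_language(layers, layer_logits, weight, bias, labels):
    # Hypothetical LID pipeline: frozen features -> layer weighting -> pooling -> linear.
    pooled = mean_pool(weighted_layer_sum(layers, layer_logits))
    scores = linear(pooled, weight, bias)
    return labels[max(range(len(scores)), key=scores.__getitem__)]


# Toy input: 2 frozen "layers", each with 2 frames of 2-dimensional features.
toy_layers = [
    [[1.0, 0.0], [1.0, 0.0]],  # layer 0
    [[0.0, 1.0], [0.0, 1.0]],  # layer 1
]
toy_weight = [[1.0, 0.0], [0.0, 1.0]]
toy_bias = [0.0, 0.0]
toy_labels = ["eng", "jpn"]  # hypothetical label set

print(predict_language(toy_layers, [5.0, 0.0], toy_weight, toy_bias, toy_labels))  # prints "eng"
```

Shifting the layer logits toward layer 1 (for example `[0.0, 5.0]`) flips the prediction. This shows how, with frozen features, the learned layer weights alone can change which representation the shallow head sees.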



Keywords

self-supervised learning, automatic speech recognition, language identification, multilingual speech, speech representation, multilingual benchmark, frozen feature, speech benchmark
