ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets

Jiatong Shi; Shih-Heng Wang; William Chen; Martijn Bartelds; Vanya Bannihatti Kumar; Jinchuan Tian; Xuankai Chang; Dan Jurafsky; Karen Livescu; Hung-yi Lee; Shinji Watanabe

INTERSPEECH 2024

Abstract

ML-SUPERB evaluates self-supervised learning (SSL) models on the tasks of language identification and automatic speech recognition (ASR). This benchmark treats the models as feature extractors and uses a single shallow downstream model, which can be fine-tuned for a downstream task. However, real-world use cases may require different configurations. This paper presents ML-SUPERB 2.0, which is a new benchmark for evaluating pre-trained SSL and supervised speech models across downstream models, fine-tuning setups, and efficient model adaptation approaches. We find performance improvements over the setup of ML-SUPERB. However, performance depends on the downstream model design. Also, we find large performance differences between languages and datasets, suggesting the need for more targeted approaches to improve multilingual ASR performance.
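The finding that performance differs widely between languages can be made concrete with a small scoring sketch. The abstract does not name the metric or the aggregation, so the following assumes character error rate (CER), an unweighted macro-average over languages, and a population standard deviation as the spread measure. The function names, the language labels, and the toy transcripts are hypothetical and are not taken from the benchmark.

```python
from statistics import mean, pstdev


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(pairs):
    """CER over (reference, hypothesis) pairs, pooled over all characters."""
    total_chars = sum(len(ref) for ref, _ in pairs)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    return total_errors / total_chars


def summarize(results):
    """Per-language CER plus macro-average, spread, and worst language.

    `results` maps a language label to a list of (reference, hypothesis)
    pairs. Macro-averaging weights every language equally (an assumption).
    """
    per_language = {lang: cer(pairs) for lang, pairs in results.items()}
    scores = list(per_language.values())
    return {
        "per_language": per_language,
        "macro_cer": mean(scores),
        "std": pstdev(scores),
        "worst_language": max(per_language, key=per_language.get),
    }


# Toy, made-up transcripts for three hypothetical languages.
toy_results = {
    "lang_a": [("hello world", "hello world"), ("good day", "good bay")],
    "lang_b": [("guten tag", "guten tak")],
    "lang_c": [("abcd", "axyd")],
}
summary = summarize(toy_results)
print(summary["worst_language"], round(summary["macro_cer"], 3))
```

Reporting the spread and the worst-performing language next to the average keeps a strong mean from hiding languages on which a model fails.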

Topics

Machine Learning > Learning Types > Self-Supervised Learning
Speech & Audio > Recognition > Automatic Speech Recognition

Keywords

self-supervised learning; automatic speech recognition; model adaptation; model fine-tuning; multilingual speech model; speech model benchmarking

Related papers

Reshape Dimensions Network for Speaker Recognition 2024

RevRIR: Joint Reverberant Speech and Room Impulse Response Embedding using Contrastive Learning with Application to Room Shape Classification 2024

Mixed Children/Adult/Childrenized Fine-Tuning for Children’s ASR: How to Reduce Age Mismatch and Speaking Style Mismatch 2024

Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions 2024

K-means and hierarchical clustering of f0 contours 2024