UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data

Chengyi Wang; Yu Wu; Yao Qian; Kenichi Kumatani; Shujie LIU; Furu Wei; Michael Zeng; Xuedong Huang

ICML 2021

Abstract

In this paper, we propose UniSpeech, a unified pre-training approach that learns speech representations from both labeled and unlabeled data by conducting supervised phonetic CTC learning and phonetically-aware contrastive self-supervised learning in a multi-task learning manner. The resulting representations capture information more closely correlated with phonetic structures and generalize better across languages and domains. We evaluate UniSpeech for cross-lingual representation learning on the public CommonVoice corpus. The results show that UniSpeech outperforms self-supervised pre-training and supervised transfer learning for speech recognition by up to 13.4% and 26.9% relative phone error rate reduction, respectively (averaged over all test languages). The transferability of UniSpeech is also verified on a domain-shift speech recognition task, where it achieves a relative word error rate reduction of 6% over the previous approach.
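The abstract reports its gains as relative error rate reductions. As a minimal sketch of how such a figure is usually computed, the snippet below divides the absolute drop in error rate by the baseline error rate. The function name and the numbers are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Return the relative error rate reduction as a percentage."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical error rates for illustration only (not taken from the paper).
baseline_per = 20.0  # phone error rate of a baseline system, in percent
improved_per = 17.0  # phone error rate of an improved system, in percent

print(f"{relative_reduction(baseline_per, improved_per):.1f}% relative reduction")
# prints "15.0% relative reduction"
```

A relative reduction is larger than the corresponding absolute one: here a 3-point absolute drop from a 20% baseline is a 15% relative reduction.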

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning

Machine Learning > Learning Types > Self-Supervised Learning

Machine Learning > Learning Types > Multi-Task Learning

Keywords

multi-task learning, self-supervised learning, speech recognition, cross-lingual transfer, speech representation learning, phonetic structure
