Unsupervised Subword Modeling Using Autoregressive Pretraining and Cross-Lingual Phone-Aware Modeling

Siyuan Feng; Odette Scharenborg

2020 INTERSPEECH INTERSPEECH 2020

Unsupervised Subword Modeling Using Autoregressive Pretraining and Cross-Lingual Phone-Aware Modeling

Abstract

This study addresses unsupervised subword modeling, i.e., learning feature representations that can distinguish subword units of a language. The proposed approach adopts a two-stage bottleneck feature (BNF) learning framework, consisting of autoregressive predictive coding (APC) as a front-end and a DNN-BNF model as a back-end. APC pretrained features are set as input features to a DNN-BNF model. A language-mismatched ASR system is used to provide cross-lingual phone labels for DNN-BNF model training. Finally, BNFs are extracted as the subword-discriminative feature representation. A second aim of this work is to investigate the robustness of our approach’s effectiveness to different amounts of training data. The results on Libri-light and the ZeroSpeech 2017 databases show that APC is effective in front-end feature pretraining. Our whole system outperforms the state of the art on both databases. Cross-lingual phone labels for English data by a Dutch ASR outperform those by a Mandarin ASR, possibly linked to the larger similarity of Dutch compared to Mandarin with English. Our system is less sensitive to training data amount when the training data is over 50 hours. APC pretraining leads to a reduction of needed training material from over 5,000 hours to around 200 hours with little performance degradation.

🌉 Interdisciplinary Bridge — Machine Learning and Speech & Audio

🧭 Keyword Pioneer — unsupervised subword modeling

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Siyuan Feng , Odette Scharenborg

Topics

Machine Learning > Core Methods > Representation Learning Machine Learning > Learning Types > Self-Supervised Learning Machine Learning > Learning Types > Unsupervised Learning Speech & Audio > Recognition > Automatic Speech Recognition Machine Learning > Learning Types > Transfer Learning

Keywords

representation learning speech recognition unsupervised feature learning automatic speech recognition bottleneck feature subword modeling autoregressive predictive coding autoregressive pretraining unsupervised subword modeling cross-lingual modeling cross-lingual phone modeling

Download PDF

Related papers

Memory Controlled Sequential Self Attention for Sound Recognition 2020

Dual Attention in Time and Frequency Domain for Voice Activity Detection 2020

Automatic Prediction of Speech Intelligibility Based on X-Vectors in the Context of Head and Neck Cancer 2020

A Noise Robust Technique for Detecting Vowels in Speech Signals 2020

Joint Detection of Sentence Stress and Phrase Boundary for Prosody 2020