2020
ACL
One Model to Pronounce Them All: Multilingual Grapheme-to-Phoneme Conversion With a Transformer Ensemble
Abstract
The task of grapheme-to-phoneme (G2P) conversion is important for both speech recognition and speech synthesis. As with other speech and language processing tasks, learning G2P models is challenging when only a small amount of training data is available. We describe a simple approach that exploits model ensembles, based on multilingual Transformers and self-training, to develop a highly effective G2P solution for 15 languages. Our models were developed as part of our participation in the SIGMORPHON 2020 Shared Task 1, which focused on G2P. Our best models achieve a word error rate (WER) of 14.99 and a phoneme error rate (PER) of 3.30, a sizeable improvement over the shared task's competitive baselines.
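For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how WER and PER are commonly computed for G2P output. It assumes that WER is the percentage of words whose predicted phoneme sequence differs from the reference, and that PER is the total phoneme-level edit distance divided by the total number of reference phonemes. The function names (`edit_distance`, `wer`, `per`) and the toy data are hypothetical, and the shared task's official evaluation script may differ in detail.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two symbol sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """Percentage of words with at least one phoneme error (assumed definition)."""
    errors = sum(1 for r, h in zip(refs, hyps) if list(r) != list(h))
    return 100.0 * errors / len(refs)


def per(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """Total phoneme edits over total reference phonemes, in percent (assumed definition)."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * edits / total


# Toy data, invented purely for illustration.
refs = [["k", "æ", "t"], ["d", "ɔ", "g"]]
hyps = [["k", "æ", "t"], ["d", "o", "g"]]
print(wer(refs, hyps))  # 50.0: one of the two words is wrong
print(per(refs, hyps))  # about 16.67: one substitution over six reference phonemes
```

Under these definitions a single wrong phoneme makes the whole word count as a WER error, which is why WER is typically much higher than PER for the same system.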
Topics
Deep Learning > Architectures > Transformers
Natural Language Processing > Resources & Methods > Multilingual NLP
Speech & Audio > Recognition > Speech Recognition
Speech & Audio > Synthesis > Text-to-Speech
Machine Learning > Learning Types > Multi-Task Learning
Deep Learning > Learning Types > Ensemble Learning
Speech & Audio > Synthesis > Speech Synthesis