PDF: Polyphone Disambiguation in Chinese by Using FLAT

Haiteng Zhang

INTERSPEECH 2021

Abstract

Polyphone disambiguation is an essential step in the front-end module of a Chinese text-to-speech (TTS) system: it predicts the pronunciation of each polyphonic character in the input. A well-designed pronunciation dictionary plays a crucial role in such a system by supplying pinyin for words. However, conventional systems cannot fully exploit the pronunciation dictionary during modelling, both because Chinese word segmentation errors are unavoidable and because of limitations in the model structure. In this paper, we propose a system named PDF: Polyphone Disambiguation by using FLAT. The proposed model encodes both the input character sequence and the dictionary-matched words of the sentence, which lets it avoid segmentation errors while still leveraging the pronunciation dictionary. Additionally, we use a pre-trained language model (PLM) as an encoder to extract contextual information from the input sequence. Experimental results verify the effectiveness of the proposed PDF model: our system improves accuracy by 0.98% compared to BERT on an open-source dataset. The results demonstrate that leveraging the pronunciation dictionary during modelling helps improve the performance of a polyphone disambiguation system.
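To make the idea of encoding "the input character sequence and dictionary matched words" concrete, here is a minimal sketch of how a flat lattice could be built from a sentence. It is not the paper's implementation. It assumes the usual FLAT convention that every token carries a head and a tail character index, so that no word segmenter is needed. The toy dictionary, the function name build_flat_lattice, and the max_len parameter are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy pronunciation dictionary: word -> pinyin for each character.
TOY_DICT: Dict[str, List[str]] = {
    "银行": ["yin2", "hang2"],
    "行人": ["xing2", "ren2"],
}

# A lattice token is (text, head index, tail index), indices inclusive.
Span = Tuple[str, int, int]


def build_flat_lattice(
    sentence: str, lexicon: Dict[str, List[str]], max_len: int = 4
) -> List[Span]:
    """Return character tokens followed by every dictionary-matched word.

    Characters have head == tail. Matched words span several characters.
    All substrings are checked against the lexicon, so overlapping matches
    are kept and no single segmentation is ever committed to.
    """
    chars: List[Span] = [(ch, i, i) for i, ch in enumerate(sentence)]
    words: List[Span] = []
    for start in range(len(sentence)):
        # Only multi-character words, at most max_len characters long.
        for end in range(start + 1, min(start + max_len, len(sentence))):
            candidate = sentence[start : end + 1]
            if candidate in lexicon:
                words.append((candidate, start, end))
    return chars + words


lattice = build_flat_lattice("银行的行人", TOY_DICT)
for text, head, tail in lattice:
    print(text, head, tail)
```

In this toy sentence the polyphonic character 行 appears twice. The matched word 银行 covers its first occurrence (index 1) and 行人 covers its second (index 3), so each occurrence is linked to a dictionary entry carrying a different reading (hang2 versus xing2). A downstream encoder could use the head and tail indices to relate each character to the words that contain it.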

Topics

Machine Learning > Core Methods > Representation Learning
Speech & Audio > Synthesis > Text-to-Speech

Keywords

pre-trained language model, character encoding, pronunciation dictionary, polyphone disambiguation, Chinese text-to-speech
