PDF: Polyphone Disambiguation in Chinese by Using FLAT
Abstract
Polyphone disambiguation is an essential step in the front end of a Chinese text-to-speech (TTS) system: it predicts the pronunciation of each polyphonic character in the input. A well-designed pronunciation dictionary plays a crucial role in such systems by supplying pinyin for words. However, conventional systems cannot fully exploit the pronunciation dictionary during modelling, because of unavoidable Chinese word segmentation errors and of the model structure itself. In this paper, we propose a system named PDF: Polyphone Disambiguation by using FLAT. The proposed model encodes both the input character sequence and the dictionary-matched words of the sentence, which lets it avoid segmentation errors while still leveraging the pronunciation dictionary. We also use a pre-trained language model (PLM) as an encoder to extract contextual information from the input sequence. Experimental results verify the effectiveness of the proposed PDF model: our system improves accuracy by 0.98% over BERT on an open-source dataset. These results demonstrate that leveraging a pronunciation dictionary during modelling improves the performance of a polyphone disambiguation system.
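To make the idea of encoding characters together with dictionary-matched words more concrete, the following is a minimal sketch of how such an input could be assembled. It assumes that FLAT refers to the flat-lattice Transformer, in which every token (character or matched word) carries a head and a tail position. The toy dictionary `TOY_DICT`, the function `build_flat_lattice`, the `max_len` limit, and the tone-number pinyin notation are all hypothetical names and choices made for illustration; they are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical toy pronunciation dictionary: word -> pinyin for each character.
TOY_DICT: Dict[str, List[str]] = {
    "银行": ["yin2", "hang2"],
    "行长": ["hang2", "zhang3"],
    "长大": ["zhang3", "da4"],
}

# A lattice token: (text, head index, tail index), both indices inclusive.
Span = Tuple[str, int, int]


def build_flat_lattice(
    sentence: str, lexicon: Dict[str, List[str]], max_len: int = 4
) -> List[Span]:
    """Return characters followed by all dictionary words matched in the sentence."""
    # Each character occupies a single position, so head == tail.
    spans: List[Span] = [(ch, i, i) for i, ch in enumerate(sentence)]
    # Match every multi-character substring against the dictionary.
    # No word segmenter is involved: all matches are kept, even overlapping ones.
    for start in range(len(sentence)):
        longest_end = min(start + max_len, len(sentence))
        for end in range(start + 2, longest_end + 1):
            word = sentence[start:end]
            if word in lexicon:
                spans.append((word, start, end - 1))
    return spans


lattice = build_flat_lattice("银行行长", TOY_DICT)
for token, head, tail in lattice:
    print(token, head, tail, TOY_DICT.get(token))
```

In this sketch the two occurrences of the polyphonic character 行 are each covered by a matched word (银行 and 行长) whose dictionary entry carries a pinyin candidate. A model that attends over both characters and matched words can in principle use that information without committing to a single segmentation.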