Improving Polyphone Disambiguation for Mandarin Chinese by Combining Mix-Pooling Strategy and Window-Based Attention

Junjie Li; Zhiyu Zhang; Minchuan Chen; Jun Ma; Shaojun Wang; Jing Xiao

INTERSPEECH 2021

Abstract

In this paper, we propose a novel system based on word-level features and window-based attention for polyphone disambiguation, a fundamental task in grapheme-to-phoneme (G2P) conversion for Mandarin Chinese. The framework combines a pre-trained language model with explicit word-level information to extract meaningful context. Specifically, we employ a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model to extract character-level features, and use an external Chinese word segmentation (CWS) tool to obtain the word units. A mixed pooling mechanism converts the character-level features into word-level features according to the segmentation results. A window-based attention module then incorporates contextual word-level features for the polyphonic characters. Experimental results show that our method achieves an accuracy of 99.06% on an open benchmark dataset for Mandarin Chinese polyphone disambiguation, outperforming the baseline systems.
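The abstract does not give the exact formulation of the two modules, so the following is only a minimal sketch under stated assumptions: "mixed pooling" is taken to be a fixed blend of max and mean pooling over the characters of each word, and "window-based attention" is taken to be scaled dot-product attention restricted to the words within a fixed distance of the word containing the polyphonic character. The function names, the `alpha` weight, and the `window` size are hypothetical, and plain Python lists stand in for BERT character features.

```python
import math

def mix_pool(char_feats, spans, alpha=0.5):
    """Pool character vectors into one vector per word.

    char_feats: list of per-character feature vectors.
    spans: half-open (start, end) character ranges, one per segmented word.
    alpha: assumed fixed blend between max pooling and mean pooling.
    """
    words = []
    for start, end in spans:
        seg = char_feats[start:end]
        dims = list(zip(*seg))  # transpose: one tuple per feature dimension
        words.append([alpha * max(d) + (1.0 - alpha) * sum(d) / len(d) for d in dims])
    return words

def window_attention(word_feats, query, center, window=2):
    """Attend only to words within `window` positions of word index `center`.

    query: feature vector of the polyphonic character (assumed query choice).
    Returns the attended context vector and the attention weights.
    """
    lo = max(0, center - window)
    hi = min(len(word_feats), center + window + 1)
    ctx = word_feats[lo:hi]
    scale = math.sqrt(len(query))
    scores = [sum(q * x for q, x in zip(query, w)) / scale for w in ctx]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(a * w[d] for a, w in zip(weights, ctx)) for d in range(len(query))]
    return pooled, weights

# Toy input: six characters with 2-dimensional features, segmented into three words.
char_feats = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0], [8.0, 9.0], [10.0, 11.0]]
spans = [(0, 2), (2, 3), (3, 6)]
word_feats = mix_pool(char_feats, spans)
# Character 2 plays the polyphonic character; it belongs to word 1.
context, weights = window_attention(word_feats, query=char_feats[2], center=1, window=1)
```

In a full system the resulting context vector would presumably be combined with the character feature and passed to a pronunciation classifier; that step is not described in the abstract and is omitted here.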

Topics

Interdisciplinary > Linguistics > Computational Linguistics

Keywords

attention mechanism; word segmentation; grapheme-to-phoneme conversion; Mandarin Chinese; polyphone disambiguation
