WSpeller: Robust Word Segmentation for Enhancing Chinese Spelling Check

Fangfang Li; Youran Shan; Junwen Duan; Xingliang Mao; Minlie Huang

EMNLP 2022

Abstract

Chinese spelling check (CSC) detects and corrects spelling errors in Chinese texts. Previous approaches have combined character-level phonetic and graphic information, ignoring the importance of segment-level information. According to our pilot study, spelling errors are always associated with incorrect word segmentation. When appropriate word boundaries are provided, CSC performance is greatly enhanced. Based on these findings, we present WSpeller, a CSC model that takes into account word segmentation. A fundamental component of WSpeller is a W-MLM, which is trained by predicting visually and phonetically similar words. Through modification of the embedding layer’s input, word segmentation information can be incorporated. Additionally, a robust module is trained to assist the W-MLM-based correction module by predicting the correct word segmentations from sentences containing spelling errors. We evaluate WSpeller on the widely used benchmark datasets SIGHAN13, SIGHAN14, and SIGHAN15. Our model is superior to state-of-the-art baselines on SIGHAN13 and SIGHAN15 and maintains equal performance on SIGHAN14.
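As a rough illustration of what "word boundaries" can look like as a per-character signal, the sketch below converts a segmented sentence into B/M/E/S tags (begin, middle, end, single), a common encoding in Chinese word segmentation. The abstract does not say which encoding WSpeller uses or how the tags enter the embedding layer, so the tag scheme, the function name `words_to_bmes`, and the example sentence are all assumptions made for illustration.

```python
def words_to_bmes(words):
    """Map a list of segmented words to one B/M/E/S tag per character.

    Assumed encoding (not taken from the paper):
    S = single-character word, B = first character of a word,
    M = interior character, E = last character.
    """
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        elif len(word) > 1:
            tags.append("B")
            tags.extend("M" * (len(word) - 2))  # zero or more interior characters
            tags.append("E")
    return tags


# Hypothetical example sentence, already segmented into words.
words = ["我", "喜欢", "自然语言"]
chars = [ch for word in words for ch in word]
tags = words_to_bmes(words)

for ch, tag in zip(chars, tags):
    print(ch, tag)
```

Each character ends up with exactly one tag, so a tag sequence like this could in principle be looked up in a small embedding table and combined with the character embeddings. Whether WSpeller does it this way is not stated in the abstract.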

Topics

Machine Learning > Core Methods > Classification; Natural Language Processing; Natural Language Processing > Understanding > Parsing; Natural Language Processing > Applications > Text Classification; Artificial Intelligence > Core AI > Natural Language Processing; Natural Language Processing > Applications > Text Processing
