An Alignment-Agnostic Model for Chinese Text Error Correction

Liying Zheng; Yue Deng; Weishun Song; Liang Xu; Jing Xiao

2021 EMNLP EMNLP 2021

An Alignment-Agnostic Model for Chinese Text Error Correction

Abstract

AbstractThis paper investigates how to correct Chinese text errors with types of mistaken, missing and redundant characters, which are common for Chinese native speakers. Most existing models based on detect-correct framework can correct mistaken characters, but cannot handle missing or redundant characters due to inconsistency between model inputs and outputs. Although Seq2Seq-based or sequence tagging methods provide solutions to the three error types and achieved relatively good results in English context, they do not perform well in Chinese context according to our experiments. In our work, we propose a novel alignment-agnostic detect-correct framework that can handle both text aligned and non-aligned situations and can serve as a cold start model when no annotation data are provided. Experimental results on three datasets demonstrate that our method is effective and achieves a better performance than most recent published models.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — text error correction

🐣 Hot Topic Early Bird — error detection

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio

Authors

Liying Zheng , Yue Deng , Weishun Song , Liang Xu , Jing Xiao

Topics

Machine Learning > Core Methods > Classification Machine Learning > Application Areas > Domain Adaptation Artificial Intelligence > Core AI > Natural Language Processing Natural Language Processing > Applications > Text Processing

Keywords

sequence labeling sequence tagging error detection seq2seq model chinese text processing text error correction cold start model

Download PDF

Related papers

Continual Learning in Multilingual NMT via Language-Specific Embeddings 2021

MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents 2021

Efficient Multi-Task Auxiliary Learning: Selecting Auxiliary Data by Feature Similarity 2021

Neural Machine Translation with Heterogeneous Topic Knowledge Embeddings 2021

Semantics-Preserved Data Augmentation for Aspect-Based Sentiment Analysis 2021