EMNLP 2021

A Text Editing Approach to Joint Japanese Word Segmentation, POS Tagging, and Lexical Normalization

Abstract

Lexical normalization, in addition to word segmentation and part-of-speech tagging, is a fundamental task for Japanese user-generated text processing. In this paper, we propose a text editing model that solves the three tasks jointly, along with methods for generating pseudo-labeled data to overcome the shortage of training data. Our experiments showed that the proposed model achieved better normalization performance when trained on more diverse pseudo-labeled data.
