Improving Neural Text Normalization with Data Augmentation at Character- and Morphological Levels

Itsumi Saito; Jun Suzuki; Kyosuke Nishida; Kugatsu Sadamitsu; Satoshi Kobashikawa; Ryo Masumura; Yuji Matsumoto; Junji Tomita

IJCNLP 2017

Abstract

In this study, we investigated the effectiveness of augmented data for encoder-decoder-based neural normalization models. Attention-based encoder-decoder models are highly effective for many natural language generation tasks. In general, training an encoder-decoder model requires a large amount of training data. Unlike machine translation, however, little training data is available for text normalization tasks. In this paper, we propose two methods for generating augmented data. Experimental results on Japanese dialect normalization indicate that our methods are effective for an encoder-decoder model and achieve higher BLEU scores than the baselines. We also investigated the oracle performance and found that there is still substantial room for improving the encoder-decoder model.

Topics

Machine Learning > Application Areas > Data Augmentation
Natural Language Processing > Generation > Text Generation
Natural Language Processing > Resources & Methods > Text Representation
Natural Language Processing > Applications > Text Generation
Deep Learning > Learning Types > Deep Learning
Machine Learning > Learning Types > Data Augmentation

Keywords

data augmentation, encoder-decoder model, character-level processing, text normalization, neural text normalization, neural network, character level, morphological level
