ACL 2018

Graph-based Filtering of Out-of-Vocabulary Words for Encoder-Decoder Models

Abstract

Encoder-decoder models typically employ only words that occur frequently in the training corpus, whether to limit computational cost, to exclude noisy words, or both. However, the resulting vocabulary may still include words that interfere with learning in encoder-decoder models. This paper proposes a method for selecting words better suited to training encoders by using not only frequency but also co-occurrence information, which we capture with the HITS algorithm. The proposed method is applied to two tasks: machine translation and grammatical error correction. For Japanese-to-English translation, the method achieved a BLEU score 0.56 points higher than that of a baseline. It also outperformed the baseline on English grammatical error correction, with an F-measure 1.48 points higher.
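
To make the idea of ranking words by co-occurrence with HITS more concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not specify how the graph is built or which HITS score is used, so several choices here are assumptions: the function names (`cooccurrence_edges`, `hits`, `select_vocabulary`), the directed edge from each word to the words that follow it within a fixed `window`, raw co-occurrence counts as edge weights, and ranking by authority score.

```python
import math
from collections import defaultdict


def cooccurrence_edges(sentences, window=2):
    """Count directed edges from each word to the words that follow it.

    Assumption: a fixed-size window and raw counts as edge weights.
    """
    edges = defaultdict(float)
    for sent in sentences:
        for i, word in enumerate(sent):
            for j in range(i + 1, min(i + 1 + window, len(sent))):
                edges[(word, sent[j])] += 1.0
    return edges


def _normalize(scores):
    """Scale scores to unit L2 norm (left unchanged if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {w: v / norm for w, v in scores.items()}


def hits(edges, iterations=50):
    """Run weighted HITS by power iteration; return (hub, authority) dicts."""
    nodes = {w for pair in edges for w in pair}
    hub = {w: 1.0 for w in nodes}
    auth = {w: 1.0 for w in nodes}
    for _ in range(iterations):
        # Authority update: sum of hub scores over incoming edges.
        new_auth = {w: 0.0 for w in nodes}
        for (src, dst), weight in edges.items():
            new_auth[dst] += weight * hub[src]
        auth = _normalize(new_auth)
        # Hub update: sum of authority scores over outgoing edges.
        new_hub = {w: 0.0 for w in nodes}
        for (src, dst), weight in edges.items():
            new_hub[src] += weight * auth[dst]
        hub = _normalize(new_hub)
    return hub, auth


def select_vocabulary(sentences, size, window=2):
    """Keep the `size` words with the highest authority score.

    Assumption: authority is the ranking criterion; ties break alphabetically.
    """
    _, auth = hits(cooccurrence_edges(sentences, window))
    ranked = sorted(auth, key=lambda w: (-auth[w], w))
    return ranked[:size]
```

For example, `select_vocabulary(corpus, size=30000)` on a tokenized corpus (a list of token lists) would return the retained vocabulary; words outside it would then be mapped to an unknown-word token, as in ordinary frequency-based filtering.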
