2016 COLING COLING 2016

Improving historical spelling normalization with bi-directional LSTMs and multi-task learning

Abstract

AbstractNatural-language processing of historical documents is complicated by the abundance of variant spellings and lack of annotated data. A common approach is to normalize the spelling of historical words to modern forms. We explore the suitability of a deep neural network architecture for this task, particularly a deep bi-LSTM network applied on a character level. Our model compares well to previously established normalization algorithms when evaluated on a diverse set of texts from Early New High German. We show that multi-task learning with additional normalization data can improve our model’s performance further.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning and Natural Language Processing
📈 Trend Setter — Text Generation
🧭 Keyword Pioneer — character-level model
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio