Learning attention for historical text normalization by learning to pronounce

Marcel Bollmann; Joachim Bingel; Anders Søgaard

ACL 2017

Abstract

Automated processing of historical texts often relies on pre-normalization to modern word forms. Training encoder-decoder architectures to solve such problems typically requires a lot of training data, which is not available for the named task. We address this problem by using several novel encoder-decoder architectures, including a multi-task learning (MTL) architecture using a grapheme-to-phoneme dictionary as auxiliary data, pushing the state-of-the-art by an absolute 2% increase in performance. We analyze the induced models across 44 different texts from Early New High German. Interestingly, we observe that, as previously conjectured, multi-task learning can learn to focus attention during decoding, in ways remarkably similar to recently proposed attention mechanisms. This, we believe, is an important step toward understanding how MTL works.

Topics

Artificial Intelligence > Learning Paradigms > Few-Shot Learning
Machine Learning > Learning Types > Transfer Learning
Machine Learning > Learning Paradigms > Multi-Task Learning
Deep Learning > Learning Types > Multi-Task Learning
Artificial Intelligence > Core AI > Natural Language Processing

Keywords

multi-task learning; attention mechanism; encoder-decoder architecture; grapheme-to-phoneme conversion; text normalization
