Multi-task learning for historical text normalization: Size matters

Marcel Bollmann; Anders Søgaard; Joachim Bingel

2018 ACL ACL 2018

Multi-task learning for historical text normalization: Size matters

Abstract

AbstractHistorical text normalization suffers from small datasets that exhibit high variance, and previous work has shown that multi-task learning can be used to leverage data from related problems in order to obtain more robust models. Previous work has been limited to datasets from a specific language and a specific historical period, and it is not clear whether results generalize. It therefore remains an open problem, when historical text normalization benefits from multi-task learning. We explore the benefits of multi-task learning across 10 different datasets, representing different languages and periods. Our main finding—contrary to what has been observed for other NLP tasks—is that multi-task learning mainly works when target task data is very scarce.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — data scarcity

🐣 Hot Topic Early Bird — low-resource learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Marcel Bollmann , Anders Søgaard , Joachim Bingel

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning Machine Learning > Application Areas > Domain Adaptation Machine Learning > Learning Types > Multi-Task Learning Machine Learning > Learning Types > Transfer Learning Deep Learning > Learning Types > Multi-Task Learning Natural Language Processing > Applications > Text Processing

Keywords

multi-task learning low-resource learning low-resource language data scarcity text normalization historical text dataset size historical text normalization

Download PDF

Related papers

Economic Event Detection in Company-Specific News Text 2018

Investigating Effective Parameters for Fine-tuning of Word Embeddings Using Only a Small Corpus 2018

SemAxis: A Lightweight Framework to Characterize Domain-Specific Word Semantics Beyond Sentiment 2018

Fighting Offensive Language on Social Media with Unsupervised Text Style Transfer 2018

Affordances in Grounded Language Learning 2018