2025 COLING COLING 2025

How to age BERT Well: Continuous Training for Historical Language Adaptation

Abstract

AbstractAs the application of computational tools increases to digitalize historical archives, automatic annotation challenges persist due to distinct linguistic and morphological features of historical languages like Old English (OE). Existing tools struggle with the historical language varieties due to insufficient training. Previous research has focused on adapting pre-trained language models to new languages or domains but has rarely explored the modeling of language variety across time. Hence, we investigate the effectiveness of continuous language model training for adapting language models to OE on domain-specific data. We compare the continuous training of an English model (EN) and a multilingual model (ML), and use POS tagging for downstream evaluation. Results show that continuous pre-training substantially improves performance. We retrain a modern English (EN) model and a Multi-lingual (ML) BERT model for OE. We confirmed the effectiveness of continuous pre-training for language adaptation and downstream evaluation utilizing part-of-speech (POS) tagging, advancing the potential to understand the unique grammatical structures of historical OE archives. More concretely, EN BERT initially outperformed ML BERT with an accuracy of 83% during the language modeling phase. However, on the POS tagging task, ML BERT surpassed EN BERT, achieving an accuracy of 94%, which suggests effective performance to the historical language varieties.

🌉 Interdisciplinary Bridge — Machine Learning and Natural Language Processing
🧭 Keyword Pioneer — bert adaptation
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio