NLP for Digital Humanities: Processing Chronological Text Corpora

Adam Pawłowski; Tomasz Walkowiak

EMNLP 2024

Abstract

This paper applies Natural Language Processing (NLP) techniques to the analysis of large chronological text corpora. It underscores the synergy between humanistic inquiry and computational methods, especially in processing and analyzing sequential textual data known as lexical series. A reference workflow for chronological corpus analysis is introduced, outlining the methods applicable to the ChronoPress corpus, a dataset covering 22 years of Polish press from 1945 to 1966. The study shows how this approach can uncover cultural and historical patterns through the analysis of lexical series. The findings highlight both the challenges and the opportunities of lexical series analysis within Digital Humanities, and emphasize the need for advanced data filtering and anomaly detection algorithms to manage the large and intricate datasets characteristic of this field.
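
To make the notion of a lexical series concrete, the following is a minimal sketch and not the authors' workflow. It assumes a lexical series is the relative frequency of one term per time period, and it swaps in a generic robust outlier rule (modified z-score based on the median and the median absolute deviation) as a stand-in for the anomaly detection the abstract calls for. The function names `lexical_series` and `flag_anomalies`, the whitespace tokenizer, and the threshold of 3.5 are all illustrative assumptions.

```python
from collections import Counter
from statistics import median


def lexical_series(docs, term):
    """Return (periods, values): relative frequency of `term` per period.

    `docs` is an iterable of (period, text) pairs. Tokenization is a naive
    lowercase whitespace split (an assumption; a real pipeline for Polish
    would lemmatize first).
    """
    hits = Counter()
    totals = Counter()
    for period, text in docs:
        tokens = text.lower().split()
        totals[period] += len(tokens)
        hits[period] += tokens.count(term)
    periods = sorted(totals)  # chronological order
    values = [hits[p] / totals[p] if totals[p] else 0.0 for p in periods]
    return periods, values


def flag_anomalies(values, threshold=3.5):
    """Return indices whose modified z-score exceeds `threshold`.

    Uses median and median absolute deviation (MAD); this is a generic
    outlier rule chosen for illustration, not the method from the paper.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread, so nothing can be flagged
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    toy_docs = [
        (1945, "wojna pokój pokój praca"),
        (1945, "dom dom dom dom"),
        (1946, "praca praca dom dom"),
    ]
    print(lexical_series(toy_docs, "pokój"))
```

Median-based statistics are used here because a single extreme period would inflate a mean and standard deviation and could mask itself; whether that trade-off suits a given corpus is a judgment call.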

Topics

Machine Learning > Core Methods > Clustering
Machine Learning > Learning Types > Unsupervised Learning
Natural Language Processing > Applications > Text Classification
Data Science & Analytics > Methods > Time Series
Interdisciplinary > Digital Humanities
Machine Learning > Core Methods > Anomaly Detection
Natural Language Processing > Applications > Text Processing

Keywords

anomaly detection, natural language processing, text mining, corpus analysis, digital humanities, lexical series analysis, chronological corpus, text corpora processing, cultural pattern discovery, lexical series, temporal text mining, cultural pattern, historical pattern analysis
