2025 NAACL NAACL 2025

ParsiPy: NLP Toolkit for Historical Persian Texts in Python

Abstract

AbstractThe study of historical languages presents unique challenges due to their complex ortho-graphic systems, fragmentary textual evidence, and the absence of standardized digital repre-sentations of text in those languages. Tack-ling these challenges needs special NLP digi-tal tools to handle phonetic transcriptions and analyze ancient texts. This work introduces ParsiPy1, an NLP toolkit designed to facili-tate the analysis of historical Persian languages by offering modules for tokenization, lemma-tization, part-of-speech tagging, phoneme-to-transliteration conversion, and word embed-ding. We demonstrate the utility of our toolkit through the processing of Parsig (Middle Per-sian) texts, highlighting its potential for ex-panding computational methods in the study of historical languages. Through this work, we contribute to the field of computational philol-ogy, offering tools that can be adapted for the broader study of ancient texts and their digital preservation.

🌉 Interdisciplinary Bridge — Interdisciplinary and Natural Language Processing
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio