2024 COLING COLING 2024

A Large Annotated Reference Corpus of New High German Poetry

Abstract

AbstractThis paper introduces a large annotated corpus of public domain German poetry, covering the time period from 1600 to the 1920s with 65k poems. We describe how the corpus was compiled, how it was cleaned (including duplicate detection), and how it looks now in terms of size, format, temporal distribution, and automatic annotation. Besides metadata, the corpus contains reliable annotation of tokens, syllables, part-of-speech, and meter and verse measure. Finally, we give some statistics on the annotation and an overview of other poetry corpora.

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Robotics, Security & Privacy, Speech & Audio

Authors