Artificial Relationships in Fiction: A Dataset for Advancing NLP in Literary Domains
Abstract
AbstractRelation extraction (RE) in fiction presents unique NLP challenges due to implicit, narrative-driven relationships. Unlike factual texts, fiction weaves complex connections, yet existing RE datasets focus on non-fiction. To address this, we introduce Artificial Relationships in Fiction (ARF), a synthetically annotated dataset for literary RE. Built from diverse Project Gutenberg fiction, ARF considers author demographics, publication periods, and themes. We curated an ontology for fiction-specific entities and relations, and using GPT-4o, generated artificial relationships to capture narrative complexity. Our analysis demonstrates its value for finetuning RE models and advancing computational literary studies. By bridging a critical RE gap, ARF enables deeper exploration of fictional relationships, enriching NLP research at the intersection of storytelling and AI-driven literary analysis.