EACL 2024
Perplexing Canon: A study on GPT-based perplexity of canonical and non-canonical literary works
Abstract
This study extends previous research on literary quality by using information-theoretic methods to measure the perplexity that three large language models record when processing 20th-century English novels deemed to be of high literary quality and recognized by experts as canonical, compared with a broader control group. We find that canonical texts appear to elicit higher perplexity in the models, and we explore which textual features might contribute to this effect. We find that a more heavily nominal style, together with a more diverse vocabulary, is one of the leading causes of the difference between the two groups. These traits could reflect “strategies” to achieve an informationally dense literary style.
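For readers unfamiliar with the measure, the following is a minimal sketch of how perplexity is conventionally computed from per-token log-probabilities: the exponential of the mean negative log-probability. It is not the authors' pipeline. The function name `perplexity` and the input values are hypothetical, and in practice the log-probabilities would come from a language model scoring the text.

```python
import math


def perplexity(token_logprobs):
    """Return exp of the mean negative log-probability (natural log).

    token_logprobs: per-token log-probabilities; assumed here to be
    supplied by some language model, which is outside this sketch.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Hypothetical input: four tokens, each assigned probability 0.25.
uniform = [math.log(0.25)] * 4
print(perplexity(uniform))
```

A text whose tokens the model finds less predictable receives lower log-probabilities and therefore a higher perplexity, which is the sense in which the abstract compares the two groups of novels.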
Authors
Topics
Natural Language Processing > Generation > Language Modeling
Natural Language Processing > Resources & Methods > Large Language Models
Interdisciplinary > Linguistics > Computational Linguistics
Artificial Intelligence > Core AI > Large Language Models
Natural Language Processing > Resources & Methods > Language Modeling