2024 EACL EACL 2024

Perplexing Canon: A study on GPT-based perplexity of canonical and non-canonical literary works

Abstract

AbstractThis study extends previous research on literary quality by using information theory-based methods to assess the level of perplexity recorded by three large language models when processing 20th-century English novels deemed to have high literary quality, recognized by experts as canonical, compared to a broader control group. We find that canonical texts appear to elicit a higher perplexity in the models, we explore which textual features might concur to create such an effect. We find that the usage of a more heavily nominal style, together with a more diverse vocabulary, is one of the leading causes of the difference between the two groups. These traits could reflect “strategies” to achieve an informationally dense literary style.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Interdisciplinary and Natural Language Processing
🧭 Keyword Pioneer — perplexity measurement
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio