Revisiting Entropy Rate Constancy in Text

Vivek Verma; Nicholas Tomlin; Dan Klein

EMNLP 2023

Abstract

The uniform information density (UID) hypothesis states that humans tend to distribute information roughly evenly across an utterance or discourse. Early evidence in support of the UID hypothesis came from Genzel and Charniak (2002), which proposed an entropy rate constancy principle based on the probability of English text under n-gram language models. We re-evaluate the claims of Genzel and Charniak (2002) with neural language models, failing to find clear evidence in support of entropy rate constancy. We conduct a range of experiments across datasets, model sizes, and languages and discuss implications for the uniform information density hypothesis and linguistic theories of efficient communication more broadly.
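To make the n-gram side of this analysis concrete, here is a minimal sketch, assuming pre-tokenized sentences and an add-one-smoothed bigram model (a deliberately simple stand-in for the n-gram models mentioned above, not the setup used by Genzel and Charniak or by this paper). It estimates the average negative log2 probability per token for each sentence and then averages that value by sentence position across documents. All function names (`train_bigram`, `sentence_entropy`, `entropy_by_position`) are hypothetical. Each sentence is scored in isolation, so the result is an out-of-context estimate; how such an estimate varies with position is the kind of signal an entropy rate constancy analysis inspects.

```python
import math
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count history unigrams and bigrams over sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens[:-1])                  # tokens that serve as a history
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # (previous, current) pairs
    vocab = {tok for sent in sentences for tok in sent} | {"</s>"}
    return unigrams, bigrams, vocab


def sentence_entropy(sent, unigrams, bigrams, vocab):
    """Mean negative log2 probability per predicted token, with add-one smoothing.

    The sentence is scored without any preceding sentences as context.
    Unseen tokens simply get smoothed counts (a crude assumption of this sketch).
    """
    tokens = ["<s>"] + sent + ["</s>"]
    v = len(vocab)
    total = 0.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + v)
        total -= math.log2(p)
    return total / (len(tokens) - 1)


def entropy_by_position(documents, model):
    """Average sentence entropy for each sentence index across all documents."""
    by_pos = defaultdict(list)
    for doc in documents:
        for i, sent in enumerate(doc):
            by_pos[i].append(sentence_entropy(sent, *model))
    return {i: sum(vals) / len(vals) for i, vals in sorted(by_pos.items())}


if __name__ == "__main__":
    # Toy data for illustration only; real analyses use large corpora.
    model = train_bigram([["a", "b"], ["a", "b"], ["c", "d"]])
    docs = [[["a", "b"], ["c", "d"]], [["a", "b"], ["c", "d"]]]
    print(entropy_by_position(docs, model))
```

Swapping the bigram scorer for a neural language model, and optionally conditioning on the preceding sentences, is the kind of change the abstract describes; the grouping-by-position step stays the same.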

Keywords

information theory, uniform information density, neural language model, n-gram language model, entropy rate constancy, information density hypothesis, efficient communication