ACL 2020
A Three-Parameter Rank-Frequency Relation in Natural Languages
Abstract
We show that the rank-frequency relation in textual data follows $f \propto r^{-\alpha}(r+\gamma)^{-\beta}$, where $f$ is the token frequency, $r$ is the rank by frequency, and $(\alpha, \beta, \gamma)$ are parameters. The formulation is derived from the empirical observation that $d^2(x+y)/dx^2$, where $(x, y) = (\log r, \log f)$, is a typical impulse function. The formulation reduces to the power law when $\beta = 0$ and to the Zipf–Mandelbrot law when $\alpha = 0$. From an investigation of multilingual corpora, we illustrate that $\alpha$ is related to the analytic features of syntax and $\beta + \gamma$ to those of morphology in natural languages.
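The link between the formula and the "impulse" observation can be checked with a little calculus. Assuming natural logarithms and writing $\log f = \log C - \alpha \log r - \beta \log(r+\gamma)$ for some normalizing constant $C$ (not specified in the abstract), substituting $r = e^x$ gives $d^2(x+y)/dx^2 = -\beta\gamma e^x/(e^x+\gamma)^2$. This is a bell-shaped bump centred at $x = \log \gamma$ with peak value $-\beta/4$, and $\alpha$ drops out because it only contributes a term linear in $x$. The Python below is a minimal, hypothetical sketch of that check: the function names `rank_frequency`, `log_freq`, `d2_x_plus_y` and `d2_numeric` and the parameter values are illustrative assumptions, not the authors' code, and it does not fit $(\alpha, \beta, \gamma)$ to a corpus.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, frequency) pairs; rank 1 is the most frequent token.

    Ties keep first-encountered order (Counter.most_common behaviour).
    """
    counts = Counter(tokens)
    return [(rank, freq) for rank, (_, freq) in enumerate(counts.most_common(), start=1)]


def log_freq(r, alpha, beta, gamma, log_c=0.0):
    """Hypothetical helper: log f = log_c - alpha*log(r) - beta*log(r + gamma).

    beta = 0  -> power law, f proportional to r^-alpha
    alpha = 0 -> Zipf-Mandelbrot law, f proportional to (r + gamma)^-beta
    """
    return log_c - alpha * math.log(r) - beta * math.log(r + gamma)


def d2_x_plus_y(x, beta, gamma):
    """Analytic d^2(x+y)/dx^2 with x = log r, y = log f (natural logs).

    Equals -beta*gamma*e^x / (e^x + gamma)^2: a bump centred at
    x = log(gamma) with peak value -beta/4. alpha does not appear.
    """
    ex = math.exp(x)
    return -beta * gamma * ex / (ex + gamma) ** 2


def d2_numeric(x, alpha, beta, gamma, h=1e-3):
    """Central finite-difference estimate of d^2(x+y)/dx^2 under the model."""
    def g(t):
        # x + y as a function of x = t, i.e. r = e^t
        return t + log_freq(math.exp(t), alpha, beta, gamma)
    return (g(x + h) - 2.0 * g(x) + g(x - h)) / (h * h)


if __name__ == "__main__":
    alpha, beta, gamma = 0.8, 0.5, 20.0  # illustrative values, not fitted
    for x in (0.0, math.log(gamma), 6.0):
        print(f"x={x:.2f}  analytic={d2_x_plus_y(x, beta, gamma):.5f}  "
              f"numeric={d2_numeric(x, alpha, beta, gamma):.5f}")
    print(rank_frequency("the cat and the dog and the bird".split()))
```

The finite-difference estimate is there only as an independent cross-check of the closed form; on real data one would instead estimate the curve from the empirical `(log r, log f)` points, which is the empirical observation the abstract refers to.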