ACL 2020

A Three-Parameter Rank-Frequency Relation in Natural Languages

Abstract

We show that the rank-frequency relation in textual data follows $f \propto r^{-\alpha}(r+\gamma)^{-\beta}$, where $f$ is the token frequency and $r$ is the rank by frequency, with $(\alpha, \beta, \gamma)$ as parameters. The formulation is derived from the empirical observation that $d^2(x+y)/dx^2$ is a typical impulse function, where $(x, y) = (\log r, \log f)$. The formulation reduces to the power law when $\beta = 0$ and to the Zipf–Mandelbrot law when $\alpha = 0$. Based on an investigation of multilingual corpora, we illustrate that $\alpha$ is related to the analytic features of syntax and $\beta+\gamma$ to those of morphology in natural languages.
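The following is a minimal sketch, not the authors' code, of the three-parameter relation stated in the abstract. It checks the two special cases numerically and compares a finite-difference estimate of the second derivative of x + y (with x = log r, y = log f) against the closed form -βγr/(r+γ)^2, which we obtained here by differentiating the stated formula. That closed form is a single localized bump with extremum -β/4 at r = γ, consistent with the impulse-like shape the abstract describes. The parameter values, the rank range, and the function names `log_freq` and `second_derivative_of_x_plus_y` are illustrative assumptions, not values or names from the paper.

```python
import numpy as np


def log_freq(r, alpha, beta, gamma, log_c=0.0):
    """Return log f for f = C * r**(-alpha) * (r + gamma)**(-beta)."""
    r = np.asarray(r, dtype=float)
    return log_c - alpha * np.log(r) - beta * np.log(r + gamma)


def second_derivative_of_x_plus_y(x, beta, gamma):
    """Closed form of d^2(x + y)/dx^2 for x = log r, y = log f.

    Derived from the formula above: -beta * gamma * r / (r + gamma)**2.
    """
    r = np.exp(x)
    return -beta * gamma * r / (r + gamma) ** 2


# Hypothetical parameter values, chosen only for illustration.
alpha, beta, gamma = 0.6, 0.8, 50.0
ranks = np.arange(1, 10001)

# Special case beta = 0: a pure power law, log f = -alpha * log r.
assert np.allclose(log_freq(ranks, alpha, 0.0, gamma), -alpha * np.log(ranks))

# Special case alpha = 0: Zipf-Mandelbrot, log f = -beta * log(r + gamma).
assert np.allclose(
    log_freq(ranks, 0.0, beta, gamma), -beta * np.log(ranks + gamma)
)

# Numerical check of the second derivative on a uniform grid in x = log r.
x = np.linspace(0.0, np.log(1e6), 2001)
h = x[1] - x[0]
s = x + log_freq(np.exp(x), alpha, beta, gamma)  # s = x + y
numeric = (s[2:] - 2.0 * s[1:-1] + s[:-2]) / h**2  # central difference
analytic = second_derivative_of_x_plus_y(x[1:-1], beta, gamma)

# Location of the extremum of the bump; it should sit near x = log(gamma).
peak_x = x[1:-1][np.argmin(analytic)]
print("extremum value:", analytic.min(), "at x =", peak_x)
```

Note that α drops out of the second derivative entirely, since the term (1 - α)x is linear in x; in this sketch only β and γ shape the bump.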
