Source type: Article
Standard type: Other
DOI: 10.1098/rsif.2015.0330
Understanding Zipf's law of word frequencies through sample-space collapse in sentence formation.
Thurner S; Hanel R; Liu B; Corominas-Murtra B
Publication date: 2015
Published in: Interface 12 (108): 0330
Publication year: 2015
Language: English
Abstract: The formation of sentences is a highly structured and history-dependent process. The probability of using a specific word in a sentence strongly depends on the 'history' of word usage earlier in that sentence. We study a simple history-dependent model of text generation assuming that the sample-space of word usage reduces along sentence formation, on average. We first show that the model explains the approximate Zipf law found in word frequencies as a direct consequence of sample-space reduction. We then empirically quantify the amount of sample-space reduction in the sentences of 10 famous English books, by analysis of corresponding word-transition tables that capture which words can follow any given word in a text. We find a highly nested structure in these transition tables and show that this 'nestedness' is tightly related to the power law exponents of the observed word frequency distributions. With the proposed model, it is possible to understand that the nestedness of a text can be the origin of the actual scaling exponent and that deviations from the exact Zipf law can be understood by variations of the degree of nestedness on a book-by-book basis. On a theoretical level, we are able to show that in the case of weak nesting, Zipf's law breaks down in a fast transition. Unlike previous attempts to understand Zipf's law in language, the sample-space reducing model is not based on assumptions of multiplicative, preferential or self-organized critical mechanisms behind language formation, but simply uses the empirically quantifiable parameter 'nestedness' to understand the statistics of word frequencies.
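The link between sample-space reduction and Zipf's law can be illustrated with a toy simulation. The abstract does not spell out the model's update rule, so the sketch below makes its own assumptions: states are numbered 1 to N, each "sentence" starts with the full sample space, every step jumps uniformly to a strictly lower state, and the process restarts once state 1 is reached. The function name `ssr_visits` and all parameter values are hypothetical. Under these assumptions, state i is visited with probability 1/i per restart, so the rescaled count i * visits[i] / restarts should stay close to 1.

```python
import random
from collections import Counter


def ssr_visits(n_states, n_restarts, rng):
    """Count state visits in a toy sample-space reducing process.

    Assumed rule (not taken from the abstract): from state s, jump
    uniformly to one of the states 1..s-1; restart after reaching 1.
    """
    visits = Counter()
    for _ in range(n_restarts):
        state = n_states  # full sample space at the start of a run
        while state > 1:
            # Only strictly lower states remain reachable.
            state = rng.randint(1, state - 1)
            visits[state] += 1
    return visits


rng = random.Random(0)
n_restarts = 20000
visits = ssr_visits(n_states=50, n_restarts=n_restarts, rng=rng)

# For a Zipf-like 1/i visiting frequency the rescaled value is near 1.
for i in (1, 2, 5, 10, 25):
    print(i, round(i * visits[i] / n_restarts, 3))
```

This sketch covers only the fully nested limit; it does not model the word-transition tables or the weak-nesting regime mentioned in the abstract.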
Subject: Advanced Systems Analysis (ASA)
Keywords: language formation; random walks on networks; scaling in stochastic processes; word-transition networks
URL: http://pure.iiasa.ac.at/id/eprint/11401/
Source think tank: International Institute for Applied Systems Analysis (Austria)
Resource type: Think tank publication
Item identifier: http://119.78.100.153/handle/2XGU8XDN/130330
Recommended citation
GB/T 7714
Thurner S, Hanel R, Liu B, et al. Understanding Zipf's law of word frequencies through sample-space collapse in sentence formation. 2015.