Difference between revisions of "Randomness, Structure and Causality - Agenda"
From Santa Fe Institute Events Wiki
Line 1: | Line 1: | ||
{{Randomness, Structure and Causality}} | {{Randomness, Structure and Causality}} | ||
− | We | + | |
+ | == Abstracts == | ||
+ | |||
+ | <br> | ||
+ | |||
+ | The Vocabulary of Grammar-Based Codes and the Logical Consistency of Texts<br> | ||
+ | |||
+ | Debowski, Lukasz (ldebowsk@ipipan.waw.pl)<br>
+ | Polish Academy of Sciences<br> | ||
+ | <br> | ||
+ | <p> | ||
+ | We will present a new explanation for the distribution of words in | ||
+ | natural language which is grounded in information theory and inspired | ||
+ | by recent research in excess entropy. Namely, we will demonstrate a | ||
+ | theorem with the following informal statement: If a text of length $n$ | ||
+ | describes $n^\beta$ independent facts in a repetitive way then the | ||
+ | text contains at least $n^\beta/\log n$ different words. In the | ||
+ | formal statement, two modeling postulates are adopted. Firstly, the | ||
+ | words are understood as nonterminal symbols of the shortest | ||
+ | grammar-based encoding of the text. Secondly, the text is assumed to | ||
+ | be emitted by a finite-energy strongly nonergodic source whereas the | ||
+ | facts are binary IID variables predictable in a shift-invariant | ||
+ | way. Besides the theorem, we will exhibit a few stochastic processes | ||
+ | to which this and similar statements can be related. | ||
+ | |||
+ | <p> | ||
+ | |||
+ | [[http://arxiv.org/abs/0810.3125]] and [[http://arxiv.org/abs/0911.5318]] |
Revision as of 18:39, 16 December 2010
Abstracts
The Vocabulary of Grammar-Based Codes and the Logical Consistency of Texts
Debowski, Lukasz (ldebowsk@ipipan.waw.pl)
Polish Academy of Sciences
We will present a new explanation for the distribution of words in natural language, grounded in information theory and inspired by recent research on excess entropy. Namely, we will demonstrate a theorem with the following informal statement: If a text of length $n$ describes $n^\beta$ independent facts in a repetitive way, then the text contains at least $n^\beta/\log n$ different words. In the formal statement, two modeling postulates are adopted. Firstly, the words are understood as nonterminal symbols of the shortest grammar-based encoding of the text. Secondly, the text is assumed to be emitted by a finite-energy strongly nonergodic source, whereas the facts are binary IID variables predictable in a shift-invariant way. Besides the theorem, we will exhibit a few stochastic processes to which this and similar statements can be related.
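To make the first postulate concrete, the following is a minimal sketch of how "words" can be read off as the nonterminal symbols of a grammar-based encoding. It is an illustration under stated assumptions, not the construction used in the talk: the abstract refers to the shortest grammar-based encoding, whereas this sketch swaps in a simple greedy pair-replacement heuristic (in the spirit of Re-Pair) that does not guarantee a shortest grammar. The function names `repair_vocabulary` and `expand` are hypothetical and introduced only for this example.

```python
from collections import Counter


def repair_vocabulary(tokens):
    """Greedy pair-replacement grammar (a Re-Pair-style heuristic).

    ASSUMPTION: this stands in for the 'shortest grammar-based encoding'
    of the abstract; it is only an approximation of it.
    Returns the compressed start sequence and the dictionary of rules,
    whose keys play the role of the text's 'vocabulary'.
    """
    seq = list(tokens)
    rules = {}  # nonterminal -> (left symbol, right symbol)
    next_id = 0
    while True:
        # Count adjacent pairs (overlapping occurrences are counted too).
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so no further rule is useful
        # Nonterminals are tuples, so they never collide with string tokens.
        symbol = ("N", next_id)
        next_id += 1
        rules[symbol] = pair
        # Replace non-overlapping occurrences of the pair, left to right.
        out = []
        i = 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(symbol)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Recursively expand a symbol back into the terminals it encodes."""
    if symbol not in rules:
        return [symbol]
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


if __name__ == "__main__":
    text = "abababab"
    start, rules = repair_vocabulary(text)
    # The number of rules is the vocabulary size under this heuristic.
    print("vocabulary size:", len(rules))
    for nonterminal in rules:
        print(nonterminal, "->", "".join(expand(nonterminal, rules)))
```

In this reading, the theorem's lower bound of $n^\beta/\log n$ concerns the number of such nonterminals for a shortest grammar; the count produced by a greedy heuristic like the one above need not match that number.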