## ALGORITHMICALLY RANDOM AND INCOMPRESSIBLE SEQUENCES

From the examples in Section 14.2, it is clear that there are some long sequences that are simple to describe, like the first million bits of $\pi$. By the same token, there are also large integers that are simple to describe, such as
$$2^{2^{2^{2^{2^2}}}}$$
or $(100!)!$.
We now show that although there are some simple sequences, most sequences do not have simple descriptions. Similarly, most integers are not simple. Hence, if we draw a sequence at random, we are likely to draw a complex sequence. The next theorem shows that the probability that a sequence can be compressed by more than $k$ bits is no greater than $2^{-k}$.
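As a rough empirical illustration of this claim, the sketch below compresses randomly drawn byte strings with `zlib` and counts how many shrink by more than $k$ bits. This is only a proxy: the output length of a practical compressor is not the Kolmogorov complexity, and the function names and parameters here are illustrative choices, not part of the text.

```python
import random
import zlib


def compressed_bits(data: bytes) -> int:
    """Length in bits of the zlib-compressed form of `data` (a crude stand-in
    for description length, NOT the Kolmogorov complexity)."""
    return 8 * len(zlib.compress(data, 9))


def fraction_compressible(num_samples: int, num_bytes: int, k_bits: int, seed: int = 0) -> float:
    """Fraction of random strings that zlib shortens by more than `k_bits` bits."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    hits = 0
    for _ in range(num_samples):
        sample = bytes(rng.getrandbits(8) for _ in range(num_bytes))
        if compressed_bits(sample) < 8 * num_bytes - k_bits:
            hits += 1
    return hits / num_samples


if __name__ == "__main__":
    # Random strings: essentially none should compress by more than 8 bits.
    print(fraction_compressible(num_samples=200, num_bytes=64, k_bits=8))
    # A highly structured string, by contrast, compresses dramatically.
    structured = b"01" * 512
    print(compressed_bits(structured), "bits versus", 8 * len(structured))
```

The structured string plays the role of a sequence with a short description, while the random samples stand in for typical draws from a Bernoulli$\left(\frac{1}{2}\right)$ process.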

Theorem 14.5.1 Let $X_1, X_2, \ldots, X_n$ be drawn according to a Bernoulli $\left(\frac{1}{2}\right)$ process. Then
$$P\left(K\left(X_1 X_2 \ldots X_n \mid n\right)<n-k\right)<2^{-k} .$$

Proof:
$$
\begin{aligned}
P\left(K\left(X_1 X_2 \ldots X_n \mid n\right)<n-k\right)
& =\sum_{x_1 x_2 \ldots x_n:\, K\left(x_1 x_2 \ldots x_n \mid n\right)<n-k} p\left(x_1, x_2, \ldots, x_n\right) \\
& =\sum_{x_1 x_2 \ldots x_n:\, K\left(x_1 x_2 \ldots x_n \mid n\right)<n-k} 2^{-n} \\
& =\left|\left\{x_1 x_2 \ldots x_n: K\left(x_1 x_2 \ldots x_n \mid n\right)<n-k\right\}\right| 2^{-n} \\
& <2^{n-k}\, 2^{-n} \quad(\text{by Theorem 14.2.4}) \\
& =2^{-k} .
\end{aligned}
$$
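The counting step behind the strict inequality can be checked numerically. The sketch below assumes only that each string with complexity below $n-k$ needs its own binary program of length less than $n-k$, so there are at most $2^{n-k}-1$ such strings; the function names are illustrative.

```python
def count_short_programs(n: int, k: int) -> int:
    """Number of binary strings of length strictly less than n - k.

    Each sequence with K(x | n) < n - k needs a distinct program from this
    set, so this also bounds the number of such sequences.
    """
    return sum(2 ** length for length in range(max(n - k, 0)))


def compressible_probability_bound(n: int, k: int) -> float:
    """Upper bound on P(K(X_1 ... X_n | n) < n - k) for fair coin flips."""
    return count_short_programs(n, k) / 2 ** n


if __name__ == "__main__":
    n, k = 20, 5
    print(count_short_programs(n, k))            # 2**15 - 1 = 32767
    print(compressible_probability_bound(n, k))  # strictly below 2**-5
    print(2 ** -k)
```

Because the count is $2^{n-k}-1$ rather than $2^{n-k}$, the resulting bound is strictly smaller than $2^{-k}$, which is where the strict inequality in the theorem comes from.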

## UNIVERSAL PROBABILITY

We now consider the tree-structured version of Lempel-Ziv, where the input sequence is parsed into phrases, each phrase being the shortest string that has not been seen so far. The proof of the optimality of this algorithm has a very different flavor from the proof for LZ77; the essence of the proof is a counting argument that shows that the number of phrases cannot be too large if they are all distinct, and the probability of any sequence of symbols can be bounded by a function of the number of distinct phrases in the parsing of the sequence.

The algorithm described in Section 13.4.2 requires two passes over the string. In the first pass, we parse the string and calculate $c(n)$, the number of phrases in the parsed string. We then use that to decide how many bits $\lceil \log c(n) \rceil$ to allot to the pointers in the algorithm. In the second pass, we calculate the pointers and produce the coded string as indicated above. The algorithm can be modified so that it requires only one pass over the string and also uses fewer bits for the initial pointers. These modifications do not affect the asymptotic efficiency of the algorithm. Some of the implementation details are discussed by Welch [554] and Bell et al. [41].
We will show that like the sliding window version of Lempel-Ziv, this algorithm asymptotically achieves the entropy rate for the unknown ergodic source. We first define a parsing of the string to be a decomposition into phrases.
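The parsing and the two-pass pointer allocation described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names, the convention that pointer 0 denotes the empty prefix, and the handling of a leftover tail are assumptions made for the example.

```python
from math import ceil, log2


def parse_distinct_phrases(s: str) -> list:
    """Parse `s` into phrases, each the shortest string not seen so far."""
    phrases, seen, current = [], set(), ""
    for symbol in s:
        current += symbol
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        # Assumption: a leftover tail (already a seen phrase) is emitted as-is.
        phrases.append(current)
    return phrases


def encode_two_pass(s: str) -> list:
    """Encode each phrase as (pointer to its prefix, last symbol)."""
    phrases = parse_distinct_phrases(s)   # pass 1: parse and count phrases
    if not phrases:
        return []
    c = len(phrases)
    width = max(1, ceil(log2(c)))         # bits allotted to each pointer
    index = {"": 0}                       # pointer 0 means "empty prefix"
    coded = []
    for i, phrase in enumerate(phrases, start=1):  # pass 2: emit pointers
        prefix, last = phrase[:-1], phrase[-1]
        coded.append((format(index[prefix], "0{}b".format(width)), last))
        index.setdefault(phrase, i)
    return coded


if __name__ == "__main__":
    print(parse_distinct_phrases("1011010100010"))
    print(encode_two_pass("1011010100010"))
```

For the input `1011010100010` the parse is `1, 0, 11, 01, 010, 00, 10`, so $c(n)=7$ and each pointer takes $\lceil \log 7 \rceil = 3$ bits. The phrase `010`, for example, is coded as `('100', '0')`: a pointer to phrase 4 (`01`) followed by the new bit.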

Suppose that a computer is fed a random program. Imagine a monkey sitting at a keyboard and typing the keys at random. Equivalently, feed a series of fair coin flips into a universal Turing machine. In either case, most strings will not make sense to the computer. If a person sits at a terminal and types keys at random, that person will probably get an error message (i.e., the computer will print the null string and halt). But with a certain probability, the typist will hit on something that makes sense. The computer will then print out something meaningful. Will this output sequence look random?
From our earlier discussions, it is clear that most sequences of length $n$ have complexity close to $n$. Since the probability of an input program $p$ is $2^{-l(p)}$, shorter programs are much more probable than longer ones; and when they produce long strings, shorter programs do not produce random strings; they produce strings with simply described structure.
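The distribution induced on outputs by feeding fair coin flips to the computer can be written down explicitly. The notation is an assumption for this illustration: $\mathcal{U}$ denotes the universal computer, $\mathcal{U}(p)$ the output of program $p$, and $l(p)$ the length of $p$. Summing the probabilities of all programs that print $x$ gives

$$P_{\mathcal{U}}(x)=\sum_{p:\, \mathcal{U}(p)=x} 2^{-l(p)} .$$

Since the shortest program for $x$ has length $K(x)$ and contributes one term to this sum, $P_{\mathcal{U}}(x) \geq 2^{-K(x)}$, so a string with a short description receives a comparatively large probability.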

The probability distribution on the output strings is far from uniform. Under the computer-induced distribution, simple strings are more likely than complicated strings of the same length.


