In search of the shortest description
Damian Niwiński
Faculty of Mathematics, Informatics, and Mechanics
University of Warsaw
Philosophers' Rally, Gdańsk, June 2012
Je n’ai fait celle-ci plus longue que parce que je n’ai pas eu le loisir de la faire plus
courte.
I have made this [letter] longer only because I have not had the time to make it shorter.
Blaise Pascal, Lettres provinciales, 1657
Shortening is art
SCÈNE PREMIÈRE – CHIMÈNE, ELVIRE
CHIMÈNE
Elvire, m'as-tu fait un rapport bien sincère ?
Ne déguises-tu rien de ce qu'a dit mon père ?
(Elvire, have you given me a truly sincere report? Do you disguise nothing of what my father said?)
In the translation by Stanisław Wyspiański:
SZIMENA
Więc mówił…? (So he spoke…?)
After Tadeusz Boy-Żeleński
Shortening is mathematics
7,625,597,484,987 = 3^(3^3)
0110100110010110100101100110100110010110011010010110100110010110. . .
0 → 01, 1 → 10
0 1
0 1 1 0
0 1 1 0 1 0 0 1
0 1 1 0 1 0 0 1 1 0 0 1 0 1 1 0
. . . . . . . . . . . . . . . .
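The rows above are successive iterations of the substitution 0 → 01, 1 → 10, converging to the Thue-Morse word. A minimal Python sketch of the iteration (the function name is mine):

    def thue_morse(steps):
        # Iterate the substitution 0 -> 01, 1 -> 10, starting from "0".
        word = "0"
        for _ in range(steps):
            word = "".join("01" if c == "0" else "10" for c in word)
        return word

    print(thue_morse(4))  # 0110100110010110

A few lines of code thus describe the infinite sequence, which is the point: the description is far shorter than any long prefix it generates.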
3.14159 26535 89793 23846 26433 83279 50288 41971 69399 37510 58209 . . .
program generating π
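Such a program indeed exists and can be quite short. One known possibility is Gibbons' unbounded spigot algorithm, sketched here in Python as an illustration (not necessarily the program the talk had in mind):

    def pi_digits():
        # Unbounded spigot for the decimal digits of pi (after Gibbons, 2006).
        q, r, t, k, m, x = 1, 0, 1, 1, 3, 3
        while True:
            if 4 * q + r - t < m * t:
                yield m  # the next decimal digit is settled
                q, r, m = 10 * q, 10 * (r - m * t), 10 * (3 * q + r) // t - 10 * m
            else:
                q, r, t, k, m, x = (q * k, (2 * q + r) * x, t * x, k + 1,
                                    (q * (7 * k + 2) + r * x) // (t * x), x + 2)

    g = pi_digits()
    print([next(g) for _ in range(10)])  # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]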
To find a short description, we need to understand the idea behind an object.
[Photo: MITO SettembreMusica]
But is it always possible to shorten a number?
Not always!
Lemma. If, for each n ∈ ℕ, short(n) is a word over r symbols and short(n) ≠ short(n′) whenever n ≠ n′, then for infinitely many n's we have

|short(n)| ≥ ⌊log_r n⌋.

In this case short is no better than the standard positional notation.
This comes from a simple calculation.
If r^k objects are described by words over r symbols, then at least one of them has a description of length at least k.
This is because the number of words of length less than k is

1 + r + r² + … + r^(k−1) = (r^k − 1)/(r − 1) < r^k.
© 2 3 4 ∇ ♣ ♥ ♠
↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓
ε 0 1 00 01 10 11 000
↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓
10 0 ε 11 101 00 011 01
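A quick numeric check of this counting argument, for r = 2 and k = 8 (the variable names are mine):

    r, k = 2, 8
    # Words of length < k over r symbols: 1 + r + r**2 + ... + r**(k-1)
    shorter = sum(r**i for i in range(k))
    assert shorter == (r**k - 1) // (r - 1)
    print(shorter, r**k)  # 255 256: too few short words for 256 objects,
                          # so some object needs a description of length >= 8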
Andrey N. Kolmogorov introduced (in the 1960s) the following concept:

number n ↦ the shortest program generating n
(in a fixed programming language, e.g., Pascal).

For example, the program

x := 10
For i = 1..10 do
  x := x * x
Return x

generates n = 10^(2^10) and can be viewed as an ideal shortening of this n.

Remark. The length of the program depends on the choice of the programming language, but only up to an additive constant.
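Although K itself turns out to be uncomputable (shown later in the talk), any fixed compressor yields a computable upper bound on description length, which makes the idea tangible. A sketch using Python's zlib as the fixed "language" (my illustration, not part of the talk):

    import os
    import zlib

    def description_bound(data):
        # Bits used by a zlib-compressed description of the data:
        # an upper bound on the shortest description in this fixed scheme.
        return 8 * len(zlib.compress(data, 9))

    regular = ("01101001" * 1000).encode()  # 64000 bits of very regular data
    print(description_bound(regular))       # compresses to a tiny fraction

    noise = os.urandom(8000)                # 64000 random bits
    print(description_bound(noise))         # stays near 64000: incompressible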
By the Lemma,

n ↦  While min(a, b) > 0 do
       if b > a then b := b mod a
       else a := a mod b
     Return max(a, b)

is not always a shortening.
For some n's, the best we can do is

9346296618342597 ↦ Write(9346296618342597),

i.e., the program is as long as the number itself.
Kolmogorov called such numbers random.
More precisely, the celebrated Kolmogorov complexity of a number n is:

K(n) = the length of the shortest binary program generating n.

A number n is random whenever

K(n) ≥ ⌊log₂ n⌋.

There are infinitely many random numbers.
Random numbers exist in Nature, but can neither be computed nor recognized.

[Plot: the graph of K over ℕ; a horizontal line at height n, with m_n marked on the axis as the first point where K reaches n.]

Let m_n be the least m such that n ≤ K(m). Then m_n is random!
(Indeed, some i ∈ {0, 1, …, 2^n − 1} must satisfy ⌊log₂ i⌋ ≤ n ≤ K(i); then m_n ≤ i, and hence ⌊log₂ m_n⌋ ≤ ⌊log₂ i⌋ ≤ n ≤ K(m_n).)
But no program P(x) can compute the function n ↦ m_n for all n.
Otherwise, for a given n, the program P(n) would compute m_n and have length
log₂ n + const,
so K(m_n) ≤ log₂ n + const < n for large n. A contradiction!

Note the reminiscence of Berry's paradox:

Let m₁₀₀₀ be the least natural number which cannot be described in English with fewer than 1000 symbols.
Consequently, the function n ↦ K(n) cannot be computed, as otherwise we could use it to find m_n:

x := 0
While K(x) < n do x := x + 1
Return x    (now x = m_n)

Nor is there a computable way to verify whether a given number m is random:
the set of random numbers is not computable.
Otherwise

Input n
count := i := 0
While count < n do
  if Random(i) then count := count + 1 fi
  i := i + 1
Return i

would compute the n-th random number r_n. Take, e.g., n = 2^(2^k) to obtain a contradiction: r_n ≥ n − 1, so K(r_n) ≥ ⌊log₂ r_n⌋ grows like 2^k, while r_n is computable from a description of n of length about k.
We can only prove non-randomness; e.g., the decimal expansion of π,

3.14159265358979323846264338327950288419716939937510
58209749445923078164062862089986280348253421170679
82148086513282306647093844609550582231725359408128
48111745028410270193852110555964462294895493038196
…
98336733624406566430860213949463952247371907021798
60943702770539217176293176752384674818467669405132
00056812714526356082778577134275778960917363717872 …

is not random.
At about the same time (the 1960s), Gregory Chaitin (born 1947), then a high-school student in New York, independently discovered the main concepts of Kolmogorov complexity.

He found an alternative proof of the Gödel Incompleteness Theorem:

If a theory T is a consistent extension of Peano arithmetic PA, then T is incomplete.

Proof. The property "n is not random" is semi-representable in PA.
But it is not possible that, for each n ∈ ℕ,
T ⊢ "n is random"  or  T ⊢ "n is not random",
as otherwise the set of random numbers would be computable.
Claude Shannon laid the foundations of information theory in his landmark 1948 paper
A Mathematical Theory of Communication.
In a subsequent 1951 paper,
Prediction and Entropy of Printed English,
he related his theory to computational linguistics.
Examples from Shannon's original paper and Lucky's book, quoted from T. M. Cover and J. A. Thomas, Elements of Information Theory.
The symbols are independent and equiprobable.
XFOML RXKHRJFFJUJ ZLPWCFWKCYJ
FFJEYVKCQSGYD QPAAMKBZAACIBZLHJQD
The symbols are independent. Frequency of letters matches English text.
OCRO HLI RGWR NMIELWIS EU LL NBNESEBYA TH EEI
ALHENHTTPA OOBTTVA NAH BRL
The frequency of pairs of letters matches English text.
ON IE ANTISOUTINYS ARE T INCTORE ST B S DEAMY
ACHIN D ILONASIVE TUCOOWE AT TEASONARE FUSO
TIZIN ANDY TOBE SEACE CTISBE
The frequency of triplets of letters matches English text.
IN NO IST LAT WHEY CRATICT FROURE BERS GROCID
PONDENOME OF DEMONSTURES OH THE REPTAGIN IS
REOGACTIONA OF CRE
The frequency of quadruples of letters matches English text: each letter depends on the previous three letters.
THE GENERATED JOB PROVIDUAL BETTER TRAND THE
DISPLAYED CODE, ABOVERY UPONDULTS WELL THE
CODERST IN THESTICAL IT DO HOCK BOTHE MERG.
(INSTATES CONS ERATION. NEVER ANY OF PUBLE AND TO
THEORY. EVENTIAL CALLEGAND TO ELAST BENERATED IN
WITH PIES AS IS WITH THE)
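Approximations of this kind are easy to reproduce mechanically: estimate k-gram statistics from a corpus and repeatedly sample the next letter from the distribution conditioned on the previous k−1 letters. A hypothetical Python sketch (the corpus file name is a placeholder):

    import random
    from collections import Counter, defaultdict

    def train(text, k):
        # Map each (k-1)-letter context to the frequencies of the next letter.
        model = defaultdict(Counter)
        for i in range(len(text) - k + 1):
            model[text[i:i + k - 1]][text[i + k - 1]] += 1
        return model

    def generate(model, k, length):
        out = random.choice(list(model))  # start from a random context
        while len(out) < length:
            counts = model[out[-(k - 1):]]
            if not counts:
                break  # dead-end context
            letters, weights = zip(*counts.items())
            out += random.choices(letters, weights=weights)[0]
        return out

    corpus = open("english_corpus.txt").read().upper()  # placeholder corpus
    print(generate(train(corpus, 3), 3, 100))  # a second-order approximation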
Example of the 20-questions game

[Decision tree of yes/no questions: Danish? — then married? — then Kierkegaard? — until the person is identified.]
Question game

The Answerer chooses a challenge from some known set of objects.
The challenge is kept secret.
The Questioner asks questions: does the challenge belong to the set of objects S?
The Answerer answers: yes or no.
The Questioner wishes to guess the challenge as quickly as possible.

[The same tree as before — Danish? married? Kierkegaard? — and its general form: questions o ∈ S₁?, o ∈ S₂?, o ∈ S₃?, … with yes/no branches.]
Default strategy

Is it in {1, 2, 3, 4}?
  Y: in {1, 2}?  — Y: is it 1?  (leaves: 1, 2)
                 — N: is it 3?  (leaves: 3, 4)
  N: in {5, 6}?  — Y: is it 5?  (leaves: 5, 6)
                 — N: is it 7?  (leaves: 7, 8)

The number of questions is log₂ |Objects|.
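The default strategy is plain binary search: halve the candidate set with each question. A minimal sketch, where the belongs callback is my stand-in for the Answerer:

    def identify(objects, belongs):
        # belongs(S) answers the question: "does the challenge belong to S?"
        candidates = list(objects)
        questions = 0
        while len(candidates) > 1:
            half = candidates[: len(candidates) // 2]
            questions += 1
            candidates = half if belongs(set(half)) else candidates[len(half):]
        return candidates[0], questions

    challenge = 6
    found, asked = identify(range(1, 9), lambda S: challenge in S)
    print(found, asked)  # 6 3, and indeed log2(8) = 3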
Knowing the probability distribution, we can do better.

p(sleeps) = 1/2, p(rests) = 1/4, p(eats) = p(works) = 1/8.

sleeps? — Y: s
        — N: rests? — Y: r
                    — N: eats? — Y: e
                               — N: w

The expected number of questions:
1 · (1/2) + 2 · (1/4) + 3 · (1/8 + 1/8) = 7/4 < 2 = log₂ 4.
The number of questions needed to guess an object of probability q is log₂(1/q).
Shannon's entropy

In general, if p is a probability distribution on the set of objects S, the Shannon entropy is

H(S) = Σ_{s∈S} p(s) · log₂(1/p(s)).

It can be viewed as the average number of questions asked under an optimal strategy in the question game, the number of questions needed to identify an object s being log₂(1/p(s)).
p(sleeps) = 1/2, p(rests) = 1/4, p(eats) = p(works) = 1/8.

[The same decision tree as above: sleeps? — rests? — eats? — works.]

In the example above,

H(S) = p(s)·log₂(1/p(s)) + p(r)·log₂(1/p(r)) + p(e)·log₂(1/p(e)) + p(w)·log₂(1/p(w))
     = (1/2)·1 + (1/4)·2 + (1/8)·3 + (1/8)·3
     = 7/4.
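The same arithmetic, checked mechanically in a few lines of Python (the names are mine):

    from math import log2

    p = {"sleeps": 1/2, "rests": 1/4, "eats": 1/8, "works": 1/8}

    # Entropy: the sum of p(s) * log2(1/p(s)) over all objects s.
    H = sum(q * log2(1 / q) for q in p.values())

    # Expected questions in the tree: 1 for "sleeps", 2 for "rests",
    # 3 for each of "eats" and "works".
    expected = 1 * p["sleeps"] + 2 * p["rests"] + 3 * (p["eats"] + p["works"])

    print(H, expected)  # 1.75 1.75 -- the optimal tree meets the entropy exactly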
H(S) = Σ_{s∈S} p(s) · log₂(1/p(s)); the number of questions needed to identify an object s is log₂(1/p(s)).

The definition of entropy conforms to the Weber-Fechner law of cognitive science: the human perception (P) of the growth of a physical stimulus (S) is proportional to the relative growth of the stimulus rather than to its absolute growth,

∂P ≈ ∂S / S,

and consequently, after integration,

P ≈ log S.

This has been observed in the perception of weight, brightness, and sound (both loudness and pitch), and even of one's economic status.
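For instance, integrating ∂P = c · ∂S/S gives P = c · ln S + const, so doubling a stimulus, from S to 2S, always adds the same increment c · ln 2 to the perceived magnitude, no matter how large S already is; the term log₂(1/p(s)) in the entropy responds to relative changes of p(s) in exactly this way, with each halving of p(s) adding one bit.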
Intuitively, the Shannon entropy tells us how difficult it is to find an object.
The Kolmogorov complexity tells us how difficult it is to create it.
Is there a link between them?
Chaitin's constant

Ω = Σ_{P↓} 2^(−|P|),

where P ranges over binary programs (taken to be prefix-free, so that the sum converges), and P↓ means that P halts.
Ω can be interpreted as the probability that, by tossing a coin, we produce a binary program which, moreover, halts:

01101011011001101010101010101101001101010…
(this is a program which halts)
0 < Ω < 1.
Every prefix of (the binary expansion of) Ω is Kolmogorov-random.
Knowledge of Ω would suffice to reconstruct any computation.
For n ∈ ℕ, let P ↓ n mean that P halts and produces n. Let

p(n) = (Σ_{P↓n} 2^(−|P|)) / Ω.

This can be interpreted as the probability that, by tossing a coin, we produce a binary program which generates n.

Theorem. K(n) ≈ log₂(1/p(n)).
More precisely, there exists a constant c such that, for all n,

K(n) − c ≤ log₂(1/p(n)) ≤ K(n) + c.
K(n) ≈ log₂(1/p(n))

Kolmogorov (the shortest program):
x := 10
For i = 1..10 do
  x := x * x
Return x

Shannon (the question tree):
[Danish? — married? — Kierkegaard? — the object is identified.]
Researchers who have contributed to the theory: C. E. Shannon, A. N. Kolmogorov, R. Solomonoff, G. Chaitin, P. Martin-Löf, L. A. Levin, P. Gács, and many others.

Literature.

Thomas M. Cover and Joy A. Thomas, Elements of Information Theory, John Wiley & Sons, 1991.

Ming Li and Paul Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, Springer, 2008.

D. Niwiński, M. Strojnowski, M. Wojnarski, Teoria informacji (Information Theory), http://wazniak.mimuw.edu.pl/