
In search of the shortest description

Damian Niwiński

Faculty of Mathematics, Informatics, and Mechanics

University of Warsaw

Philosophers’ Rally, Gdańsk, June 2012

1

Je n’ai fait celle-ci plus longue que parce que je n’ai pas eu le loisir de la faire plus courte.

I have made this [letter] longer, because I have not had the time to make it shorter.

Blaise Pascal, Lettres provinciales, 1657

2

Shortening is art

SCÈNE PREMIÈRE – CHIMÈNE, ELVIRE

CHIMÈNE

Elvire, m’as-tu fait un rapport bien sincère ?

Ne déguises-tu rien de ce qu’a dit mon père ?

(Elvire, have you given me a truly sincere report? Do you conceal nothing of what my father said?)

In the translation by Stanisław Wyspiański:

SZIMENA

Więc mówił…? (So he spoke…?)

After Tadeusz Boy-Żeleński

3

Shortening is technology

Biblioteca Joanina, Coimbra, Portugal (photo: NotFromUtrecht)

4

Shortening is mathematics

7,625,597,484,987 = 3^(3^3)

0110100110010110100101100110100110010110011010010110100110010110…

generated by the substitution 0 → 01, 1 → 10:

0 1
0 1 1 0
0 1 1 0 1 0 0 1
0 1 1 0 1 0 0 1 1 0 0 1 0 1 1 0
…
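A minimal sketch in Python (my own illustration, not part of the slides) of iterating this substitution:

    w = "0"
    for _ in range(4):                                      # each pass rewrites every symbol, doubling the word
        w = "".join("01" if c == "0" else "10" for c in w)
    print(w)                                                # 0110100110010110, the last row above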

3.14159 26535 89793 23846 26433 83279 50288 41971 69399 37510 58209 . . .

↦ program generating π

5

To find a short description, we need to understand the idea behind the object.

[Photo: MITO SettembreMusica]

6

But is it always possible to shorten a number?

Not always!

Lemma. If, for each n ∈ ℕ, short(n) is a word over r symbols and short(n) ≠ short(n′) whenever n ≠ n′, then for infinitely many n’s we have

|short(n)| ≥ ⌊log_r n⌋.

In this case short is not better than the standard positional notation.

7

This comes from a simple calculation.

If r^k objects are described by words over r symbols, then at least one of them has a description of length at least k.

This is because the number of words shorter than k is

1 + r + r² + … + r^(k−1) = (r^k − 1)/(r − 1) < r^k.

© 2 3 4 ∇ ♣ ♥ ♠

↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓

ε 0 1 00 01 10 11 000

↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓

10 0 ε 11 101 00 011 01
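A quick numeric check of this count in Python (my own illustration), for r = 2 and k = 5:

    r, k = 2, 5
    shorter = sum(r**i for i in range(k))     # words of length 0, 1, ..., k-1
    assert shorter == (r**k - 1) // (r - 1)   # the geometric sum: 31
    assert shorter < r**k                     # so 2**5 = 32 objects cannot all get descriptions shorter than 5
    print(shorter, r**k)                      # 31 32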

8

Andrey N. Kolmogorov introduced (in the 1960s) the following concept:

number n ↦ the shortest program generating n

(in a fixed programming language, e.g., Pascal)

For example:

x := 10
For i = 1..10 do
    x := x * x
Return x

This program can be viewed as an ideal shortening of n.

Remark. The length of the program depends on the choice of the programming language, but only up to an additive constant.
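A runnable Python rendering of the slide’s program (my own sketch): ten squarings turn the short seed 10 into a number far longer than the program itself.

    x = 10
    for _ in range(10):
        x = x * x          # ten squarings give 10**(2**10) = 10**1024
    print(len(str(x)))     # 1025 digits from a four-line description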

9

31415926535897932
38462643383279502
88419716939937510

↦ [photo: MITO SettembreMusica]

10

By the Lemma,

n ↦ While min(a, b) > 0 do
        if b > a then b := b mod a
        else a := a mod b
    Return max(a, b)

is not always a shortening.
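For the curious, a runnable Python version of this program (my own sketch; the initial values of a and b stand for constants fixed in the program text, and b here is a made-up value for illustration). It is Euclid’s algorithm, so it returns gcd(a, b):

    def prog(a=9346296618342597, b=123456789):   # b is a hypothetical second constant
        while min(a, b) > 0:
            if b > a:
                b = b % a
            else:
                a = a % b
        return max(a, b)                         # the gcd of the initial a and b

    print(prog())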

For some n’s, the best we can do is

9346296618342597 ↦ Write(9346296618342597)

i.e., the program is as long as the number itself.

Kolmogorov called such numbers random.

11

More precisely, the celebrated Kolmogorov complexity of a number n is:

K(n) = the length of the shortest binary program generating n.

A number n is random whenever

K(n) ≥ ⌊log₂ n⌋.

There are infinitely many random numbers.

12

Random numbers exist in Nature, but can neither be computed nor

recognized.

[Plot: the graph of K over 0, 1, 2, …, with a horizontal threshold at n and m_n the first argument whose K-value reaches n]

Let m_n be the least number such that n ≤ K(m_n). Then m_n is random!

(Indeed, some i ∈ {0, 1, …, 2^n − 1} must satisfy ⌊log₂ i⌋ ≤ n ≤ K(i), whereas m_n ≤ i implies ⌊log₂ m_n⌋ ≤ ⌊log₂ i⌋ ≤ n ≤ K(m_n).)

13


But no program P(x) can compute the function n ↦ m_n for all n.

Otherwise, for a given n, the program P(n) would compute m_n and have length

log₂ n + const < n ≤ K(m_n)   (for large enough n).

A contradiction!

Note the reminiscence of Berry’s paradox:

Let m_1000 be the least natural number which cannot be described in English with fewer than 1000 symbols.

(Yet this very description has fewer than 1000 symbols.)

14

Consequently, the function n ↦ K(n) cannot be computed, as otherwise we could use it to find m_n:

x := 0
While K(x) < n do x := x + 1
Return x        (the result is m_n)

Nor is there a computable way to verify whether a given number m is random:

The set of random numbers is not computable.

Otherwise the program

Input n
count := i := 0
While count < n do
    if Random(i) then count := count + 1 fi
    i := i + 1
Return i

would compute the n-th random number. Take, e.g., 2^(2^n) to obtain a contradiction.

15

We can only prove non-randomness, e.g., the decimal expansion of π

3.14159265358979323846264338327950288419716939937510
  58209749445923078164062862089986280348253421170679
  82148086513282306647093844609550582231725359408128
  48111745028410270193852110555964462294895493038196
  98336733624406566430860213949463952247371907021798
  60943702770539217176293176752384674818467669405132
  00056812714526356082778577134275778960917363717872

is not random.

16

At about the same time (the 1960s), Gregory Chaitin (born 1947), then a high-school student in New York, independently discovered the main concepts of Kolmogorov complexity.

He found an alternative proof of the Gödel Incompleteness Theorem:

If a theory T is a consistent extension of Peano arithmetic PA, then T is incomplete.

Proof. The property “n is not random” is semi-representable in PA. But it is not possible that, for each n ∈ ℕ,

T ⊢ “n is random”  or  T ⊢ “n is not random”,

as otherwise the set of random numbers would be computable.

17

[Diagram: Kolmogorov’s scale (easy … hard) plotted against Shannon’s scale (frequent … rare)]

18


Claude Shannon laid the foundations of information theory in his landmark 1948 paper

A Mathematical Theory of Communication.

In the subsequent 1951 paper

Prediction and Entropy of Printed English,

he related his theory to computational linguistics.

20

Examples from Shannon’s original paper and Lucky’s book, quoted after T. M. Cover and J. A. Thomas, Elements of Information Theory.

The symbols are independent and equiprobable.

XFOML RXKHRJFFJUJ ZLPWCFWKCYJ

FFJEYVKCQSGYD QPAAMKBZAACIBZLHJQD

The symbols are independent. Frequency of letters matches English text.

OCRO HLI RGWR NMIELWIS EU LL NBNESEBYA TH EEI

ALHENHTTPA OOBTTVA NAH BRL

The frequency of pairs of letters matches English text.

ON IE ANTISOUTINYS ARE T INCTORE ST B S DEAMY

ACHIN D ILONASIVE TUCOOWE AT TEASONARE FUSO

TIZIN ANDY TOBE SEACE CTISBE

21

The frequency of triplets of letters matches English text.

IN NO IST LAT WHEY CRATICT FROURE BERS GROCID

PONDENOME OF DEMONSTURES OH THE REPTAGIN IS

REOGACTIONA OF CRE

The frequency of quadruples of letters matches English text.

Each letter depends on the previous three letters.

THE GENERATED JOB PROVIDUAL BETTER TRAND THE

DISPLAYED CODE, ABOVERY UPONDULTS WELL THE

CODERST IN THESTICAL IT DO HOCK BOTHE MERG.

(INSTATES CONS ERATION. NEVER ANY OF PUBLE AND TO

THEORY. EVENTIAL CALLEGAND TO ELAST BENERATED IN

WITH PIES AS IS WITH THE)
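A minimal sketch in Python (my own illustration; the toy corpus stands in for real English text) of such an approximation of order 2: each letter is drawn from the empirical distribution of letters that follow the current letter in a training text.

    import random
    from collections import defaultdict

    corpus = "the frequency of pairs of letters matches english text " * 3
    followers = defaultdict(list)
    for a, b in zip(corpus, corpus[1:]):
        followers[a].append(b)               # empirical successors of each letter

    c = random.choice(corpus)
    out = [c]
    for _ in range(60):
        c = random.choice(followers[c])      # sample the next letter given the current one
        out.append(c)
    print("".join(out).upper())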

22

Example of the 20-questions game

Danish? —Y→ married? —Y→ Kierkegaard? —Y→ ☺ (guessed!)

(each N answer would branch to further questions)

23

Question game

The Answerer chooses a challenge from some known set of objects. The challenge is kept secret.

The Questioner asks questions of the form: does the challenge belong to the set of objects S?

The Answerer answers: yes or no.

The Questioner wishes to guess the challenge in as few questions as possible.

Danish? —Y→ married? —Y→ Kierkegaard? —Y→ ☺

In the abstract setting:

o ∈ S₁? —Y→ o ∈ S₂? —Y→ o ∈ S₃? → …

24

Default strategy

Is it in {1,2,3,4}?
    Y: Is it in {1,2}?   Y: Is it 1?   N: Is it 3?
    N: Is it in {5,6}?   Y: Is it 5?   N: Is it 7?

Objects: 1 2 3 4 5 6 7 8

The number of questions is log₂ |Objects|.
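A minimal sketch of this strategy in Python (my own illustration; the names guess and answer are made up): halving the candidate set with every question guarantees ⌈log₂ n⌉ questions.

    def guess(objects, answer):
        candidates = list(objects)
        questions = 0
        while len(candidates) > 1:
            half = candidates[:len(candidates) // 2]        # "does the challenge lie in this half?"
            questions += 1
            candidates = half if answer(half) else candidates[len(candidates) // 2:]
        return candidates[0], questions

    challenge = 6
    print(guess(range(1, 9), lambda s: challenge in s))     # (6, 3): 3 = log2(8) questions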

25

Knowing the probability distribution, we can do better.

Pr(sleeps) = 1/2, Pr(rests) = 1/4, Pr(eats) = Pr(works) = 1/8.

sleeps? —Y→ s
        —N→ rests? —Y→ r
                   —N→ eats? —Y→ e
                             —N→ w

The expected number of questions:

1 · 1/2 + 2 · 1/4 + 3 · (1/8 + 1/8) = 7/4 < 2 = log₂ 4.

26


The number of questions to guess an object with probability q is log₂(1/q).

27

Shannon’s entropy

In general, if p is a probability distribution on the set of objects S, the Shannon entropy is

H(S) = Σ_{s∈S} p(s) · log₂(1/p(s)).

It can be viewed as the average number of questions asked under an optimal strategy in the question game, the number of questions to identify an object s being log₂(1/p(s)).
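A one-line check in Python (my own illustration) that the entropy of the example distribution equals the expected number of questions:

    from math import log2

    p = {"sleeps": 1/2, "rests": 1/4, "eats": 1/8, "works": 1/8}
    H = sum(q * log2(1/q) for q in p.values())
    print(H)                                  # 1.75 = 7/4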

28

p(sleeps) = 1/2, p(rests) = 1/4, p(eats) = p(works) = 1/8.

sleeps? —Y→ s
        —N→ rests? —Y→ r
                   —N→ eats? —Y→ e
                             —N→ w

In the example above,

H(S) = p(s)·log₂(1/p(s)) + p(r)·log₂(1/p(r)) + p(e)·log₂(1/p(e)) + p(w)·log₂(1/p(w))
     = (1/2)·1 + (1/4)·2 + (1/8)·3 + (1/8)·3
     = 7/4.

29

H(S) = Σ_{s∈S} p(s) · log₂(1/p(s)).

The number of questions to identify an object s is log₂(1/p(s)).

The definition of entropy conforms to the Weber–Fechner law of cognitive science: the human perception (P) of the growth of a physical stimulus (S) is proportional to the relative growth of the stimulus rather than to its absolute growth,

∂P ≈ ∂S / S,

and consequently, after integration,

P ≈ log S.

This has been observed in the perception of weight, brightness, and sound (both loudness and pitch), and even of one’s economic status.

30

Good !

Good !

Good !

Good !

Good !

31

Intuitively, the Shannon entropy tells us how difficult it is to find an object.

The Kolmogorov complexity tells us how difficult it is to create it.

Is there a link between them?

32

Chaitin’s constant

Ω = Σ_{P↓} 2^(−|P|),

where P ranges over binary programs, and P↓ means that P halts.

Ω can be interpreted as the probability that, by tossing a coin to produce successive bits, we obtain a binary program which, moreover, halts.

01101011011001101010101010101101001101010… ← this is a program which halts ☺
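A toy numeric illustration in Python (the program set is entirely hypothetical; the real Ω is uncomputable): if the halting binary programs formed the prefix-free set {0, 10, 110}, the sum would be

    halting = ["0", "10", "110"]                 # hypothetical prefix-free set of halting programs
    omega = sum(2 ** -len(p) for p in halting)   # 1/2 + 1/4 + 1/8
    print(omega)                                 # 0.875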

33

Ω is a real number between 0 and 1.

Every prefix of Ω is Kolmogorov random.

Knowledge of Ω would suffice to reconstruct any computation.

34

For n ∈ ℕ, let P↓n mean that P halts and produces n. Let

p(n) = (Σ_{P↓n} 2^(−|P|)) / Ω.

This can be interpreted as the probability that, by tossing a coin, we find a binary program which produces n.

Theorem. K(n) ≈ log₂(1/p(n)).

More precisely, there exists a constant c such that, for all n,

K(n) − c ≤ log₂(1/p(n)) ≤ K(n) + c.
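A toy sanity check in Python (a hypothetical four-program world, my own illustration) of how log₂(1/p(n)) tracks the length of the shortest program producing n:

    from math import log2

    produces = {"00": 7, "01": 7, "10": 3, "110": 5}    # hypothetical halting programs and their outputs
    omega = sum(2 ** -len(P) for P in produces)
    for n in (7, 3, 5):
        p_n = sum(2 ** -len(P) for P, out in produces.items() if out == n) / omega
        shortest = min(len(P) for P, out in produces.items() if out == n)
        print(n, shortest, round(log2(1 / p_n), 2))     # shortest length vs. log2(1/p(n))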

35

K(n) ≈ log₂(1/p(n))

Kolmogorov — the shortest program:

x := 10
For i = 1..10 do
    x := x * x
Return x

Shannon — the question game:

Danish? —Y→ married? —Y→ Kierkegaard? —Y→ ☺

36

Researchers who have contributed to the theory: C. E. Shannon, A. N. Kolmogorov, R. Solomonoff, G. Chaitin, P. Martin-Löf, L. A. Levin, P. Gács, and many others.

Literature.

Thomas M. Cover and Joy A. Thomas, Elements of Information Theory, John Wiley & Sons, 1991.

Ming Li and Paul Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, Springer, 2008.

D. Niwiński, M. Strojnowski, M. Wojnarski, Teoria informacji (Information Theory), http://wazniak.mimuw.edu.pl/

37