Positive and Negative Randomness Paul Vitanyi CWI, University of Amsterdam Joint work with Kolya Vereshchagin


Page 1:

Positive and Negative Randomness

Paul Vitanyi CWI, University of Amsterdam

Joint work with Kolya Vereshchagin

Page 2:

Non-Probabilistic Statistics

Page 3:

Classic Statistics--Recalled

Page 4:

Probabilistic Sufficient Statistic

Page 5:

Kolmogorov complexity

K(x) = the length of a shortest description of x.
K(x|y) = the length of a shortest description of x given y.

A string x is random if K(x) ≥ |x|.

K(x) − K(x|y) is the information y knows about x.
Theorem (Mutual Information). K(x) − K(x|y) = K(y) − K(y|x), up to a logarithmic additive term.
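These quantities are uncomputable, but a real compressor gives computable upper bounds. A minimal sketch (mine, not part of the talk), using zlib output length as a stand-in for shortest-description length:

```python
import os
import zlib

def approx_K(s: bytes) -> int:
    """Upper bound on K(s) in bits: length of the zlib-compressed string.
    True Kolmogorov complexity is uncomputable; a compressor only majorizes it."""
    return 8 * len(zlib.compress(s, 9))

def approx_K_given(s: bytes, t: bytes) -> int:
    """Crude stand-in for K(s|t): the extra bits needed for s once t is known."""
    return approx_K(t + s) - approx_K(t)

x = b"abcd" * 250        # highly regular, so approx_K(x) is far below 8*|x|
y = b"abcdabcd" * 125    # same underlying pattern as x
info = approx_K(x) - approx_K_given(x, y)   # "information y knows about x": large here

r = os.urandom(4096)     # a (statistically) random string: it does not compress
```

The random string r illustrates K(x) ≥ |x|: zlib cannot shrink it below its own length.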

Page 6:

Randomness Deficiency

Page 7:

Algorithmic Sufficient Statistic where model is a set

Page 8:

Algorithmic sufficient statistic where model is a total computable function

Data is a binary string x; a model is a total computable function p; the prefix complexity K(p) is the size of the smallest TM computing p; the data-to-model code length is l_x(p) = min_d {|d| : p(d) = x}.

x is typical for p if δ(x|p) = l_x(p) − K(x|p) is small. p is a sufficient statistic for x if K(p) + l_x(p) = K(x) + O(1) and p(d) = x for the d that achieves l_x(p). Theorem: If p is a sufficient statistic for x, then x is typical for p.

p is a minimal sufficient statistic (the sophistication of x) if K(p) is minimal.
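As a rough illustration (my sketch, not the talk's construction): fix the decompressor p(d) = zlib.decompress(d) as the model. It is not literally a total function, but it shows the two-part code K(p) + l_x(p): K(p) is the constant size of a program implementing zlib, and l_x(p) is upper-bounded by the compressed length of x.

```python
import zlib

def data_to_model_bits(x: bytes) -> int:
    """Upper bound on l_x(p) = min{|d| : p(d) = x} for the fixed model
    p = zlib.decompress, witnessed by d = zlib.compress(x)."""
    d = zlib.compress(x, 9)
    assert zlib.decompress(d) == x    # p(d) = x holds for this witness d
    return 8 * len(d)

x = b"01" * 500
# For this regular x, K(p) + l_x(p) is far below the 8000 bits of a
# literal encoding, so this p captures the regularity in x.
```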

Page 9:

Graph Structure Function

[Figure: the structure function h_x(α) plotted against model cost α, with log |S| on the vertical axis and the lower bound h_x(α) = K(x) − α shown as a diagonal.]

Page 10:

Minimum Description Length estimator, Relations between estimators

Structure function h_x(α)= min_S{log |S|: x in S and K(S)≤α}.

MDL estimator λ_x(α)= min_S{log |S|+K(S): x in S and K(S)≤α}.

Best-fit estimator: β_x(α) = min_S {δ(x|S): x in S and K(S)≤α}.
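A toy instance of these definitions (mine, not the talk's; the real quantities involve the uncomputable K): let x be an n-bit string and take the model family S_k = {n-bit strings sharing the first k bits of x}, so log |S_k| = n − k, charging K(S_k) ≈ k bits for spelling out the prefix.

```python
# Toy model family: S_k = n-bit strings agreeing with x on the first k bits,
# so log2|S_k| = n - k, and we charge K(S_k) ≈ k bits for the prefix itself.

def structure_fn(n: int, alpha: int) -> int:
    """h_x(alpha): smallest log|S| over models in the family with K(S) <= alpha."""
    return min(n - k for k in range(min(alpha, n) + 1))

def mdl_estimator(n: int, alpha: int) -> int:
    """lambda_x(alpha): smallest log|S| + K(S) over models with K(S) <= alpha."""
    return min((n - k) + k for k in range(min(alpha, n) + 1))

# For n = 16: h_x falls one bit per bit of allowed model cost, while
# lambda_x stays flat at n -- in this toy family every S_k behaves like
# a sufficient statistic.
```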

Page 11:

Individual characteristics: more detail, especially for meaningful (nonrandom) data

We flip the graph so that log |·| is on the x-axis and K(·) is on the y-axis. This is essentially the rate-distortion graph for list (set) distortion.

Page 12:

Primogeniture of ML/MDL estimators

• ML/MDL estimators can be approximated from above;
• The best-fit estimator cannot be approximated, either from above or from below, to any precision;
• But the approximable ML/MDL estimators yield the best-fitting models: even though we don't know the goodness-of-fit quantity, ML/MDL estimators implicitly optimize goodness-of-fit.

Page 13:

Positive and Negative Randomness,

and Probabilistic Models

Page 14:

Precision of following a given function h(α)

[Figure: a target function h(α) and a structure function h_x(α) that follows it within precision d; model cost α on the horizontal axis, data-to-model cost log |S| on the vertical axis.]

Page 15:

Logarithmic precision is sharp

Lemma. Most strings of length n have structure functions close to the diagonal n − α. Those are the strings of high complexity K(x) > n.

For strings of low complexity, say K(x) < n/2, the number of candidate functions is much greater than the number of strings. Hence there cannot be a string for every such function. But we show that there is a string for every approximate shape of function.

Page 16:

All degrees of neg. randomness

Theorem: For every length n there are strings x with a minimal sufficient statistic of every complexity between 0 and n (up to a logarithmic term).

Proof. All shapes of the structure function are possible, as long as the function starts from n − k, decreases monotonically, and is 0 at k, for some k ≤ n (up to the precision in the previous slide).

Page 17:

Are there natural examples of negative randomness?

Question: Are there natural examples of strings with large negative randomness? Kolmogorov didn't think they exist, but we know they are abundant.

Maybe the information distance between strings x and y yields large negative randomness.

Page 18:

Information Distance:

• Information Distance (Li, Vitanyi, 96; Bennett,Gacs,Li,Vitanyi,Zurek, 98)

D(x,y) = min { |p|: p(x)=y & p(y)=x}

Binary program for a Universal Computer (Lisp, Java, C, Universal Turing Machine)

Theorem (i) D(x,y) = max {K(x|y),K(y|x)}

Kolmogorov complexity of x given y, defined as the length of the shortest binary program that outputs x on input y.

(ii) D(x,y) ≤ D'(x,y) for any computable distance D' satisfying ∑_y 2^(−D'(x,y)) ≤ 1 for every x.

(iii) D(x,y) is a metric.
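In practice the uncomputable D(x,y) is approximated by the normalized compression distance (Cilibrasi and Vitanyi), replacing K by the length of a compressed string. A minimal sketch with zlib:

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: a computable proxy for the
    normalized information distance max{K(x|y),K(y|x)} / max{K(x),K(y)}."""
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Similar strings score near 0, unrelated ones near 1:
a = b"the quick brown fox jumps over the lazy dog " * 20
b = bytes(range(256)) * 4
```

The zlib compressor here is just a convenient stand-in; any real-world compressor (bzip2, PPM, ...) can play the same role.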

Page 19:

Not between random strings

• The information distance between random strings x and y of length n doesn't work.

• If x, y satisfy K(x|y), K(y|x) > n, then p = x XOR y, where XOR means bitwise exclusive-or, serves as a program to translate x to y and y to x. But if x and y are positively random, it appears that p is so too.

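The XOR argument on this slide can be checked mechanically; a small sketch (mine, with urandom standing in for positively random strings):

```python
import os

def xor(a: bytes, b: bytes) -> bytes:
    """Bitwise exclusive-or of two equal-length strings."""
    return bytes(u ^ v for u, v in zip(a, b))

x = os.urandom(32)       # stand-ins for random n-bit strings
y = os.urandom(32)
p = xor(x, y)            # one n-bit program for both directions

assert xor(x, p) == y    # p translates x to y ...
assert xor(y, p) == x    # ... and y back to x
# But for independent random x and y, p is itself random: it has no short
# description, which is exactly the obstacle the slide points out.
```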

Page 20:

Selected Bibliography

N.K. Vereshchagin, P.M.B. Vitanyi, A theory of lossy compression of individual data, http://arxiv.org/abs/cs.IT/0411014, Submitted.
P.D. Grunwald, P.M.B. Vitanyi, Shannon information and Kolmogorov complexity, IEEE Trans. Information Theory, Submitted.
N.K. Vereshchagin, P.M.B. Vitanyi, Kolmogorov's structure functions and model selection, IEEE Trans. Inform. Theory, 50:12(2004), 3265-3290.
P. Gacs, J. Tromp, P. Vitanyi, Algorithmic statistics, IEEE Trans. Inform. Theory, 47:6(2001), 2443-2463.
Q. Gao, M. Li, P.M.B. Vitanyi, Applying MDL to learning best model granularity, Artificial Intelligence, 121:1-2(2000), 1-29.
P.M.B. Vitanyi, M. Li, Minimum description length induction, Bayesianism, and Kolmogorov complexity, IEEE Trans. Inform. Theory, IT-46:2(2000), 446-464.