
Lecture 5: Asymptotic Equipartition Property

• Law of large numbers for products of random variables

• AEP and consequences

Dr. Yao Xie, ECE587, Information Theory, Duke University


Stock market

• Initial investment Y_0, daily return ratio r_i; after the t-th day, your money is

Y_t = Y_0 r_1 · … · r_t.

• Now suppose the return ratios r_i are i.i.d., with

r_i = { 4, w.p. 1/2
        0, w.p. 1/2.

• So you think the expected return ratio is E r_i = 2,

• and then E Y_t = E(Y_0 r_1 · … · r_t) = Y_0 (E r_i)^t = Y_0 2^t?
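A quick Monte Carlo check makes the paradox concrete (a minimal sketch assuming NumPy; the parameters t and trials are illustrative, not from the lecture). The expectation E Y_t = 2^t is huge, yet essentially every simulated trajectory ends at 0, because one day with r_i = 0 wipes out the wealth:

    import numpy as np

    rng = np.random.default_rng(0)
    t, trials = 20, 10_000

    # daily return ratio: 4 or 0, each with probability 1/2
    r = rng.choice([4.0, 0.0], size=(trials, t))
    Y = r.prod(axis=1)                  # Y_t = Y_0 * r_1 * ... * r_t with Y_0 = 1

    print("E[Y_t] = 2^t          :", 2.0 ** t)
    print("sample mean of Y_t    :", Y.mean())
    print("trajectories ending 0 :", (Y == 0).mean())
    # A trajectory is nonzero only if all t returns equal 4 (probability 2^-20),
    # so with 10,000 trials the sample mean is almost surely 0, far below 2^t.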


Is “optimized” really optimal?

• With Y_0 = 1, the actual wealth Y_t goes like

1 4 16 0 0 0 . . .

• Optimizing the expected return is not optimal?

• Fundamental reason: products do not behave the same way as sums


(Weak) Law of large numbers

Theorem. For independent, identically distributed (i.i.d.) random variables X_i,

X̄_n = (1/n) ∑_{i=1}^n X_i → EX, in probability.

• Convergence in probability: X_n → X in probability if for every ϵ > 0,

P{|X_n − X| > ϵ} → 0.

• Proof by Markov inequality.

• So this means P{|X̄_n − EX| ≤ ϵ} → 1 as n → ∞.


Other types of convergence

• In mean square if, as n → ∞,

E(X_n − X)^2 → 0

• With probability 1 (almost surely) if

P{ lim_{n→∞} X_n = X } = 1

• In distribution if, as n → ∞,

F_n(x) → F(x) at every continuity point of F,

where F_n and F are the cumulative distribution functions of X_n and X.


Product of random variables

• How does this behave?

(∏_{i=1}^n X_i)^{1/n}

• Geometric mean (∏_{i=1}^n X_i)^{1/n} ≤ arithmetic mean (1/n) ∑_{i=1}^n X_i

• Examples:

– Volume V of a random box with dimensions X_i: V = X_1 · … · X_n
– Stock return Y_t = Y_0 r_1 · … · r_t
– Joint distribution of i.i.d. RVs: p(x_1, …, x_n) = ∏_{i=1}^n p(x_i)


Law of large numbers for products of random variables

• We can write

X_i = e^{log X_i}

• Hence

(∏_{i=1}^n X_i)^{1/n} = e^{(1/n) ∑_{i=1}^n log X_i}

• So from the LLN,

(∏_{i=1}^n X_i)^{1/n} → e^{E(log X)} ≤ e^{log EX} = EX,

where the inequality is Jensen's inequality.


• Stock example:

E log r_i = (1/2) log 4 + (1/2) log 0 = −∞

(Y_t/Y_0)^{1/t} = (r_1 · … · r_t)^{1/t} → e^{E log r_i} = 0, so Y_t → 0 as t → ∞.

• Example:

X = { a, w.p. 1/2
      b, w.p. 1/2.

E[(∏_{i=1}^n X_i)^{1/n}] → √(ab) ≤ (a + b)/2
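A small simulation shows the gap between the two limits (a minimal sketch assuming NumPy; the values a = 1 and b = 9 are placeholders). The geometric mean of the samples settles near e^{E log X} = √(ab), while the arithmetic mean settles near EX = (a + b)/2:

    import numpy as np

    rng = np.random.default_rng(1)
    a, b, n = 1.0, 9.0, 100_000          # X = a or b, each w.p. 1/2

    x = rng.choice([a, b], size=n)
    geo_mean = np.exp(np.log(x).mean())  # (x_1 * ... * x_n)^(1/n), computed in the log domain
    ari_mean = x.mean()

    print("geometric mean :", geo_mean, "-> sqrt(a*b) =", np.sqrt(a * b))   # about 3
    print("arithmetic mean:", ari_mean, "-> (a+b)/2   =", (a + b) / 2)      # about 5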


Asymptotic equipartition property (AEP)

• The LLN states that (1/n) ∑_{i=1}^n X_i → EX

• The AEP states that, for most sequences,

(1/n) log ( 1 / p(X_1, X_2, …, X_n) ) → H(X),

i.e., p(X_1, X_2, …, X_n) ≈ 2^{−nH(X)}

• Analyze using the LLN for products of random variables


The AEP lies at the heart of information theory.

• Proof of the lossless source coding theorem

• Proof of the channel capacity theorem

• and more...


AEP

Theorem. If X1, X2, . . . are i.i.d. ∼ p(x), then

−(1/n) log p(X_1, X_2, …, X_n) → H(X), in probability.

Proof:

−(1/n) log p(X_1, X_2, …, X_n) = −(1/n) ∑_{i=1}^n log p(X_i)
                               → −E log p(X)   (by the weak LLN, in probability)
                               = H(X).

There are several consequences.


Typical set

The typical set A_ϵ^(n) contains all sequences (x_1, x_2, …, x_n) ∈ X^n with the property

2^{−n(H(X)+ϵ)} ≤ p(x_1, x_2, …, x_n) ≤ 2^{−n(H(X)−ϵ)}.


Not all sequences are created equal

• Coin tossing example: X ∈ {0, 1}, p(1) = 0.8

p(1, 0, 1, 1, 0, 1) = p^{∑ x_i} (1 − p)^{6 − ∑ x_i} = p^4 (1 − p)^2 = 0.0164

p(0, 0, 0, 0, 0, 0) = p^{∑ x_i} (1 − p)^{6 − ∑ x_i} = (1 − p)^6 = 0.000064

• In this example, if (x_1, …, x_n) ∈ A_ϵ^(n), then

H(X) − ϵ ≤ −(1/n) log p(x_1, …, x_n) ≤ H(X) + ϵ.

• This means a binary sequence is in the typical set when its frequency of heads k/n is approximately p
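To spell out the step: if x_1, …, x_n has k heads, then

−(1/n) log p(x_1, …, x_n) = −(k/n) log p − (1 − k/n) log(1 − p),

which is within ϵ of H(X) = −p log p − (1 − p) log(1 − p) exactly when k/n is close to p.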




p = 0.6, n = 25, k = number of “1”s

k      C(25,k)    C(25,k) p^k (1−p)^{25−k}    Pr{∑ X_i ≤ k}    −(1/n) log p(x^n)
0            1                     0.000000         0.000000             1.321928
1           25                     0.000000         0.000000             1.298530
2          300                     0.000000         0.000000             1.275131
3         2300                     0.000001         0.000001             1.251733
4        12650                     0.000007         0.000008             1.228334
5        53130                     0.000045         0.000054             1.204936
6       177100                     0.000227         0.000281             1.181537
7       480700                     0.000925         0.001205             1.158139
8      1081575                     0.003121         0.004326             1.134740
9      2042975                     0.008843         0.013169             1.111342
10     3268760                     0.021222         0.034392             1.087943
11     4457400                     0.043410         0.077801             1.064545
12     5200300                     0.075967         0.153768             1.041146
13     5200300                     0.113950         0.267718             1.017748
14     4457400                     0.146507         0.414225             0.994349
15     3268760                     0.161158         0.575383             0.970951
16     2042975                     0.151086         0.726469             0.947552
17     1081575                     0.119980         0.846448             0.924154
18      480700                     0.079986         0.926434             0.900755
19      177100                     0.044203         0.970638             0.877357
20       53130                     0.019891         0.990528             0.853958
21       12650                     0.007104         0.997633             0.830560
22        2300                     0.001937         0.999570             0.807161
23         300                     0.000379         0.999950             0.783763
24          25                     0.000047         0.999997             0.760364
25           1                     0.000003         1.000000             0.736966


Consequences of AEP

Theorem. For any ϵ > 0:

1. If (x_1, x_2, …, x_n) ∈ A_ϵ^(n), then H(X) − ϵ ≤ −(1/n) log p(x_1, x_2, …, x_n) ≤ H(X) + ϵ.

2. P{A_ϵ^(n)} ≥ 1 − ϵ for n sufficiently large.

3. |A_ϵ^(n)| ≤ 2^{n(H(X)+ϵ)}.

4. |A_ϵ^(n)| ≥ (1 − ϵ) 2^{n(H(X)−ϵ)} for n sufficiently large.
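For the coin example tabulated above (p = 0.6, n = 25) these bounds can be checked exactly, since p(x^n) depends only on the number of ones. A minimal Python sketch (the choice ϵ = 0.1 is ours, not from the lecture):

    from math import comb, log2

    p, n, eps = 0.6, 25, 0.1
    H = -p * log2(p) - (1 - p) * log2(1 - p)      # H(X), about 0.971 bits

    def sample_entropy(k):                        # -(1/n) log p(x^n) for a sequence with k ones
        return -(k * log2(p) + (n - k) * log2(1 - p)) / n

    typical_ks = [k for k in range(n + 1) if abs(sample_entropy(k) - H) <= eps]

    size_A = sum(comb(n, k) for k in typical_ks)
    prob_A = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in typical_ks)

    print("typical k values:", typical_ks)                                          # 11, ..., 19
    print("P(A) >= 1 - eps?", round(prob_A, 4), prob_A >= 1 - eps)                  # property 2
    print("|A| <= 2^{n(H+eps)}?", size_A, size_A <= 2 ** (n * (H + eps)))           # property 3
    print("|A| >= (1-eps) 2^{n(H-eps)}?", size_A >= (1 - eps) * 2 ** (n * (H - eps)))  # property 4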


Property 1

If (x_1, x_2, …, x_n) ∈ A_ϵ^(n), then

H(X) − ϵ ≤ −(1/n) log p(x_1, x_2, …, x_n) ≤ H(X) + ϵ.

• Proof: directly from the definition, (x_1, x_2, …, x_n) ∈ A_ϵ^(n) if

2^{−n(H(X)+ϵ)} ≤ p(x_1, x_2, …, x_n) ≤ 2^{−n(H(X)−ϵ)};

taking logarithms and dividing by −n gives the stated bounds.

• The number of bits needed to describe a sequence in the typical set is approximately nH(X).


Property 2

P{A_ϵ^(n)} ≥ 1 − ϵ for n sufficiently large.

• Proof: from the AEP, −(1/n) log p(X_1, …, X_n) → H(X) in probability. This means that for a given ϵ > 0, when n is sufficiently large,

P{ |−(1/n) log p(X_1, …, X_n) − H(X)| ≤ ϵ } ≥ 1 − ϵ,

and the event inside the braces is exactly the event (X_1, …, X_n) ∈ A_ϵ^(n).

– High probability: sequences in the typical set are the "most typical" ones.
– These sequences all have almost the same probability: "equipartition".


Properties 3 and 4: size of the typical set

(1 − ϵ) 2^{n(H(X)−ϵ)} ≤ |A_ϵ^(n)| ≤ 2^{n(H(X)+ϵ)}

• Proof:

1 = ∑_{(x_1,…,x_n)} p(x_1, …, x_n)

  ≥ ∑_{(x_1,…,x_n) ∈ A_ϵ^(n)} p(x_1, …, x_n)

  ≥ ∑_{(x_1,…,x_n) ∈ A_ϵ^(n)} 2^{−n(H(X)+ϵ)}

  = |A_ϵ^(n)| 2^{−n(H(X)+ϵ)}.


On the other hand, P{A_ϵ^(n)} ≥ 1 − ϵ for n sufficiently large, so

1 − ϵ ≤ ∑_{(x_1,…,x_n) ∈ A_ϵ^(n)} p(x_1, …, x_n)

      ≤ |A_ϵ^(n)| 2^{−n(H(X)−ϵ)}.

• Size of typical set depends on H(X).

• When p = 1/2 in the coin tossing example, H(X) = 1 and 2^{nH(X)} = 2^n: all sequences are typical.


Typical set diagram

• This enables us to divide all sequences into two sets

– Typical set: high probability of occurring, and sample entropy close to the true entropy, so we will focus on analyzing sequences in the typical set

– Non-typical set: small probability, can be ignored in general

[Figure: the set of all sequences X^n, with |X|^n elements, containing the typical set A_ϵ^(n) with about 2^{n(H+ϵ)} elements; the rest is the non-typical set.]


Data compression scheme from AEP

• Let X_1, X_2, …, X_n be i.i.d. RVs drawn from p(x)

• We wish to find short descriptions for such sequences of RVs


• Divide all sequences in X^n into two sets

[Figure: X^n split into the non-typical set, described with n log|X| + 2 bits, and the typical set, described with n(H + ϵ) + 2 bits.]


• Use one bit to indicate which set

– Typical set A_ϵ^(n): use prefix "1".
  Since there are no more than 2^{n(H(X)+ϵ)} sequences, indexing requires no more than ⌈n(H(X) + ϵ)⌉ + 1 bits (plus one extra bit for the prefix)

– Non-typical set: use prefix "0".
  Since there are at most |X|^n sequences, indexing requires no more than ⌈n log |X|⌉ + 1 bits (plus one extra bit for the prefix)

• Notation: x^n = (x_1, …, x_n), l(x^n) = length of the codeword for x^n

• We can prove that

E[(1/n) l(X^n)] ≤ H(X) + ϵ
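A toy check of this scheme for the binary source from the table (a minimal sketch; ϵ = 0.1, and the slack term ϵ′ = ϵ + ϵ log|X| + 2/n comes from the standard typical-set coding argument rather than from a formula on the slide):

    from math import ceil, comb, log2

    p, n, eps = 0.6, 25, 0.1                     # binary source, block length n
    H = -p * log2(p) - (1 - p) * log2(1 - p)

    def is_typical(k):                           # sequence with k ones
        sample_H = -(k * log2(p) + (n - k) * log2(1 - p)) / n
        return abs(sample_H - H) <= eps

    # codeword length: 1 flag bit + index bits
    len_typ = 1 + ceil(n * (H + eps))            # prefix "1", index into the typical set
    len_atyp = 1 + n                             # prefix "0", raw n bits since |X| = 2

    expected_len = sum(
        comb(n, k) * p**k * (1 - p)**(n - k)
        * (len_typ if is_typical(k) else len_atyp)
        for k in range(n + 1)
    )

    eps_prime = eps + eps * log2(2) + 2 / n      # slack from rounding and the flag bit
    print("E[l(X^n)]/n :", round(expected_len / n, 4))   # about 1.11 bits per symbol
    print("H(X) + eps' :", round(H + eps_prime, 4))      # about 1.25: the bound holds
    print("H(X)        :", round(H, 4))                  # about 0.97: approached as n grows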


Summary of AEP

Almost everything is almost equally probable.

• Reasons that the AEP involves H(X):

– −(1/n) log p(x^n) → H(X), in probability

– n(H(X) ± ϵ) bits suffice to describe the random sequence on average

– 2^{H(X)} is the effective alphabet size

– The typical set is the smallest set with probability near 1

– The size of the typical set is about 2^{nH(X)}

– The distribution of the elements in the typical set is nearly uniform


Next Time

• The AEP is a property of independent (i.i.d.) sequences

• What about dependent processes?

• The answer is the entropy rate: next time.
