Information Theory
Ying Nian Wu, UCLA Department of Statistics
July 9, 2007, IPAM Summer School
Goal: A gentle introduction to the basic concepts in information theory
Emphasis: understanding and interpretations of these concepts
Reference: Elements of Information Theory by Cover and Thomas
Topics
• Entropy and relative entropy
• Asymptotic equipartition property
• Data compression
• Large deviation
• Kolmogorov complexity
• Entropy rate of a process
Entropy
Randomness or uncertainty of a probability distribution.

Notation: \Pr(X = a_k) = p_k and p(x) = \Pr(X = x), for a distribution

      a_1  a_2  ...  a_K
      p_1  p_2  ...  p_K

Example:

      a    b    c    d
Pr   1/2  1/4  1/8  1/8
Entropy
Definition: with \Pr(X = a_k) = p_k and p(x) = \Pr(X = x),

H(X) = H(p_1, ..., p_K) = -\sum_k p_k \log p_k = -\sum_x p(x) \log p(x)
Entropy
Example: for

      a    b    c    d
Pr   1/2  1/4  1/8  1/8

H(X) = -\frac{1}{2}\log\frac{1}{2} - \frac{1}{4}\log\frac{1}{4} - \frac{1}{8}\log\frac{1}{8} - \frac{1}{8}\log\frac{1}{8} = \frac{7}{4}

(logs are base 2 throughout, so entropy is measured in bits).
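As a quick check, the definition above can be evaluated directly (a minimal Python sketch; the dictionary just encodes the example's numbers):

```python
from math import log2

# Distribution from the example: Pr(a, b, c, d) = 1/2, 1/4, 1/8, 1/8
p = {"a": 1/2, "b": 1/4, "c": 1/8, "d": 1/8}

def entropy(dist):
    """H(p) = -sum_x p(x) log2 p(x), in bits."""
    return -sum(px * log2(px) for px in dist.values() if px > 0)

print(entropy(p))  # 1.75 = 7/4 bits
```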
Entropy
Definition for both discrete and continuous distributions:

H(p) = -E_p[\log p(X)]

Recall E[f(X)] = \sum_x f(x) p(x) in the discrete case (an integral in the continuous case).
Entropy
Example: for

          a    b    c    d
Pr       1/2  1/4  1/8  1/8
-\log p   1    2    3    3

H(X) = -E_p[\log p(X)] = \frac{1}{2}\cdot 1 + \frac{1}{4}\cdot 2 + \frac{1}{8}\cdot 3 + \frac{1}{8}\cdot 3 = \frac{7}{4}
Interpretation 1: cardinality
Uniform distribution X ~ Unif[A]: there are |A| elements in A, and all these choices are equally likely.

H(X) = E[\log(1/p(X))] = E[\log |A|] = \log |A|

Entropy can be interpreted as the log of volume or size. An n-dimensional cube has 2^n vertices, so entropy can also be interpreted as dimensionality.

What if the distribution is not uniform?
Asymptotic equipartition property
Any distribution is essentially a uniform distribution in long-run repetition.

X_1, X_2, ..., X_n ~ p(x) independently, so

p(X_1, X_2, ..., X_n) = p(X_1) p(X_2) \cdots p(X_n)

Recall that if X ~ Unif[A] then p(X) = 1/|A|, a constant. The joint probability p(X_1, ..., X_n) is random — but in some sense, it is essentially a constant!
Law of large numbers
X_1, X_2, ..., X_n ~ p(x) independently:

\frac{1}{n}\sum_{i=1}^n f(X_i) \to E[f(X)] = \sum_x f(x) p(x)

The long-run average converges to the expectation.
Asymptotic equipartition property
Since p(X_1, ..., X_n) = p(X_1) \cdots p(X_n),

-\frac{1}{n}\log p(X_1, ..., X_n) = -\frac{1}{n}\sum_{i=1}^n \log p(X_i) \to -E[\log p(X)] = H(p)

Intuitively, in the long run,

p(X_1, ..., X_n) \approx 2^{-nH(p)}
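A small simulation illustrates the convergence (a Python sketch; the sample size and seed are arbitrary choices, and the distribution is the running example):

```python
import random
from math import log2

random.seed(0)  # fixed seed so the run is reproducible
p = {"a": 1/2, "b": 1/4, "c": 1/8, "d": 1/8}   # H(p) = 7/4
symbols, weights = zip(*p.items())

n = 100_000
sample = random.choices(symbols, weights=weights, k=n)

# -(1/n) log2 p(X1,...,Xn) = -(1/n) sum_i log2 p(Xi) -> H(p)
neg_log_avg = -sum(log2(p[x]) for x in sample) / n
print(neg_log_avg)  # close to 1.75
```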
Asymptotic equipartition property
p(X_1, ..., X_n) \approx 2^{-nH(p)}

Recall that if X ~ Unif[A] then p(X) = 1/|A|, a constant. Therefore, it is as if (X_1, ..., X_n) ~ Unif[A] with |A| = 2^{nH(p)}.

So the dimensionality per observation is H(p). We can make this more rigorous.
Weak law of large numbers
X_1, X_2, ..., X_n ~ p(x) independently, with E[f(X)] = \sum_x f(x) p(x):

\Pr\left(\left|\frac{1}{n}\sum_{i=1}^n f(X_i) - E[f(X)]\right| > \epsilon\right) < \delta \quad \text{for } n > n_0
Typical set
Since p(X_1, ..., X_n) \approx 2^{-nH(p)}, it is as if (X_1, ..., X_n) ~ Unif[A] with |A| = 2^{nH(p)}. The typical set A_{n,\epsilon} makes this precise for a distribution p_1, ..., p_K on the alphabet a_1, ..., a_K.
Typical set
A_{n,\epsilon}: the set of sequences (x_1, x_2, ..., x_n) such that

2^{-n(H(p)+\epsilon)} \le p(x_1, x_2, ..., x_n) \le 2^{-n(H(p)-\epsilon)}

Then

\Pr(A_{n,\epsilon}) > 1 - \epsilon for sufficiently large n

and

(1-\epsilon)\, 2^{n(H(p)-\epsilon)} \le |A_{n,\epsilon}| \le 2^{n(H(p)+\epsilon)}
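These bounds can be checked by brute force for a small n (a Python sketch; n and epsilon are arbitrary choices, small enough to enumerate every sequence):

```python
from itertools import product
from math import log2

p = {"a": 1/2, "b": 1/4, "c": 1/8, "d": 1/8}
H = 1.75          # entropy of p in bits
n, eps = 8, 0.25  # enumerate all 4^8 = 65536 sequences

lo, hi = 2 ** (-n * (H + eps)), 2 ** (-n * (H - eps))
mass, count = 0.0, 0
for seq in product(p, repeat=n):
    prob = 1.0
    for x in seq:
        prob *= p[x]
    if lo <= prob <= hi:      # the sequence is typical
        mass += prob
        count += 1

print(count, mass)  # mass -> 1 as n grows
```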
Interpretation 2: coin flipping
Flip a fair coin once: {Head, Tail}. Flip a fair coin twice independently: {HH, HT, TH, TT}. ... Flip a fair coin n times independently: 2^n equally likely sequences.

We may interpret entropy as the number of flips.
Interpretation 2: coin flipping
Example:

        a   b   c   d
Pr     1/4 1/4 1/4 1/4
flips   HH  HT  TH  TT

The above uniform distribution amounts to 2 coin flips.
Interpretation 2: coin flipping
Since p(X_1, ..., X_n) \approx 2^{-nH(p)}, i.e. (X_1, ..., X_n) ~ Unif[A] with |A| = 2^{nH(p)}:

(X_1, ..., X_n) amounts to nH(p) flips, so X ~ p(x) amounts to H(p) flips.
Interpretation 2: coin flipping
Example:

        a    b    c    d
Pr     1/2  1/4  1/8  1/8
flips   H    TH   TTH  TTT

That is, walk down a binary tree: H → a; T, H → b; T, T, H → c; T, T, T → d.
Interpretation 2: coin flipping
For

          a    b    c    d
Pr       1/2  1/4  1/8  1/8
-\log p   1    2    3    3

the expected number of flips is

H(X) = \frac{1}{2}\cdot 1 + \frac{1}{4}\cdot 2 + \frac{1}{8}\cdot 3 + \frac{1}{8}\cdot 3 = \frac{7}{4} flips
Interpretation 3: coding
Example:

        a   b   c   d
Pr     1/4 1/4 1/4 1/4
code    00  01  10  11

length 2 = \log 4 = -\log(1/4)
Interpretation 3: coding
Since (X_1, ..., X_n) ~ Unif[A] with |A| = 2^{nH(p)}, how many bits do we need to code the elements of A? Answer: nH(p) bits. This can be made more formal using the typical set A_{n,\epsilon}.
Interpretation 3: coding
Example:

        a    b    c    d
Pr     1/2  1/4  1/8  1/8
code    0    10   110  111

This is a prefix code: no codeword is a prefix of another, so e.g. abacbd → 010011010111 decodes unambiguously. The expected code length is

E[l(X)] = \sum_x l(x) p(x) = 1\cdot\frac{1}{2} + 2\cdot\frac{1}{4} + 3\cdot\frac{1}{8} + 3\cdot\frac{1}{8} = \frac{7}{4} bits
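The prefix code above can be exercised directly (a Python sketch; the helper names are ours, but the code table is the slide's):

```python
code = {"a": "0", "b": "10", "c": "110", "d": "111"}
p = {"a": 1/2, "b": 1/4, "c": 1/8, "d": 1/8}

def encode(msg):
    return "".join(code[s] for s in msg)

def decode(bits):
    inv = {v: k for k, v in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inv:       # prefix-free: the first match is the symbol
            out.append(inv[buf])
            buf = ""
    return "".join(out)

bits = encode("abacbd")
print(bits, decode(bits))   # 010011010111 abacbd

# Expected code length E[l(X)] = sum_x l(x) p(x)
EL = sum(len(code[s]) * p[s] for s in p)
print(EL)                   # 1.75 bits = H(p)
```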
Interpretation 3: coding
The codewords spell out the coin flips in the tree: a ↔ H ↔ 0, b ↔ TH ↔ 10, c ↔ TTH ↔ 110, d ↔ TTT ↔ 111. A coded message is thus a sequence of coin flips — a completely random sequence, which cannot be further compressed.

l(x) = -\log p(x), \quad E[l(X)] = H(p)

This is the optimal code. (Natural language works similarly: compare the two words "I" and "probability" — the high-probability word is short.)
Optimal code
Assign codeword lengths l_1, ..., l_K to symbols a_1, ..., a_K with probabilities p_1, ..., p_K.

Kraft inequality for a prefix code: \sum_k 2^{-l_k} \le 1

Minimize L = E[l(X)] = \sum_k l_k p_k subject to the Kraft inequality.

Optimal lengths: l_k^* = -\log p_k, giving L^* = H(p).
Wrong model
True distribution p_1, ..., p_K; wrong model q_1, ..., q_K (on the same symbols a_1, ..., a_K).

Optimal code: L^* = E[l^*(X)] = \sum_k p_k l_k^* = -\sum_k p_k \log p_k

Wrong code: L = E[l(X)] = \sum_k p_k l_k = -\sum_k p_k \log q_k

Redundancy: L - L^* = \sum_k p_k \log \frac{p_k}{q_k}

Box: "All models are wrong, but some are useful."
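The redundancy identity is easy to verify numerically (a Python sketch; taking the wrong model q to be uniform is our arbitrary choice):

```python
from math import log2

p = [1/2, 1/4, 1/8, 1/8]   # true distribution
q = [1/4, 1/4, 1/4, 1/4]   # wrong model (uniform, for illustration)

L_star = -sum(pk * log2(pk) for pk in p)             # optimal code: H(p)
L = -sum(pk * log2(qk) for pk, qk in zip(p, q))      # code built from q
D = sum(pk * log2(pk / qk) for pk, qk in zip(p, q))  # D(p||q)

print(L_star, L, L - L_star, D)  # redundancy L - L* equals D(p||q)
```

With q uniform, the redundancy is exactly \log|A| - H(p) = 2 - 7/4 = 1/4 bit per symbol.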
Relative entropy
For two distributions p = (p_1, ..., p_K) and q = (q_1, ..., q_K) on the same alphabet a_1, ..., a_K:

D(p \| q) = E_p\left[\log \frac{p(X)}{q(X)}\right] = \sum_x p(x) \log \frac{p(x)}{q(x)}

Also called the Kullback-Leibler divergence.
Relative entropy

D(p \| q) = E_p\left[\log \frac{p(X)}{q(X)}\right] = -E_p\left[\log \frac{q(X)}{p(X)}\right] \ge -\log E_p\left[\frac{q(X)}{p(X)}\right] = 0

by Jensen's inequality. For the uniform distribution U on A:

D(p \| U) = E_p[\log p(X)] + \log |A| = \log |A| - H(p)
Types
X_1, X_2, ..., X_n ~ q(x) independently, with probabilities q_1, ..., q_K on a_1, ..., a_K.

Let n_k = number of times X_i = a_k, and f_k = n_k / n, the normalized frequency.
Types
Law of large numbers: (f_1, ..., f_K) \to (q_1, ..., q_K).

Refinement: since p(x_1, ..., x_n) = \prod_i p(x_i), each sequence with counts (n_1, ..., n_K) has probability \prod_k q_k^{n_k}, so the probability of observing the type f = (f_1, ..., f_K) is

\Pr(f) = \frac{n!}{n_1! \cdots n_K!} \prod_{k=1}^K q_k^{n_k}
Large deviation
Law of large numbers: (f_1, ..., f_K) \to (q_1, ..., q_K). Refinement:

-\frac{1}{n}\log \Pr(f) \to D(f \| q)

so

\Pr(f) \approx 2^{-n D(f \| q)}
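The exponent can be compared against the exact multinomial probability (a Python sketch; the distribution q is the running example, while the sample size and observed counts are hypothetical):

```python
from math import lgamma, log, log2

q = [1/2, 1/4, 1/8, 1/8]
n = 2000
counts = [500, 500, 500, 500]   # observed type f = (1/4, 1/4, 1/4, 1/4)
f = [c / n for c in counts]

# log2 Pr(f) = log2 [ n! / (n_1! ... n_K!) * prod_k q_k^{n_k} ]
log2_mult = (lgamma(n + 1) - sum(lgamma(c + 1) for c in counts)) / log(2)
log2_prob = log2_mult + sum(c * log2(qk) for c, qk in zip(counts, q))

D = sum(fk * log2(fk / qk) for fk, qk in zip(f, q) if fk > 0)
print(-log2_prob / n, D)  # agree up to an O(log n / n) polynomial factor
```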
Kolmogorov complexity
Example: a string 011011011011...011.
Program: for (i = 1 to n/3) write(011) end — which can be translated to binary machine code.

Kolmogorov complexity = the length of the shortest machine code that reproduces the string; no probability distribution is involved.

If a long sequence is not compressible, then it has all the statistical properties of a sequence of coin flips: string = f(coin flips).
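Kolmogorov complexity itself is uncomputable, but a practical compressor gives the flavor (a Python sketch using zlib as a crude stand-in for "shortest program"):

```python
import random
import zlib

random.seed(0)
regular = ("011" * 2000).encode()                          # 6000 bytes, very regular
coins = bytes(random.getrandbits(8) for _ in range(6000))  # 6000 "coin flip" bytes

print(len(zlib.compress(regular)))  # tiny: a short program suffices
print(len(zlib.compress(coins)))    # about 6000: essentially incompressible
```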
Joint and conditional entropy
Joint distribution: \Pr(X = a_k \,\&\, Y = b_j) = p_{kj}, i.e. p(x, y) = \Pr(X = x \,\&\, Y = y), arranged in a table with rows a_1, ..., a_K and columns b_1, ..., b_J.

Marginal distributions: \Pr(X = a_k) = p_{k\cdot}, \Pr(Y = b_j) = p_{\cdot j}; that is, p_X(x) = \Pr(X = x) and p_Y(y) = \Pr(Y = y).

e.g., eye color & hair color.
Joint and conditional entropy
Conditional distributions:

\Pr(X = a_k | Y = b_j) = p_{kj} / p_{\cdot j}, \quad \Pr(Y = b_j | X = a_k) = p_{kj} / p_{k\cdot}

that is, p_{X|Y}(x|y) = \Pr(X = x | Y = y) and p_{Y|X}(y|x) = \Pr(Y = y | X = x).

Chain rule:

p_{X,Y}(x, y) = p_X(x)\, p_{Y|X}(y|x) = p_Y(y)\, p_{X|Y}(x|y)
Joint and conditional entropy

H(X, Y) = -\sum_{k,j} p_{kj} \log p_{kj} = -\sum_{x,y} p(x,y) \log p(x,y) = -E[\log p(X,Y)]

H(Y | X = x) = -\sum_y p(y|x) \log p(y|x)

H(Y | X) = \sum_x p(x) H(Y | X = x) = -\sum_{x,y} p(x,y) \log p(y|x) = -E[\log p(Y|X)]
Chain rule
From p(x, y) = p(x) p(y|x) = p(y) p(x|y):

H(X, Y) = H(X) + H(Y | X)

More generally,

H(X_1, ..., X_n) = \sum_{i=1}^n H(X_i | X_1, ..., X_{i-1})
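The chain rule can be verified on any joint table (a Python sketch; the 2x2 joint distribution is a hypothetical example):

```python
from math import log2

# Hypothetical joint distribution p(x, y)
joint = {("a", "u"): 1/4, ("a", "v"): 1/4,
         ("b", "u"): 3/8, ("b", "v"): 1/8}

def H(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

pX = {}
for (x, _), pr in joint.items():
    pX[x] = pX.get(x, 0.0) + pr

HXY = H(joint.values())                  # H(X, Y)
HX = H(pX.values())                      # H(X)
HY_given_X = -sum(pr * log2(pr / pX[x])  # -E[log2 p(Y|X)]
                  for (x, _), pr in joint.items())

print(HXY, HX + HY_given_X)  # equal: H(X,Y) = H(X) + H(Y|X)
```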
Mutual information

I(X; Y) = D(p(x,y) \,\|\, p(x)p(y)) = E\left[\log \frac{p(X, Y)}{p(X)\, p(Y)}\right] = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\, p(y)}

I(X; Y) = H(X) - H(X | Y) = H(X) + H(Y) - H(X, Y)
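Both expressions for I(X;Y) can be computed from one joint table (a Python sketch; the joint distribution is a hypothetical example):

```python
from math import log2

joint = {("a", "u"): 1/4, ("a", "v"): 1/4,
         ("b", "u"): 3/8, ("b", "v"): 1/8}

pX, pY = {}, {}
for (x, y), pr in joint.items():
    pX[x] = pX.get(x, 0.0) + pr
    pY[y] = pY.get(y, 0.0) + pr

def H(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# Definition: I(X;Y) = sum_{x,y} p(x,y) log2 [ p(x,y) / (p(x) p(y)) ]
I_def = sum(pr * log2(pr / (pX[x] * pY[y])) for (x, y), pr in joint.items())
# Identity: I(X;Y) = H(X) + H(Y) - H(X,Y)
I_id = H(pX.values()) + H(pY.values()) - H(joint.values())

print(I_def, I_id)  # equal, and nonnegative
```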
Entropy rate
Stochastic process (X_1, X_2, ..., X_n, ...), not independent.

Markov chain: \Pr(X_{n+1} = x_{n+1} | X_1 = x_1, ..., X_n = x_n) = \Pr(X_{n+1} = x_{n+1} | X_n = x_n)

Entropy rate: H = \lim_{n\to\infty} \frac{1}{n} H(X_1, ..., X_n)

Stationary process: H = \lim_{n\to\infty} H(X_n | X_{n-1}, ..., X_1)

Stationary Markov chain: H = H(X_n | X_{n-1}) = H(X_2 | X_1)

(This is the relevant quantity for compression.)
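For a stationary Markov chain the entropy rate reduces to a single conditional entropy (a Python sketch; the 2-state transition matrix and its stationary distribution are hypothetical):

```python
from math import log2

# Transition matrix P[i][j] = Pr(X_{n+1} = j | X_n = i)
P = [[0.9, 0.1],
     [0.4, 0.6]]
pi = [0.8, 0.2]   # stationary distribution: pi P = pi

def H_row(row):
    return -sum(p * log2(p) for p in row if p > 0)

# H = H(X_2 | X_1) = sum_i pi_i * H(i-th row of P)
H_rate = sum(pi_i * H_row(row) for pi_i, row in zip(pi, P))
print(H_rate)  # about 0.57 bits per step
```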
Shannon, 1948
1. Zero-order approximation: XFOML RXKHRJFFJUJ ZLPWCFWKCYJ FFJEYVKCQSGHYD QPAAMKBZAACIBZLHJQD.
2. First-order approximation: OCRO HLI RGWR NMIELWIS EU LL NBNESEBYA TH EEI ALHENHTTPA OOBTTVA NAH BRL.
3. Second-order approximation (digram structure as in English): ON IE ANTSOUTINYS ARE T INCTORE ST BE S DEAMY ACHIN D ILONASIVE TUCOOWE AT TEASONARE FUSO TIZIN ANDY TOBE SEACE CTISBE.
4. Third-order approximation (trigram structure as in English): IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA OF CRE.
5. First-order word approximation: REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN DIFFERENT NATURAL HERE HE THE A IN CAME THE TO OF TO EXPERT GRAY COME TO FURNISHES THE LINE MESSAGE HAD BE THESE.
6. Second-order word approximation: THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED.
Summary
Entropy of a distribution measures its randomness or uncertainty: the log of the number of equally likely choices, the average number of coin flips, the average length of a prefix code (Kolmogorov: the shortest machine code, with no distribution involved).

Relative entropy from one distribution to another measures the departure of the first from the second: coding redundancy, large deviation.

Also: conditional entropy, mutual information, entropy rate.