Some aspects of information theory for a computer scientist
Eric Fabre
http://people.rennes.inria.fr/Eric.Fabre
http://www.irisa.fr/sumo
11 Sep. 2014
Outline
1. Information: measure and compression
2. Reliable transmission of information
3. Distributed compression
4. Fountain codes
5. Distributed peer-to-peer storage
1. Information: measure and compression
Let’s play…
One card is drawn at random from the following set. Guess the color of the card with a minimum of yes/no questions.
One strategy
• is it hearts?
• if not, is it clubs?
• if not, is it diamonds?

Wins in
• 1 guess, with probability ½
• 2 guesses, with prob. ¼
• 3 guesses, with prob. ¼
→ 1.75 questions on average
Is there a better strategy ?
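The 1.75-question average is easy to check numerically. A minimal sketch, assuming the suit probabilities ½, ¼, ⅛, ⅛ implied by the win probabilities above (the two 3-question outcomes, diamonds and spades, each carry probability ⅛):

```python
# Average number of yes/no questions for the strategy above.
# Suit probabilities are inferred from the stated win probabilities.
probs = {"hearts": 1/2, "clubs": 1/4, "diamonds": 1/8, "spades": 1/8}
questions = {"hearts": 1, "clubs": 2, "diamonds": 3, "spades": 3}

avg_questions = sum(probs[s] * questions[s] for s in probs)
print(avg_questions)  # 0.5 + 0.5 + 0.375 + 0.375 = 1.75
```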
Observation
Lessons
- more likely means easier to guess (carries less information)
- the amount of information depends only on the log-likelihood of an event
- guessing with yes/no questions = encoding with bits = compressing
(codewords: hearts → 1, clubs → 01, diamonds → 001, spades → 000)
Important remark:
• codes like the one below are not permitted: hearts → 1, clubs → 0, diamonds → 11, spades → 00
• they cannot be uniquely decoded if one transmits sequences of encoded values of X: e.g. the sequence 11 can encode "Diamonds" or "Hearts, Hearts"
• one would need one extra symbol to separate "words"
Entropy
Source of information = random variable
Notation: variables X, Y, … taking values x, y, …
information carried by the event "X=x": h(x) = −log2 P(X=x)
average information carried by X: H(X) = −Σx P(X=x) log2 P(X=x)
H(X) measures the average difficulty to encode/describe/guess random outcomes of X
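As a quick numerical check (a sketch, not from the slides): the entropy of the card-game distribution (½, ¼, ⅛, ⅛) is exactly the 1.75 questions found earlier.

```python
from math import log2

def entropy(probs):
    """H(X) = -sum_x p(x) log2 p(x), in bits (terms with p = 0 contribute 0)."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([1/2, 1/4, 1/8, 1/8]))  # 1.75: the card-game distribution
print(entropy([1/4] * 4))             # 2.0: uniform, hardest to guess
```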
Properties
• H(X,Y) ≤ H(X) + H(Y), with equality iff X and Y independent (i.e. P(X=x,Y=y) = P(X=x) P(Y=y) for all x, y)
• H(X) ≥ 0, with equality iff X not random
• H(X) ≤ log2 K, where K is the number of possible values of X, with equality iff the distribution of X is uniform
• Bernoulli distribution: for P(X=1) = p = 1 − P(X=0), H(X) = −p log2 p − (1−p) log2(1−p), maximal (1 bit) at p = ½
Conditional entropy
H(Y|X) = Σx P(X=x) H(Y|X=x) = −Σx,y P(X=x,Y=y) log2 P(Y=y|X=x): uncertainty left on Y when X is known
Property
H(Y|X) ≤ H(Y), with equality iff Y and X independent
Example : X = color, Y = value
averaging over x gives H(Y|X)
recall that H(Y|X) ≤ H(Y)
so one checks the property on this example
Exercise: check that H(X,Y) = H(X) + H(Y|X)
A visual representation
Data compression
CoDec for source X, with R bits/sample on average
Rate R is achievable iff there exist codec pairs (fn, gn) of rate R
with vanishing error probability: P( gn(fn(X1…Xn)) ≠ X1…Xn ) → 0 as n → ∞
Usage: there was no better strategy for our card game !
Theorem (Shannon, '48):
- a lossless compression scheme for source X must have a rate R ≥ H(X) bits/sample on average
- the rate H(X) is (asymptotically) achievable
Proof
Necessity: if R is achievable, then R ≥ H(X); quite easy to prove.
Sufficiency: for R > H(X), one must build a lossless coding scheme using R bits/sample on average.

Solution 1
• use a known optimal lossless coding scheme for X: the Huffman code
• then prove H(X) ≤ L < H(X) + 1, where L is the average codeword length
• over n independent symbols X1,…,Xn, one has nH(X) ≤ Ln < nH(X) + 1, so the rate per symbol tends to H(X)

Solution 2: encoding only "typical sequences"
Typical sequences
Let X1,…,Xn be independent, with the same law as X.
By the law of large numbers, one has the a.s. convergence −(1/n) log2 P(X1…Xn) → H(X).
A sequence x1…xn is typical iff | −(1/n) log2 P(x1…xn) − H(X) | ≤ ε,
or equivalently 2^{−n(H(X)+ε)} ≤ P(x1…xn) ≤ 2^{−n(H(X)−ε)}.
Set of typical sequences: Aε(n) = { x1…xn satisfying this property }.
AEP : asymptotic equipartition property
• one has P(Aε(n)) → 1 as n → ∞
• and (1−ε) 2^{n(H(X)−ε)} ≤ |Aε(n)| ≤ 2^{n(H(X)+ε)} for n large enough
So non-typical sequences count for 0, and there are approximately 2^{nH(X)} typical sequences, each of probability ≈ 2^{−nH(X)}, among the K^n = 2^{n log2 K} possible sequences, where K is the size of the alphabet.
Optimal lossless compression
• encode a typical sequence with nH(X) bits
• encode a non-typical sequence with n log2 K bits
• add 0 / 1 as prefix to mean typical / non-typical
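A brute-force illustration of the AEP (a sketch with arbitrary toy parameters: a Bernoulli(0.3) source, n = 12, ε = 0.1; at such a small n the typical set still captures only about half the probability, which shows how asymptotic the result is):

```python
from itertools import product
from math import log2

p, n, eps = 0.3, 12, 0.1                      # toy parameters (assumptions)
H = -(p * log2(p) + (1 - p) * log2(1 - p))    # entropy of one symbol

def log2prob(seq):
    k = sum(seq)                              # number of 1s in the sequence
    return k * log2(p) + (n - k) * log2(1 - p)

# typical sequences: | -(1/n) log2 P(x1..xn) - H | <= eps
typical = [s for s in product([0, 1], repeat=n)
           if abs(-log2prob(s) / n - H) <= eps]
prob_typical = sum(2 ** log2prob(s) for s in typical)

print(len(typical), 2 ** n)   # far fewer typical sequences than 2^n = 4096
print(round(2 ** (n * H)))    # 2^{nH}: the right order of magnitude
```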
Practical coding schemes
Encoding by typicality is impractical!
Practical codes:
• Huffman code
• arithmetic coding (adapted to data flows)
• etc.
All require knowing the distribution of the source to be efficient.
Universal code:• does not need to know the source distribution
• for long sequences X1…Xn, it converges to the optimal rate H(X) bits/symbol
• example: Lempel-Ziv algorithm (used in ZIP, Compress, etc.)
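A compact Huffman sketch (the standard heap-based construction, not taken from the slides), showing that for the card-game distribution the average codeword length meets H(X) = 1.75 exactly:

```python
import heapq
from itertools import count

def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for the given distribution."""
    ticket = count()                      # tie-breaker: never compare lists
    heap = [(p, next(ticket), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)   # merge the two least likely groups
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1               # every merged symbol gains one bit
        heapq.heappush(heap, (p1 + p2, next(ticket), s1 + s2))
    return lengths

probs = [1/2, 1/4, 1/8, 1/8]
lengths = huffman_lengths(probs)
avg_len = sum(p * l for p, l in zip(probs, lengths))
print(lengths, avg_len)  # [1, 2, 3, 3] 1.75
```

Here H(X) ≤ L < H(X) + 1 holds with equality on the left, because all probabilities are powers of 2.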
2. Reliable transmission of information
Mutual information
Definition: I(X;Y) = H(X) + H(Y) − H(X,Y) = H(X) − H(X|Y) = H(Y) − H(Y|X)
Properties: I(X;Y) = I(Y;X) ≥ 0, with equality iff X and Y are independent
I(X;Y) measures how many bits X and Y have in common (on average)
Noisy channel
Channel = input alphabet A, output alphabet B, transition probability P(B|A);
observe that the input distribution P(A) is left free.

Capacity: C = max_{P(A)} I(A;B) bits / use of channel
• maximizes the coupling between input and output letters
• favors the letters that are the least altered by noise
Example
The erasure channel: a proportion p of the bits are erased.
Define the erasure variable E = f(B), with E = 1 when an erasure occurred, and E = 0 otherwise.
One has H(A|B) = P(E=1) H(A|E=1) = p H(A), and so I(A;B) = H(A) − H(A|B) = (1−p) H(A).
So C = max_{P(A)} (1−p) H(A) = 1 − p, reached for a uniform input.
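The value C = 1 − p can be confirmed by sweeping input distributions numerically (a sketch; the erasure probability p = 0.2 and the grid of input distributions are illustrative choices):

```python
from math import log2

def h(probs):
    """Entropy in bits of a distribution given as a list of probabilities."""
    return -sum(q * log2(q) for q in probs if q > 0)

def mutual_info_erasure(pa, p):
    """I(A;B) for a binary erasure channel with P(A=1) = pa, erasure prob p."""
    HA = h([pa, 1 - pa])
    # B != erased determines A; B = erased leaves H(A). So H(A|B) = p * H(A).
    return HA - p * HA

p = 0.2
capacity = max(mutual_info_erasure(a / 100, p) for a in range(1, 100))
print(capacity)  # 0.8 = 1 - p, reached at the uniform input pa = 0.5
```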
Protection against errors
Idea: add extra bits to the message, to augment its inner redundancy (this is exactly the converse of data compression)
Coding scheme
• X takes values in { 1, 2, …, M = 2^{nR} }
• X is encoded as A1…An = fn(X), sent over the noisy channel, which outputs B1…Bn, decoded as gn(B1…Bn)
• rate of the codec: R = log2(M) / n transmitted bits / channel use
• R is achievable iff there exists a series of codec pairs (fn, gn) of rate R
such that Pe(n) = P( gn(B1…Bn) ≠ X ) → 0 as n → ∞
Error correction (for a binary channel)
Repetition
• useful bit U sent 3 times: A1 = A2 = A3 = U
• decoding by majority vote
• detects and corrects one error… but the rate drops to R' = R/3
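The exact failure probability of this scheme is easy to enumerate (a sketch; p = 0.1 is an arbitrary crossover probability for the underlying binary symmetric channel):

```python
from itertools import product

def majority_error_prob(p):
    """P(majority vote fails) for 3 repetitions sent over a BSC(p)."""
    err = 0.0
    for flips in product([0, 1], repeat=3):   # 1 = this copy was flipped
        prob = 1.0
        for f in flips:
            prob *= p if f else 1 - p
        if sum(flips) >= 2:                   # 2 or 3 flips: wrong majority
            err += prob
    return err

p = 0.1
print(majority_error_prob(p))  # equals 3 p^2 (1-p) + p^3, i.e. 0.028 for p = 0.1
```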
Parity checks
• X = k useful bits U1…Uk, expanded into n bits A1…An
• rate R = k/n
• for example: add extra redundant bits Vk+1…Vn that are linear combinations of the U1…Uk
• examples: ASCII code (k=7, n=8), ISBN, social security number, credit card number
Questions: how??? and how many extra bits???
How ?
Almost all channel codes are linear: Reed-Solomon, Reed-Muller, Golay, BCH, cyclic codes, convolutional codes… They use finite field theory and algebraic decoding techniques.
The Hamming code
• 4 useful bits U1…U4
• 3 redundant bits V1…V3
• rate R = 4/7
• detects and corrects 1 error (exercise…)
• trick: 2 codewords differ by at least 3 bits

Generating matrix (of a linear code):

G = | 1 0 0 0 0 1 1 |
    | 0 1 0 0 1 0 1 |
    | 0 0 1 0 1 1 0 |
    | 0 0 0 1 1 1 1 |

[ U1 … U4 ] G = [ U1 … U4 V1 … V3 ]
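A sketch of the (7,4) code built from the generating matrix above. Decoding is done here by brute-force minimum-distance search over the 16 codewords, rather than the usual syndrome computation; since two codewords differ by at least 3 bits, any single error is corrected:

```python
# Generating matrix G = [I4 | P] of the Hamming (7,4) code, from the slide.
G = [
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(u):
    """4 information bits -> 7-bit codeword [U1..U4 V1..V3] (mod-2 arithmetic)."""
    return [sum(u[i] * G[i][j] for i in range(4)) % 2 for j in range(7)]

def bits(m):
    """Integer m in 0..15 -> list of 4 bits."""
    return [(m >> k) & 1 for k in range(4)]

def decode(c):
    """Minimum-distance decoding: corrects any single bit flip."""
    best = min(range(16),
               key=lambda m: sum(a != b for a, b in zip(encode(bits(m)), c)))
    return bits(best)

u = [1, 0, 1, 1]
c = encode(u)
c[5] ^= 1              # one transmission error
print(decode(c) == u)  # True: the single error is corrected
```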
How much?

(Figure: what people believed before '48, the transmission rate must vanish to make the error probability vanish, versus what Shannon proved in '48, the rate can stay at any value below C.)

Usage: C measures the efficiency of an error correcting code for some channel.

Theorem (Shannon, '48):
- any achievable transmission rate R must satisfy R ≤ C transmitted bits / channel use
- any transmission rate R < C is achievable
Proof
Necessity: if a coding is (asymptotically) error-free, then its rate satisfies R ≤ C; rather easy to prove.
Sufficiency: any rate R < C is achievable; demands to build a coding scheme!
Idea = random coding !
• let P*(A) be the best input distribution for the channel, achieving C = I(A;B)
• build a random codeword w = a1…an by drawing letters i.i.d. according to P*(A) (w is a typical sequence)
• sending w over the channel yields an output w' = b1…bn, which is a typical sequence for the output distribution P(B)
• and the pair (w, w') is jointly typical for the joint distribution P(A,B)
(Figure: M typical sequences w1,…,wM over A1…An used as codewords; each output w'i = b1…bn falls, among the possible typical sequences at output, inside the "cone" of sequences jointly typical with wi.)
• if M is small enough, the output cones do not overlap (with high probability)
• maximal number of input codewords: M = 2^{nH(B)} / 2^{nH(B|A)} = 2^{nI(A;B)} = 2^{nC}
which proves that any R < C is achievable!
Perfect coding
Perfect code = error-free and achieves capacity. What does it look like ?
• by the data processing inequality, nR = H(X) = I(X;X) ≤ I(A1…An;B1…Bn) ≤ nC
• if R = C, then I(A1…An;B1…Bn) = nC
• possible iff the letters Ai of the codeword are independent, and each I(Ai;Bi) = C, i.e. each Ai carries R = C bits
For a binary channel with R = k/n: a perfect code spreads the information of the k useful bits uniformly over the n transmitted bits.
In practice
• Random coding is impractical: it relies on a (huge) codebook for coding/decoding
• Algebraic (linear) codes were preferred for long: more structure, coding/decoding with algorithms
• But in practice, they remained much below optimal rates!
• Things changed in 1993, when Berrou & Glavieux invented the turbo-codes
• followed by the rediscovery of the low-density parity-check (LDPC) codes, invented by Gallager in his PhD… in 1963!
• both code families behave like random codes… but come with low-complexity coding/decoding algorithms
Can feedback improve capacity ?
Principle
• the outputs of the channel are revealed to the sender
• the sender can use this information to adapt its next symbol
Theorem: Feedback does not improve channel capacity.
But it can greatly simplify coding, decoding, and transmission protocols.
2nd PART
Information theory was designed for point-to-point communications, which was soon considered a limitation…
• broadcast channel: each user has a different channel
• multiple access channel: interferences
• spread information: which structure for this object? how to regenerate / transmit it?
What is the capacity of a network ?
Are network links just pipes, with capacity, in which information flows like a fluid ?
(Butterfly network: sources A and B, receivers C and D, bottleneck link E—F.)

How many transmissions to broadcast a from A to C,D and b from B to C,D?

By network coding, node E sends the single combined packet a+b over link E—F: C receives a directly and recovers b = a+(a+b), while D receives b directly and recovers a = b+(a+b). One transmission over link E—F can be saved.
(Médard & Koetter, 2003)
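The saving rests on a single XOR; a minimal sketch with two byte-sized packets (the values are arbitrary):

```python
a, b = 0b10110100, 0b01100011   # packets from sources A and B (arbitrary)

coded = a ^ b                   # the single packet sent over the E-F link

b_at_C = a ^ coded              # C already has a, so it recovers b
a_at_D = b ^ coded              # D already has b, so it recovers a

print(b_at_C == b, a_at_D == a)  # True True
```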
3. Distributed source coding
Collecting spread information
• X, Y are two distant but correlated sources
• transmit their value to a unique receiver (perfect channels)
• no communication between the encoders

(Figure: X and Y at a distance, each fed to a separate encoder, encoder 1 at rate R1 and encoder 2 at rate R2, with no communication between them; a joint decoder recovers X,Y.)

• Naive solution = ignore the correlation, compress and send each source separately: rates R1 = H(X), R2 = H(Y)
• Can one do better, and take advantage of the correlation of X and Y?
Example
• X = weather in Brest, Y = weather in Quimper
• probability that both weathers are identical is 0.89
• one wishes to send the observed weather of 100 days in both cities

Joint distribution P(X,Y):
            Y = sun    Y = rain
X = sun      0.445      0.055
X = rain     0.055      0.445

• One has H(X) = 1 = H(Y), so naive encoding requires 200 bits
• I(X;Y) ≈ 0.5, so not sending the "common information" twice saves 50 bits
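The figures H(X) = 1, H(X,Y) ≈ 1.5 and I(X;Y) ≈ 0.5 follow directly from the table; a quick check:

```python
from math import log2

# joint distribution of (X = Brest, Y = Quimper), from the table above
P = {("sun", "sun"): 0.445, ("sun", "rain"): 0.055,
     ("rain", "sun"): 0.055, ("rain", "rain"): 0.445}

def H(values):
    """Entropy in bits of a distribution given by its probability values."""
    return -sum(p * log2(p) for p in values if p > 0)

HX = H([0.5, 0.5])                 # both marginals are uniform
HXY = H(P.values())
I = HX + HX - HXY                  # I(X;Y) = H(X) + H(Y) - H(X,Y)

print(round(HXY, 3), round(I, 3))  # 1.5 0.5: about 50 bits saved over 100 days
```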
Necessary conditions
Question: what are the best possible achievable transmission rates ?
A pair (R1,R2) is achievable if there exist separate encoders fnX and fnY of sequences X1…Xn and Y1…Yn resp., and a joint decoder gn, that are asymptotically error-free.

• Jointly, both coders must transmit the full pair (X,Y), so R1 + R2 ≥ H(X,Y)
• Each coder alone must transmit the private information that is not accessible through the other variable, so R1 ≥ H(X|Y) and R2 ≥ H(Y|X)
Result
Theorem (Slepian & Wolf, ‘75) : The achievable region is defined by
• R1 ≥ H(X|Y)
• R2 ≥ H(Y|X)
• R1+R2 ≥ H(X,Y)
(Figure: the achievable region in the (R1,R2) plane is the area above and to the right of the broken line with corner points (H(X), H(Y|X)) and (H(X|Y), H(Y)).)
The achievable region is easily shown to be convex, upper-right closed.
Compression by random binning
• encode only the typical sequences w = x1…xn
• throw them at random into 2^{nR} bins, with R > H(X)
• the codeword is the bin number, on nR bits, i.e. R bits/symbol

Encoding of w = the number b of the bin where w lies
Decoding: if w is the unique typical sequence in bin number b, output w; otherwise, output "error"
Error probability: Pe ≤ P(another typical sequence falls in the same bin) ≈ 2^{nH(X)} / 2^{nR} → 0, since R > H(X)
Proof of Slepian-Wolf
• fX and fY are two independent random binnings of rates R1 and R2, for x = x1…xn and y = y1…yn resp.
• to decode the pair of bin numbers (bX,bY) = (fX(x), fY(y)), g outputs the unique pair (x,y) of jointly typical sequences in box (bX,bY), or "error" if there is more than one such pair
• R2 > H(Y|X): given x, there are 2^{nH(Y|X)} sequences y that are jointly typical with x
• R1 + R2 > H(X,Y): the number of boxes 2^{n(R1+R2)} must be greater than 2^{nH(X,Y)}
(Figure: a grid of 2^{nR1} × 2^{nR2} boxes, indexed by the bin numbers of x and y; dots mark the jointly typical pairs (x,y).)
Example
X = color, Y = value
(Venn diagram: H(X|Y) = 1.25, I(X;Y) = 0.5, H(Y|X) = 1.25)
Questions:
1. Is there an instantaneous* transmission protocol for rates RX=1.25=H(X|Y), RY=1.75=H(Y) ?
• send Y (always) : 1.75 bits• what about X ?
(caution: the code for X should be uniquely decodable)
(Table: for each value of Y, pick a codeword for X among 0, 10, 110, 111 — to be filled in)
(*) i.e. for sequences of length n=1
2. What about RX=RY=1.5 ?
In practice
The Slepian-Wolf theorem extends to N sources.It long remained an academic result, since no practical coders existed.
Beginning of the 2000s, practical coders and applications appeared:
• compression of correlated images (e.g. same scene, 2 angles)
• sensor networks (e.g. measure of a temperature field)
• case of a channel with side information
• acquisition of structured information, without communication
4. Fountain codes
Network protocols
TCP/IP (transmission control protocol)
(Figure: the sender transmits packets 1, 2, 3, … over the network, an erasure channel; the receiver returns acknowledgments such as "ack 2", and unacknowledged packets are retransmitted.)
Drawbacks
• slow for huge files over long-range connexions (e.g. cloud backups…)
• feedback channel… but feedback does not improve capacity!
• repetition code… the worst rate among error correcting codes!
• designed by engineers who ignored information theory? :o)

However
• the erasure rate of the channel (thus its capacity) is unknown / changing
• feedback makes protocols simpler
• there exist faster protocols (UDP) for streaming feeds
A fountain of information bits…
How to quickly and reliably transmit K packets of b bits?
Fountain code:
• from the K packets, generate and send a continuous flow of packets
• some get lost, some go through; no feedback
• as soon as K(1+ε) of them are received (any of them), decoding becomes possible

Fountain codes are examples of rateless codes (no predefined rate), or universal codes: they adapt to the channel capacity.
Random coding…
Packet tn sent at time n is a random linear combination of the K packets s1…sK to transmit:
tn = Gn,1 s1 + … + Gn,K sK (bitwise XOR, i.e. over GF(2)),
where the Gn,k are random IID binary variables.
(Figure: the K source packets s1…sK, of b bits each, are multiplied by the K'×K random binary matrix G to produce the sent packets t1…tK'.)
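A toy encoder along these lines (the sizes K = 8 packets of b = 16 bits and the fixed seed are arbitrary choices; packets are Python ints and the GF(2) sum is the bitwise XOR `^`):

```python
import random

random.seed(1)
K, b = 8, 16                                        # toy sizes (assumptions)
source = [random.getrandbits(b) for _ in range(K)]  # the packets s1..sK

def fountain_packet():
    """One output packet: XOR of a uniformly random subset of the sources."""
    row = [random.getrandbits(1) for _ in range(K)]  # one random row of G
    t = 0
    for g, s in zip(row, source):
        if g:
            t ^= s
    return row, t

# the sender can produce as many packets as needed (hence "fountain")
rows, sent = zip(*[fountain_packet() for _ in range(12)])
print(len(sent), format(sent[0], "016b"))
```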
Decoding
Some packets are lost, and N out of K' are received.
This is equivalent to another random code: the received packets satisfy [r1 … rN] = G' * [s1 … sK], where G' is the N×K random binary matrix formed by the surviving rows of G.
How big should N be to enable decoding?
Decoding
• For N = K, what is the probability that G' is invertible?
One has [r1 … rN] = G' * [s1 … sK], where G' is now a random K×K binary matrix. If G' is invertible, one can decode by [s1 … sK] = G'^{-1} * [r1 … rN].
Answer: the probability converges quickly to 0.289 (as soon as K > 10).

• What about N = K+E? What is the probability P that at least one K×K sub-matrix of G' is invertible?
Answer: P = 1 − δ(E), where δ(E) ≤ 2^{−E} (δ(E) < 10^{−6} for E = 20):
exponential convergence to 1 with E, regardless of K.
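The 0.289 limit is the product Π_{i≥1} (1 − 2^{−i}), the probability that a random K×K matrix over GF(2) has full rank as K grows; a one-function check:

```python
def p_invertible(K):
    """P(a random K x K binary matrix over GF(2) is invertible)
    = prod_{i=1..K} (1 - 2^{-i})."""
    p = 1.0
    for i in range(1, K + 1):
        p *= 1 - 2.0 ** -i
    return p

for K in (2, 5, 10, 40):
    print(K, round(p_invertible(K), 4))  # converges quickly to ~0.2888
```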
Complexity
• K/2 operations per generated packet, so O(K²) for encoding
• decoding: K³ for the matrix inversion
• one would like better complexities… linear?
LT codes
Invented by Michael Luby (2003), and inspired by the LDPC codes (Gallager, 1963).
Idea : linear combinations of packets should be “sparse”
Encoding
• for each packet tn, randomly select a "degree" dn according to some distribution ρ(d) on degrees
• choose at random dn packets among s1…sK, and take as tn the (XOR) sum of these dn packets
• some nodes have low degree, others have high degree: makes the graph a small world

(Figure: bipartite graph connecting the source packets s1…sK to the encoded packets t1…tN.)
Decoding LT codes
Idea = a simplified version of turbo-decoding (Berrou) that resembles crossword solving:
• find a received packet of degree 1: it directly reveals one source packet
• XOR the revealed source packet into all other received packets connected to it, and remove those edges (which may create new degree-1 packets)
• repeat until all source packets are recovered

Example (an animation over several slides): the received values 1 0 1 1 are progressively peeled back to the source bits.
How to choose degrees?
• each iteration should yield a single new node of degree 1: achieved by the "ideal soliton" distribution ρ(1) = 1/K and ρ(d) = 1/(d(d−1)) for d = 2…K
• average degree is loge K, so decoding complexity is K loge K
• in reality:
  • one needs a few nodes of high degree, to ensure that every packet is connected to at least one check-node
  • one needs a few more small-degree nodes, to ensure that decoding starts
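The ideal soliton distribution above sums to 1 by telescoping, and its mean indeed grows like loge K; a quick check at K = 1000 (an arbitrary size):

```python
from math import log

def ideal_soliton(K):
    """Ideal soliton: rho(1) = 1/K and rho(d) = 1/(d(d-1)) for d = 2..K."""
    rho = {1: 1.0 / K}
    for d in range(2, K + 1):
        rho[d] = 1.0 / (d * (d - 1))
    return rho

K = 1000
rho = ideal_soliton(K)
total = sum(rho.values())                    # telescopes to exactly 1
mean_degree = sum(d * p for d, p in rho.items())

# mean degree = 1/K + H_{K-1}, close to log(K) + Euler's constant
print(round(total, 9), round(mean_degree, 2), round(log(K), 2))
```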
In practice…
Performance
• both encoding and decoding are in K log K (instead of K² and K³)
• for large K > 10⁴, the observed overhead E represents 5% to 10%
• Raptor codes (Shokrollahi, 2003) do better: linear time complexity
Applications
• broadcast to many users: a fountain code adapts to the channel of each user; no need to rebroadcast packets missed by some user
• storage on many unreliable devices: e.g. RAID (redundant array of inexpensive disks), data centers, peer-to-peer distributed storage
5. Distributed P2P storage
Principle
(Figure: raw data s1…sK expanded into redundant data t1…tN; each packet is stored on a distinct device: disks, peers, …)
Idea = raw data split into packets, expanded with some ECC; each newly created packet is stored independently; original data erased.

Problems
• disks can crash, peers can leave: eventual data loss
• original data can be recovered if enough packets remain… but missing packets need to be restored

Restoration (new packets t'1…t'N placed on new peers)
• perfect: the packet that is lost is exactly replaced
• functional: new packets are built, to preserve data recoverability
• intermediate: maintain the systematic part of the data
Which codes ?
Target: one should rebuild missing blocks… without first rebuilding the original data! (that would require too much bandwidth)

Fountain/random codes:
• random linear combinations of remaining blocks among t1…tn
• will not preserve the appropriate degree distribution

MDS codes (maximum distance separable):
• can rebuild s1…sk from any subset of exactly k blocks in t1…tn
• example: Reed-Solomon codes
Example
(Figure: k sets of α blocks a, b, c, d expanded into n sets containing combinations such as a+c, b+d, b+c, a+b+d; to repair a lost node, β blocks are requested from the surviving nodes.)
Result (Dimakis et al., 2010): for functional repair, given k, n, and d ≥ k (the number of nodes contacted for a repair), network coding techniques allow one to optimally trade off α (the number of blocks stored per node) against β (the bandwidth needed for reconstruction).
6. Conclusion
A few lessons
Ralf Koetter* : “Communications aren’t anymore about transmitting a bit, but about transmitting evidence about a bit.”
(*) one of the inventors of Network Coding
Random structures spread information uniformly.
Information theory gives bounds on how much one can learn about some hidden information… one does not have to build the actual protocols/codes that will reveal this information.
Management of distributed information… in other fields
Digital communications (network information theory)

Compressed sensing (signal processing)
- signal can be described by sparse coefficients
- random (sub-Nyquist) sampling

Communications complexity (computer science)
- A, B: random variables, possibly correlated, taking values in a huge space
- one wishes to compute in B the value f(A,B)
- how many bits should be exchanged? how many communication rounds?
- e.g. how many bits should A send to B in order to check A = B? solution by random coding
(Figure: A sends n bits to B, which checks "A = B ?")
thank you !