
Information Theory: Principles and Applications

Tiago T. V. Vinhoza

April 16, 2010


1. Channel Coding Theorem
   - Preview and some definitions
   - Achievability
   - Converse
2. Error Correcting Codes
3. Joint Source-Channel Coding


Why is channel capacity important?

- Shannon proved that the channel capacity is the maximum number of bits that can be reliably transmitted over the channel.
- Reliably = the probability of error can be made arbitrarily small.
- This is the content of the channel coding theorem.


Intuitive idea of channel capacity as a fundamental limit

- Basic idea: for large block lengths, every channel looks like the noisy typewriter channel shown last class.
- The channel has a subset of inputs that produce disjoint sequences at the output.
- The arguments rely on typicality.


Intuitive idea of channel capacity as a fundamental limit

- For each typical input sequence of length n, there are approximately 2^{nH(Y|X)} possible Y sequences.
- Desirable: no two different X sequences produce the same Y output sequence.
- The total number of typical Y sequences is approximately 2^{nH(Y)}.
- So the total number of disjoint sets is less than or equal to 2^{n(H(Y)−H(Y|X))} = 2^{nI(X;Y)}.
- At most 2^{nI(X;Y)} distinguishable sequences of length n can be sent.


Formalizing the ideas

- Message W is drawn from the index set {1, 2, ..., M}.
- The encoded signal X^n(W) passes through the channel and is received as the sequence Y^n. The channel is described by a transition probability matrix P(Y^n|X^n).
- The receiver guesses the index using a decoding rule Ŵ = g(Y^n).
- If Ŵ ≠ W, an error occurs.


Definition: Discrete Channel

A discrete channel consists of two finite sets X and Y and a collection of probability distributions p_{Y|X=x}(y), one for each x ∈ X. It is denoted by the triple

(X, p_{Y|X=x}(y), Y)
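As a concrete sketch (not from the slides), a finite channel can be stored as a row-stochastic matrix whose entry (x, y) holds p_{Y|X=x}(y); the binary symmetric channel below is an illustrative choice.

```python
import numpy as np

# Minimal sketch of a discrete channel (X, p_{Y|X=x}(y), Y) as a
# row-stochastic matrix: entry [x, y] holds p(y | x). Example: a binary
# symmetric channel with crossover probability 0.1 (values illustrative).
p = 0.1
channel = np.array([[1 - p, p],
                    [p, 1 - p]])            # rows: x in {0,1}; columns: y in {0,1}

assert np.allclose(channel.sum(axis=1), 1.0)   # each row is a distribution
```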


Definition: Extension of the Discrete Memoryless Channel

The n-th extension of the discrete memoryless channel is the channel (X^n, p_{Y^n|X^n=x^n}(y^n), Y^n), where

P(Y_k | X^k, Y^{k−1}) = P(Y_k | X_k), k = 1, 2, ..., n.

If the channel is used without feedback, that is, if the inputs do not depend on past outputs, then the channel transition function for the discrete memoryless channel factors as

p_{Y^n|X^n=x^n}(y^n) = ∏_{i=1}^{n} p_{Y_i|X_i=x_i}(y_i)
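A minimal numeric sketch of this factorization, assuming the binary symmetric channel and example sequences chosen here:

```python
import numpy as np

# For a memoryless channel used without feedback, p(y^n | x^n) factors
# into per-symbol terms. Channel and sequences are illustrative.
p = 0.1
channel = np.array([[1 - p, p], [p, 1 - p]])     # BSC: channel[x, y] = p(y|x)

def p_yn_given_xn(yn, xn):
    # product over i of p(y_i | x_i)
    return np.prod([channel[x, y] for x, y in zip(xn, yn)])

xn = [0, 1, 1, 0]
yn = [0, 1, 0, 0]
print(p_yn_given_xn(yn, xn))                     # (1-p)^3 * p = 0.0729
```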


Definition: Code for a channel

An (M, n) code for the channel (X, p_{Y|X=x}(y), Y) consists of the following:

- An index set {1, 2, ..., M}.
- An encoding function X^n : {1, 2, ..., M} → X^n, which generates the codewords X^n(1), ..., X^n(M).
- A decoding function g : Y^n → {1, 2, ..., M}, which assigns a guess to each possible received vector.


Definition: Rate of a code

The rate R of an (M, n) code is

R = (log M)/n bits per transmission.

- A rate R is achievable if there exists a sequence of (2^{⌈nR⌉}, n) codes such that the maximal probability of error goes to zero as n goes to infinity.
- The capacity of a discrete memoryless channel is the supremum of all achievable rates.
- Rates below capacity yield arbitrarily small probability of error for sufficiently large n.
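A quick numeric check of the definition; the two example codes are chosen here, not taken from the slides.

```python
import math

# Rate R = log2(M) / n for two familiar codes (examples chosen here).
def rate(M, n):
    return math.log2(M) / n

print(rate(2, 3))    # length-3 repetition code: R = 1/3
print(rate(16, 7))   # M = 16 codewords of length 7, e.g. Hamming(7,4): R = 4/7
```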


Definition: Probability of Error

Conditional probability of error given that index i was sent:

λ_i = P(g(Y^n) ≠ i | X^n = X^n(i)) = ∑_{y^n} p_{Y^n|X^n=x^n(i)}(y^n) I(g(y^n) ≠ i)

where I(·) is the indicator function.

Maximal probability of error:

λ^{(n)} = max_{i ∈ {1, 2, ..., M}} λ_i

Average probability of error for an (M, n) code:

P_e^{(n)} = (1/M) ∑_{i=1}^{M} λ_i
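For a toy code these definitions can be evaluated exactly by brute force over all y^n. The repetition code, BSC, and majority decoder below are illustrative choices, not from the slides.

```python
import itertools
import numpy as np

# Exact error probabilities for a tiny code on a BSC: sum p(y^n | x^n(i))
# over all y^n with g(y^n) != i. Length-3 repetition code, majority decoding.
p = 0.1
channel = np.array([[1 - p, p], [p, 1 - p]])          # channel[x, y] = p(y|x)
codewords = {1: (0, 0, 0), 2: (1, 1, 1)}              # X^n(i)

def g(yn):                                            # majority-vote decoder
    return 2 if sum(yn) >= 2 else 1

def lam(i):                                           # lambda_i
    xn = codewords[i]
    return sum(np.prod([channel[x, y] for x, y in zip(xn, yn)])
               for yn in itertools.product((0, 1), repeat=3)
               if g(yn) != i)

lambdas = {i: lam(i) for i in codewords}
max_err = max(lambdas.values())                       # maximal error probability
avg_err = sum(lambdas.values()) / len(lambdas)        # average error probability
print(lambdas, max_err, avg_err)                      # each equals 3p^2(1-p) + p^3
```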


Definition: Joint Typical Sequences

The decoding procedure employed in the proofs decodes a channel output Y^n as the i-th index if the codeword X^n(i) is jointly typical with the received sequence Y^n.

The set A_ϵ^{(n)} of jointly typical sequences {(x^n, y^n)} with respect to their joint distribution is the set of n-sequences with sample entropies ϵ-close to the true entropies:

A_ϵ^{(n)} = { (x^n, y^n) ∈ X^n × Y^n :
  |−(1/n) log p_{X^n}(x^n) − H(X)| < ϵ,
  |−(1/n) log p_{Y^n}(y^n) − H(Y)| < ϵ,
  |−(1/n) log p_{X^n Y^n}(x^n, y^n) − H(X, Y)| < ϵ }
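A sketch of the membership test, assuming a given joint pmf over small alphabets; the function name and example pmf are choices made here.

```python
import numpy as np

# Joint-typicality test: compare the three sample entropies of (x^n, y^n)
# against H(X), H(Y), H(X,Y) computed from a joint pmf p_xy indexed [x, y].
def is_jointly_typical(xn, yn, p_xy, eps):
    n = len(xn)
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
    HX = -np.sum(p_x * np.log2(p_x))
    HY = -np.sum(p_y * np.log2(p_y))
    HXY = -np.sum(p_xy * np.log2(p_xy))
    sx = -sum(np.log2(p_x[x]) for x in xn) / n              # sample entropy of x^n
    sy = -sum(np.log2(p_y[y]) for y in yn) / n              # sample entropy of y^n
    sxy = -sum(np.log2(p_xy[x, y]) for x, y in zip(xn, yn)) / n
    return abs(sx - HX) < eps and abs(sy - HY) < eps and abs(sxy - HXY) < eps

p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])                               # example joint pmf
print(is_jointly_typical([0, 1, 0, 0], [0, 1, 0, 1], p_xy, eps=0.3))
```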


Joint AEP

Let (X^n, Y^n) be sequences of length n drawn i.i.d. according to the joint distribution p_{X^n Y^n}(x^n, y^n) = ∏_{i=1}^{n} p_{X_i Y_i}(x_i, y_i). Then:

- P((X^n, Y^n) ∈ A_ϵ^{(n)}) → 1 as n → ∞.
- |A_ϵ^{(n)}| ≤ 2^{n(H(X,Y)+ϵ)} and |A_ϵ^{(n)}| ≥ (1 − ϵ) 2^{n(H(X,Y)−ϵ)}.
- If (X̃^n, Ỹ^n) are independent but have the same marginals as X^n and Y^n, then

P((X̃^n, Ỹ^n) ∈ A_ϵ^{(n)}) = ∑_{(x^n, y^n) ∈ A_ϵ^{(n)}} p_{X^n}(x^n) p_{Y^n}(y^n)
  ≤ 2^{n(H(X,Y)+ϵ)} 2^{−n(H(X)−ϵ)} 2^{−n(H(Y)−ϵ)}
  = 2^{−n(I(X;Y)−3ϵ)}


Joint AEP

- There are about 2^{nH(X)} typical X sequences and about 2^{nH(Y)} typical Y sequences. However, since there are only 2^{nH(X,Y)} jointly typical sequences, not all pairs (X^n, Y^n) with X^n typical and Y^n typical are jointly typical.
- The probability that a randomly chosen pair is jointly typical is about 2^{−nI(X;Y)}. So, for a fixed Y^n sequence, we can examine about 2^{nI(X;Y)} pairs before we come across a jointly typical one. This suggests that there are about 2^{nI(X;Y)} distinguishable sequences X^n, as the sketch below illustrates numerically.
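A numeric companion to this counting argument, for an example joint pmf chosen here:

```python
import numpy as np

# Compute H(X), H(Y), H(X,Y), I(X;Y) for an example joint pmf and print
# the exponents of the typical-set sizes (pmf and n are illustrative).
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
HX = -np.sum(p_x * np.log2(p_x))
HY = -np.sum(p_y * np.log2(p_y))
HXY = -np.sum(p_xy * np.log2(p_xy))
I = HX + HY - HXY

n = 100
print(f"~2^{n*HX:.0f} typical x^n, ~2^{n*HY:.0f} typical y^n, "
      f"~2^{n*HXY:.0f} jointly typical pairs")
print(f"a random (x^n, y^n) pair is jointly typical w.p. ~2^-{n*I:.0f}")
```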


Channel Coding Theorem

- Achievability: Consider a discrete memoryless channel with capacity C. All rates R < C are achievable. Specifically, for every rate R < C there exists a sequence of (2^{nR}, n) codes whose maximum probability of error can be made arbitrarily small.
- Converse: Conversely, any sequence of (2^{nR}, n) codes whose maximum probability of error goes to zero must have R ≤ C.


Channel Coding Theorem: Achievability

- Generate a (2^{nR}, n) code at random according to the distribution p_X(x); that is, generate 2^{nR} codewords according to the probability distribution p_{X^n}(x^n) = ∏_{i=1}^{n} p_{X_i}(x_i).
- Exhibit the 2^{nR} codewords as the rows of the matrix

C = [ x_1(1)       x_2(1)       ...  x_n(1)
      ...          ...          ...  ...
      x_1(2^{nR})  x_2(2^{nR})  ...  x_n(2^{nR}) ]

- Reveal the code to the transmitter and the receiver (both also know the channel transition matrix P(Y|X)). A sketch of the construction follows below.
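A sketch of the random code construction, assuming a Bernoulli(1/2) input distribution for concreteness:

```python
import numpy as np

# Random code: 2^{nR} codewords of length n, each symbol drawn i.i.d.
# from p_X (here Bernoulli(1/2); all parameter values are illustrative).
rng = np.random.default_rng(0)
n, R = 20, 0.25
M = 2 ** int(n * R)                           # 2^{nR} codewords
codebook = rng.integers(0, 2, size=(M, n))    # row w is the codeword X^n(w)
print(codebook.shape)                         # (32, 20)
```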


Channel Coding Theorem: Achievability

- Message W is chosen according to a uniform distribution: P(W = w) = 2^{−nR}, for w = 1, 2, ..., 2^{nR}.
- The codeword X^n(w), corresponding to the w-th row of the matrix C, is sent over the channel.
- The receiver gets a sequence Y^n according to the distribution p_{Y^n|X^n=x^n(w)}(y^n) = ∏_{i=1}^{n} p_{Y_i|X_i=x_i(w)}(y_i).
- The receiver guesses the message using typical-set decoding: it declares that index i was sent if (X^n(i), Y^n) are jointly typical and there is no other index j such that (X^n(j), Y^n) are jointly typical. Otherwise, the receiver declares an error. A toy decoder sketch follows below.
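A toy version of this decoder, assuming Bernoulli(1/2) inputs over a BSC; with uniform binary inputs the marginal sample-entropy terms are exactly 1 bit, so only the joint term needs checking. All parameters are illustrative, and at such a short block length the typicality test is loose.

```python
import numpy as np

# Typical-set decoding over a BSC with Bernoulli(1/2) codewords: a
# candidate x^n is jointly typical with y^n when the empirical crossover
# fraction gives a joint sample entropy close to H(X,Y) = 1 + H2(p).
rng = np.random.default_rng(1)
n, R, p, eps = 25, 0.2, 0.05, 0.1
M = 2 ** int(n * R)

def joint_term(x, y):
    d = np.mean(x != y)                          # empirical crossover fraction
    return 1.0 + (-d * np.log2(p) - (1 - d) * np.log2(1 - p))

H_XY = 1.0 + (-p * np.log2(p) - (1 - p) * np.log2(1 - p))

codebook = rng.integers(0, 2, size=(M, n))
w = 3                                            # index actually sent
y = codebook[w] ^ (rng.random(n) < p)            # BSC output

hits = [i for i in range(M) if abs(joint_term(codebook[i], y) - H_XY) < eps]
# decoding succeeds when hits == [w]; at this toy block length the
# typicality test fails a noticeable fraction of the time
print("jointly typical indices:", hits, "| sent index:", w)
```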


Channel Coding Theorem: Achievability

Analysis of the error probability:

- Instead of calculating the probability of error for a single code, we compute the average over all codes generated at random according to the probability distribution P(C).
- Two types of error events: the output Y^n is not jointly typical with the transmitted codeword, or there is another codeword which is also jointly typical with Y^n.
- The probability that the transmitted codeword and the received sequence are jointly typical goes to one, as shown by the AEP.
- For each rival codeword, the probability that it is jointly typical with the received sequence is about 2^{−nI(X;Y)}, so we can use about 2^{nI(X;Y)} codewords and still have a small error probability.


Channel Coding Theorem: Achievability

Calculating the average error probability:

P(E) = ∑_C P(C) P_e^{(n)}(C)
     = ∑_C P(C) (1/2^{nR}) ∑_{w=1}^{2^{nR}} λ_w(C)
     = (1/2^{nR}) ∑_{w=1}^{2^{nR}} ∑_C P(C) λ_w(C)

By the symmetry of the code construction, the average probability of error over all codes does not depend on the particular index that was sent: ∑_C P(C) λ_w(C) is not a function of w.


Channel Coding Theorem: Achievability

Assume, without loss of generality, that W = 1 was sent. Then

P(E) = ∑_C P(C) λ_1(C) = P(E | W = 1)

Define the events

E_i = {(X^n(i), Y^n) ∈ A_ϵ^{(n)}}, i = 1, 2, ..., 2^{nR}

The error events in our case are:

- E_1^c, the complement of E_1, occurs: Y^n and X^n(1) are not jointly typical.
- E_2 or E_3 or ... or E_{2^{nR}} occurs: a wrong codeword is jointly typical with Y^n.


Channel Coding Theorem: Achievability

Evaluating:

P(E | W = 1) = P(E_1^c ∪ E_2 ∪ E_3 ∪ ... ∪ E_{2^{nR}})
             ≤ P(E_1^c) + ∑_{i=2}^{2^{nR}} P(E_i)

- The inequality is due to the union bound.
- By the joint AEP, P(E_1^c) < ϵ for sufficiently large n.
- As X^n(1) and X^n(i) are independent (by the code generation procedure), it follows that Y^n and X^n(i) are also independent for i ≠ 1. Hence, from the joint AEP,

P(E_i) ≤ 2^{−n(I(X;Y)−3ϵ)} for i ≠ 1.


Channel Coding Theorem: Achievability

Evaluating:

P(E | W = 1) ≤ ϵ + ∑_{i=2}^{2^{nR}} 2^{−n(I(X;Y)−3ϵ)}
             = ϵ + (2^{nR} − 1) 2^{−n(I(X;Y)−3ϵ)}
             ≤ ϵ + 2^{nR} 2^{−n(I(X;Y)−3ϵ)}
             = ϵ + 2^{3nϵ} 2^{−n(I(X;Y)−R)}
             ≤ 2ϵ

if n is sufficiently large and R < I(X;Y) − 3ϵ. A quick numeric check of this bound follows below.
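A quick numeric look at how the final bound behaves, with illustrative values of I(X;Y), R, and ϵ:

```python
# Achievability bound eps + 2^{nR} * 2^{-n(I - 3*eps)}: for R < I - 3*eps
# the second term vanishes as n grows (all values are illustrative).
I, R, eps = 0.5, 0.3, 0.05           # R < I - 3*eps = 0.35
for n in (50, 100, 200, 400):
    bound = eps + 2.0 ** (-n * (I - R - 3 * eps))
    print(n, bound)                   # approaches eps from above
```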


Channel Coding Theorem: Achievability

- If R < I(X;Y), we can choose ϵ and n so that the average probability of error, averaged over all codebooks, is less than 2ϵ.
- If the input distribution p_X(x) is the one that achieves the channel capacity C, then the achievability condition R < I(X;Y) is replaced by R < C.
- If the average probability of error over all codebooks is less than 2ϵ, then there exists at least one codebook C* with average probability of error P_e^{(n)}(C*) ≤ 2ϵ:

2ϵ ≥ (1/2^{nR}) ∑_i λ_i(C*) = P_e^{(n)}(C*)

- This implies that at least half of the indices i and their codewords have λ_i < 4ϵ. Using only this best half of the codewords, we have 2^{nR−1} codewords and the new rate R' = R − 1/n ≈ R for large n.


Channel Coding Theorem: Converse

- We now have to show that any sequence of (2^{nR}, n) codes with λ^{(n)} → 0 must have R ≤ C.
- If the maximal error probability goes to zero, the average error probability P_e^{(n)} also goes to zero.
- For each n, let W be drawn from a uniform distribution over {1, 2, ..., 2^{nR}}. Since W is uniform, P_e^{(n)} = P(Ŵ ≠ W).
- We resort to Fano's inequality to prove the converse: with W uniform over 2^{nR} indices, it gives H(W | Y^n) ≤ 1 + P_e^{(n)} nR.


Channel Coding Theorem: Converse

Proving the converse:

nR = H(W) = H(W | Y^n) + I(W; Y^n)
   ≤ H(W | Y^n) + I(X^n(W); Y^n)          (data-processing inequality)
   ≤ 1 + P_e^{(n)} nR + I(X^n(W); Y^n)    (Fano's inequality)
   ≤ 1 + P_e^{(n)} nR + nC

Dividing by nR and rewriting, we get

P_e^{(n)} ≥ 1 − C/R − 1/(nR)

So for any R > C, the probability of error is bounded away from zero for large n. A numeric illustration follows below.
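A numeric companion, assuming a binary symmetric channel with crossover probability 0.1, so C = 1 − H_2(0.1) ≈ 0.531 bits, and a rate above capacity (values chosen here):

```python
import numpy as np

# Fano lower bound 1 - C/R - 1/(nR) on the error probability when
# transmitting at a rate R above the capacity C of a BSC(0.1).
def h2(q):                                  # binary entropy in bits
    return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

C = 1 - h2(0.1)                             # ~0.531 bits per channel use
R = 0.8                                     # a rate above capacity
for n in (10, 100, 1000):
    print(n, max(0.0, 1 - C / R - 1 / (n * R)))
# the bound tends to 1 - C/R ~ 0.336: no code at this rate can be reliable
```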


Channel Coding Theorem

- The theorem shows that good codes, with exponentially small probability of error for long block lengths, exist.
- No systematic way of constructing such codes is provided.
- Random codes: no structure → lookup-table decoding → huge table size for large blocks.
- 1950s: coding theorists started searching for good codes and devising efficient implementations.


Error Correcting Codes

Error control coding: the addition of redundancy in a smart way to combat errors induced by the channel.

- Error detection
- Error correction


Error Correcting Codes

- Block codes
- Linear codes: encoding and decoding
- Hamming codes

A small worked example follows below.
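As a preview (not from the slides), here is a sketch of a Hamming(7,4) code with syndrome decoding; the systematic generator and parity-check matrices are one standard choice.

```python
import numpy as np

# Hamming(7,4) in systematic form: G = [I | P], H = [P^T | I]. Any single
# bit error yields a syndrome equal to the corresponding column of H.
P = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]])
G = np.hstack([np.eye(4, dtype=int), P])        # generator matrix (4 x 7)
H = np.hstack([P.T, np.eye(3, dtype=int)])      # parity-check matrix (3 x 7)

def encode(msg4):
    return (msg4 @ G) % 2

def decode(recv7):
    s = (H @ recv7) % 2                         # syndrome
    if s.any():                                 # nonzero: locate and flip the bit
        idx = np.where((H.T == s).all(axis=1))[0][0]
        recv7 = recv7.copy()
        recv7[idx] ^= 1
    return recv7[:4]                            # systematic: first 4 bits = message

msg = np.array([1, 0, 1, 1])
noisy = encode(msg).copy()
noisy[2] ^= 1                                   # flip one bit in the channel
assert (decode(noisy) == msg).all()             # single error corrected
```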


Joint Source-Channel Coding

- The source coding theorem states that for reliable data compression we need R > H.
- The channel coding theorem states that for reliable data transmission we need R < C.
- Is the condition H < C necessary and sufficient for sending a source over a channel?


Joint Source-Channel Coding

Consider a source modeled by a finite-alphabet stochastic process V^n = V_1, V_2, ..., V_n with entropy rate H(V) that satisfies the AEP.

- Achievability: the source can be sent reliably over a discrete memoryless channel with capacity C if H(V) < C.
- Converse: the source cannot be sent reliably over a discrete memoryless channel with capacity C if H(V) > C.


Joint Source-Channel Coding

- The source-channel separation theorem shows that it is possible to design the source code and the channel code separately and combine the results to achieve optimal performance.
- This optimality is asymptotic: for finite block lengths, the probability of error can be reduced by using joint source-channel coding.
- Separation, however, fails for some multiuser channels.


Next Steps

- Lossy source coding
- Multiple-user information theory
