
Information Theory: Principles and Applications

Tiago T. V. Vinhoza

April 16, 2010


1. Channel Coding Theorem
   - Preview and some definitions
   - Achievability
   - Converse
2. Error Correcting Codes
3. Joint Source-Channel Coding


Why is channel capacity important?

- Shannon proved that the channel capacity is the maximum number of bits that can be reliably transmitted over the channel.
- Reliably = the probability of error can be made arbitrarily small.
- This is the content of the channel coding theorem.


Intuitive idea of channel capacity as a fundamental limit

- Basic idea: for large block lengths, every channel looks like the noisy typewriter channel shown last class.
- The channel has a subset of inputs that produce disjoint sequences at the output.
- The arguments rely on typicality.


Intuitive idea of channel capacity as a fundamental limit

- For each typical input sequence of length n, there are approximately 2^{nH(Y|X)} possible Y sequences.
- Desirable: no two different X sequences produce the same Y output sequence.
- The total number of typical Y sequences is approximately 2^{nH(Y)}.
- So the total number of disjoint sets is less than or equal to 2^{n(H(Y)−H(Y|X))} = 2^{nI(X;Y)}.
- At most 2^{nI(X;Y)} distinguishable sequences of length n can be sent.


Formalizing the ideas

- Message W is drawn from the index set {1, 2, ..., M}.
- The encoded signal X^n(W) passes through the channel and is received as the sequence Y^n. The channel is described by a transition probability matrix P(Y^n|X^n).
- The receiver guesses the index using a decoding rule Ŵ = g(Y^n).
- If Ŵ ≠ W, an error occurs.


Definition: Discrete Channel

A discrete channel consists of two finite sets X and Y and a collection of probability distributions p_{Y|X=x}(y), one for each x ∈ X. It is denoted by the triple

(X, p_{Y|X=x}(y), Y)
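As a concrete sketch (not from the slides), a finite channel can be stored as a row-stochastic matrix whose entry (x, y) holds p_{Y|X=x}(y); the binary symmetric channel below is an illustrative choice.

```python
import numpy as np

# Minimal sketch of a discrete channel (X, p_{Y|X=x}(y), Y) as a
# row-stochastic matrix: entry [x, y] holds p(y | x). Example: a binary
# symmetric channel with crossover probability 0.1 (values illustrative).
p = 0.1
channel = np.array([[1 - p, p],
                    [p, 1 - p]])            # rows: x in {0,1}; columns: y in {0,1}

assert np.allclose(channel.sum(axis=1), 1.0)   # each row is a distribution
```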


Definition: Extension of the Discrete Memoryless Channel

The n-th extension of the discrete memoryless channel is the channel (X^n, p_{Y^n|X^n=x^n}(y^n), Y^n), where

P(Y_k | X^k, Y^{k−1}) = P(Y_k | X_k), k = 1, 2, ..., n.

If the channel is used without feedback, that is, if the inputs do not depend on past outputs, then the channel transition function for the discrete memoryless channel factors as

p_{Y^n|X^n=x^n}(y^n) = ∏_{i=1}^{n} p_{Y_i|X_i=x_i}(y_i)
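A minimal numeric sketch of this factorization, assuming the binary symmetric channel and example sequences chosen here:

```python
import numpy as np

# For a memoryless channel used without feedback, p(y^n | x^n) factors
# into per-symbol terms. Channel and sequences are illustrative.
p = 0.1
channel = np.array([[1 - p, p], [p, 1 - p]])     # BSC: channel[x, y] = p(y|x)

def p_yn_given_xn(yn, xn):
    # product over i of p(y_i | x_i)
    return np.prod([channel[x, y] for x, y in zip(xn, yn)])

xn = [0, 1, 1, 0]
yn = [0, 1, 0, 0]
print(p_yn_given_xn(yn, xn))                     # (1-p)^3 * p = 0.0729
```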


Definition: Code for a channel

An (M, n) code for the channel (X, p_{Y|X=x}(y), Y) consists of the following:

- An index set {1, 2, ..., M}.
- An encoding function X^n : {1, 2, ..., M} → X^n, which generates the codewords X^n(1), ..., X^n(M).
- A decoding function g : Y^n → {1, 2, ..., M}, which assigns a guess to each possible received vector.


Definition: Rate of a code

The rate R of an (M, n) code is

R = (log M)/n bits per transmission.

- A rate R is achievable if there exists a sequence of (2^{⌈nR⌉}, n) codes such that the maximal probability of error goes to zero as n goes to infinity.
- The capacity of a discrete memoryless channel is the supremum of all achievable rates.
- Rates below capacity yield arbitrarily small probability of error for sufficiently large n.
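A quick numeric check of the definition; the two example codes are chosen here, not taken from the slides.

```python
import math

# Rate R = log2(M) / n for two familiar codes (examples chosen here).
def rate(M, n):
    return math.log2(M) / n

print(rate(2, 3))    # length-3 repetition code: R = 1/3
print(rate(16, 7))   # M = 16 codewords of length 7, e.g. Hamming(7,4): R = 4/7
```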


Definition: Probability of Error

Conditional probability of error given that index i was sent:

λ_i = P(g(Y^n) ≠ i | X^n = X^n(i)) = ∑_{y^n} p_{Y^n|X^n=x^n(i)}(y^n) I(g(y^n) ≠ i)

where I(·) is the indicator function.

Maximal probability of error:

λ^{(n)} = max_{i ∈ {1, 2, ..., M}} λ_i

Average probability of error for an (M, n) code:

P_e^{(n)} = (1/M) ∑_{i=1}^{M} λ_i
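For a toy code these definitions can be evaluated exactly by brute force over all y^n. The repetition code, BSC, and majority decoder below are illustrative choices, not from the slides.

```python
import itertools
import numpy as np

# Exact error probabilities for a tiny code on a BSC: sum p(y^n | x^n(i))
# over all y^n with g(y^n) != i. Length-3 repetition code, majority decoding.
p = 0.1
channel = np.array([[1 - p, p], [p, 1 - p]])          # channel[x, y] = p(y|x)
codewords = {1: (0, 0, 0), 2: (1, 1, 1)}              # X^n(i)

def g(yn):                                            # majority-vote decoder
    return 2 if sum(yn) >= 2 else 1

def lam(i):                                           # lambda_i
    xn = codewords[i]
    return sum(np.prod([channel[x, y] for x, y in zip(xn, yn)])
               for yn in itertools.product((0, 1), repeat=3)
               if g(yn) != i)

lambdas = {i: lam(i) for i in codewords}
max_err = max(lambdas.values())                       # maximal error probability
avg_err = sum(lambdas.values()) / len(lambdas)        # average error probability
print(lambdas, max_err, avg_err)                      # each equals 3p^2(1-p) + p^3
```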


Definition: Joint Typical Sequences

The decoding procedure employed in the proofs decodes a channel output Y^n as the i-th index if the codeword X^n(i) is jointly typical with the received sequence Y^n.

The set A_ϵ^{(n)} of jointly typical sequences {(x^n, y^n)} with respect to their joint distribution is the set of n-sequences with sample entropies ϵ-close to the true entropies:

A_ϵ^{(n)} = { (x^n, y^n) ∈ X^n × Y^n :
  |−(1/n) log p_{X^n}(x^n) − H(X)| < ϵ,
  |−(1/n) log p_{Y^n}(y^n) − H(Y)| < ϵ,
  |−(1/n) log p_{X^n Y^n}(x^n, y^n) − H(X, Y)| < ϵ }
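A sketch of the membership test, assuming a given joint pmf over small alphabets; the function name and example pmf are choices made here.

```python
import numpy as np

# Joint-typicality test: compare the three sample entropies of (x^n, y^n)
# against H(X), H(Y), H(X,Y) computed from a joint pmf p_xy indexed [x, y].
def is_jointly_typical(xn, yn, p_xy, eps):
    n = len(xn)
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
    HX = -np.sum(p_x * np.log2(p_x))
    HY = -np.sum(p_y * np.log2(p_y))
    HXY = -np.sum(p_xy * np.log2(p_xy))
    sx = -sum(np.log2(p_x[x]) for x in xn) / n              # sample entropy of x^n
    sy = -sum(np.log2(p_y[y]) for y in yn) / n              # sample entropy of y^n
    sxy = -sum(np.log2(p_xy[x, y]) for x, y in zip(xn, yn)) / n
    return abs(sx - HX) < eps and abs(sy - HY) < eps and abs(sxy - HXY) < eps

p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])                               # example joint pmf
print(is_jointly_typical([0, 1, 0, 0], [0, 1, 0, 1], p_xy, eps=0.3))
```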


Joint AEP

Let (X^n, Y^n) be sequences of length n drawn i.i.d. according to the joint distribution p_{X^n Y^n}(x^n, y^n) = ∏_{i=1}^{n} p_{X_i Y_i}(x_i, y_i). Then:

- P((X^n, Y^n) ∈ A_ϵ^{(n)}) → 1 as n → ∞.
- |A_ϵ^{(n)}| ≤ 2^{n(H(X,Y)+ϵ)} and |A_ϵ^{(n)}| ≥ (1 − ϵ) 2^{n(H(X,Y)−ϵ)}.
- If (X̃^n, Ỹ^n) are independent but have the same marginals as X^n and Y^n, then

P((X̃^n, Ỹ^n) ∈ A_ϵ^{(n)}) = ∑_{(x^n, y^n) ∈ A_ϵ^{(n)}} p_{X^n}(x^n) p_{Y^n}(y^n)
  ≤ 2^{n(H(X,Y)+ϵ)} 2^{−n(H(X)−ϵ)} 2^{−n(H(Y)−ϵ)}
  = 2^{−n(I(X;Y)−3ϵ)}


Joint AEP

- There are about 2^{nH(X)} typical X sequences and about 2^{nH(Y)} typical Y sequences. However, since there are only 2^{nH(X,Y)} jointly typical sequences, not all pairs (X^n, Y^n) with X^n typical and Y^n typical are jointly typical.
- The probability that a randomly chosen pair is jointly typical is about 2^{−nI(X;Y)}. So, for a fixed Y^n sequence, we can examine about 2^{nI(X;Y)} pairs before we come across a jointly typical one. This suggests that there are about 2^{nI(X;Y)} distinguishable sequences X^n, as the sketch below illustrates numerically.
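A numeric companion to this counting argument, for an example joint pmf chosen here:

```python
import numpy as np

# Compute H(X), H(Y), H(X,Y), I(X;Y) for an example joint pmf and print
# the exponents of the typical-set sizes (pmf and n are illustrative).
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
HX = -np.sum(p_x * np.log2(p_x))
HY = -np.sum(p_y * np.log2(p_y))
HXY = -np.sum(p_xy * np.log2(p_xy))
I = HX + HY - HXY

n = 100
print(f"~2^{n*HX:.0f} typical x^n, ~2^{n*HY:.0f} typical y^n, "
      f"~2^{n*HXY:.0f} jointly typical pairs")
print(f"a random (x^n, y^n) pair is jointly typical w.p. ~2^-{n*I:.0f}")
```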


Channel Coding Theorem

- Achievability: Consider a discrete memoryless channel with capacity C. All rates R < C are achievable. Specifically, for every rate R < C there exists a sequence of (2^{nR}, n) codes whose maximum probability of error can be made arbitrarily small.
- Converse: Conversely, any sequence of (2^{nR}, n) codes whose maximum probability of error goes to zero must have R ≤ C.


Channel Coding Theorem: Achievability

- Generate a (2^{nR}, n) code at random according to the distribution p_X(x); that is, generate 2^{nR} codewords according to the probability distribution p_{X^n}(x^n) = ∏_{i=1}^{n} p_{X_i}(x_i).
- Exhibit the 2^{nR} codewords as the rows of the matrix

C = [ x_1(1)       x_2(1)       ...  x_n(1)
      ...          ...          ...  ...
      x_1(2^{nR})  x_2(2^{nR})  ...  x_n(2^{nR}) ]

- Reveal the code to the transmitter and the receiver (both also know the channel transition matrix P(Y|X)). A sketch of the construction follows below.
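A sketch of the random code construction, assuming a Bernoulli(1/2) input distribution for concreteness:

```python
import numpy as np

# Random code: 2^{nR} codewords of length n, each symbol drawn i.i.d.
# from p_X (here Bernoulli(1/2); all parameter values are illustrative).
rng = np.random.default_rng(0)
n, R = 20, 0.25
M = 2 ** int(n * R)                           # 2^{nR} codewords
codebook = rng.integers(0, 2, size=(M, n))    # row w is the codeword X^n(w)
print(codebook.shape)                         # (32, 20)
```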


Channel Coding Theorem: Achievability

- Message W is chosen according to a uniform distribution: P(W = w) = 2^{−nR}, for w = 1, 2, ..., 2^{nR}.
- The codeword X^n(w), corresponding to the w-th row of the matrix C, is sent over the channel.
- The receiver gets a sequence Y^n according to the distribution p_{Y^n|X^n=x^n(w)}(y^n) = ∏_{i=1}^{n} p_{Y_i|X_i=x_i(w)}(y_i).
- The receiver guesses the message using typical-set decoding: it declares that index i was sent if (X^n(i), Y^n) are jointly typical and there is no other index j such that (X^n(j), Y^n) are jointly typical. Otherwise, the receiver declares an error. A toy decoder sketch follows below.
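A toy version of this decoder, assuming Bernoulli(1/2) inputs over a BSC; with uniform binary inputs the marginal sample-entropy terms are exactly 1 bit, so only the joint term needs checking. All parameters are illustrative, and at such a short block length the typicality test is loose.

```python
import numpy as np

# Typical-set decoding over a BSC with Bernoulli(1/2) codewords: a
# candidate x^n is jointly typical with y^n when the empirical crossover
# fraction gives a joint sample entropy close to H(X,Y) = 1 + H2(p).
rng = np.random.default_rng(1)
n, R, p, eps = 25, 0.2, 0.05, 0.1
M = 2 ** int(n * R)

def joint_term(x, y):
    d = np.mean(x != y)                          # empirical crossover fraction
    return 1.0 + (-d * np.log2(p) - (1 - d) * np.log2(1 - p))

H_XY = 1.0 + (-p * np.log2(p) - (1 - p) * np.log2(1 - p))

codebook = rng.integers(0, 2, size=(M, n))
w = 3                                            # index actually sent
y = codebook[w] ^ (rng.random(n) < p)            # BSC output

hits = [i for i in range(M) if abs(joint_term(codebook[i], y) - H_XY) < eps]
# decoding succeeds when hits == [w]; at this toy block length the
# typicality test fails a noticeable fraction of the time
print("jointly typical indices:", hits, "| sent index:", w)
```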


Channel Coding Theorem: Achievability

Analysis of the error probability:

- Instead of calculating the probability of error for a single code, we compute the average over all codes generated at random according to the probability distribution P(C).
- Two types of error events: the output Y^n is not jointly typical with the transmitted codeword, or there is another codeword which is also jointly typical with Y^n.
- The probability that the transmitted codeword and the received sequence are jointly typical goes to one, as shown by the AEP.
- For each rival codeword, the probability that it is jointly typical with the received sequence is about 2^{−nI(X;Y)}, so we can use about 2^{nI(X;Y)} codewords and still have a small error probability.


Channel Coding Theorem: Achievability

Calculating the average error probability:

P(E) = ∑_C P(C) P_e^{(n)}(C)
     = ∑_C P(C) (1/2^{nR}) ∑_{w=1}^{2^{nR}} λ_w(C)
     = (1/2^{nR}) ∑_{w=1}^{2^{nR}} ∑_C P(C) λ_w(C)

By the symmetry of the code construction, the average probability of error over all codes does not depend on the particular index that was sent: ∑_C P(C) λ_w(C) is not a function of w.


Channel Coding Theorem: Achievability

Assume, without loss of generality, that W = 1 was sent. Then

P(E) = ∑_C P(C) λ_1(C) = P(E | W = 1)

Define the events

E_i = {(X^n(i), Y^n) ∈ A_ϵ^{(n)}}, i = 1, 2, ..., 2^{nR}

The error events in our case are:

- E_1^c, the complement of E_1, occurs: Y^n and X^n(1) are not jointly typical.
- E_2 or E_3 or ... or E_{2^{nR}} occurs: a wrong codeword is jointly typical with Y^n.


Channel Coding Theorem: Achievability

Evaluating:

P(E | W = 1) = P(E_1^c ∪ E_2 ∪ E_3 ∪ ... ∪ E_{2^{nR}})
             ≤ P(E_1^c) + ∑_{i=2}^{2^{nR}} P(E_i)

- The inequality is due to the union bound.
- By the joint AEP, P(E_1^c) < ϵ for sufficiently large n.
- As X^n(1) and X^n(i) are independent (by the code generation procedure), it follows that Y^n and X^n(i) are also independent for i ≠ 1. Hence, from the joint AEP,

P(E_i) ≤ 2^{−n(I(X;Y)−3ϵ)} for i ≠ 1.


Channel Coding Theorem: Achievability

Evaluating:

P(E | W = 1) ≤ ϵ + ∑_{i=2}^{2^{nR}} 2^{−n(I(X;Y)−3ϵ)}
             = ϵ + (2^{nR} − 1) 2^{−n(I(X;Y)−3ϵ)}
             ≤ ϵ + 2^{nR} 2^{−n(I(X;Y)−3ϵ)}
             = ϵ + 2^{3nϵ} 2^{−n(I(X;Y)−R)}
             ≤ 2ϵ

if n is sufficiently large and R < I(X;Y) − 3ϵ. A quick numeric check of this bound follows below.
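A quick numeric look at how the final bound behaves, with illustrative values of I(X;Y), R, and ϵ:

```python
# Achievability bound eps + 2^{nR} * 2^{-n(I - 3*eps)}: for R < I - 3*eps
# the second term vanishes as n grows (all values are illustrative).
I, R, eps = 0.5, 0.3, 0.05           # R < I - 3*eps = 0.35
for n in (50, 100, 200, 400):
    bound = eps + 2.0 ** (-n * (I - R - 3 * eps))
    print(n, bound)                   # approaches eps from above
```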


Channel Coding Theorem: Achievability

- If R < I(X;Y), we can choose ϵ and n so that the average probability of error, averaged over all codebooks, is less than 2ϵ.
- If the input distribution p_X(x) is the one that achieves the channel capacity C, then the achievability condition R < I(X;Y) is replaced by R < C.
- If the average probability of error over all codebooks is less than 2ϵ, then there exists at least one codebook C* with average probability of error P_e^{(n)}(C*) ≤ 2ϵ:

2ϵ ≥ (1/2^{nR}) ∑_i λ_i(C*) = P_e^{(n)}(C*)

- This implies that at least half of the indices i and their codewords have λ_i < 4ϵ. Using only this best half of the codewords, we have 2^{nR−1} codewords and the new rate R' = R − 1/n ≈ R for large n.


Channel Coding Theorem: Converse

- We now have to show that any sequence of (2^{nR}, n) codes with λ^{(n)} → 0 must have R ≤ C.
- If the maximal error probability goes to zero, the average error probability P_e^{(n)} also goes to zero.
- For each n, let W be drawn from a uniform distribution over {1, 2, ..., 2^{nR}}. Since W is uniform, P_e^{(n)} = P(Ŵ ≠ W).
- We resort to Fano's inequality to prove the converse: with W uniform over 2^{nR} indices, it gives H(W | Y^n) ≤ 1 + P_e^{(n)} nR.


Channel Coding Theorem: Converse

Proving the converse:

nR = H(W) = H(W | Y^n) + I(W; Y^n)
   ≤ H(W | Y^n) + I(X^n(W); Y^n)          (data-processing inequality)
   ≤ 1 + P_e^{(n)} nR + I(X^n(W); Y^n)    (Fano's inequality)
   ≤ 1 + P_e^{(n)} nR + nC

Dividing by nR and rewriting, we get

P_e^{(n)} ≥ 1 − C/R − 1/(nR)

So for any R > C, the probability of error is bounded away from zero for large n. A numeric illustration follows below.
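A numeric companion, assuming a binary symmetric channel with crossover probability 0.1, so C = 1 − H_2(0.1) ≈ 0.531 bits, and a rate above capacity (values chosen here):

```python
import numpy as np

# Fano lower bound 1 - C/R - 1/(nR) on the error probability when
# transmitting at a rate R above the capacity C of a BSC(0.1).
def h2(q):                                  # binary entropy in bits
    return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

C = 1 - h2(0.1)                             # ~0.531 bits per channel use
R = 0.8                                     # a rate above capacity
for n in (10, 100, 1000):
    print(n, max(0.0, 1 - C / R - 1 / (n * R)))
# the bound tends to 1 - C/R ~ 0.336: no code at this rate can be reliable
```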


Channel Coding Theorem

- The theorem shows that good codes, with exponentially small probability of error for long block lengths, exist.
- No systematic way of constructing such codes is provided.
- Random codes: no structure → lookup-table decoding → huge table size for large blocks.
- 1950s: coding theorists started searching for good codes and devising efficient implementations.


Error Correcting Codes

Error control coding: the addition of redundancy in a smart way to combat errors induced by the channel.

- Error detection
- Error correction


Error Correcting Codes

- Block codes
- Linear codes: encoding and decoding
- Hamming codes

A small worked example follows below.
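As a preview (not from the slides), here is a sketch of a Hamming(7,4) code with syndrome decoding; the systematic generator and parity-check matrices are one standard choice.

```python
import numpy as np

# Hamming(7,4) in systematic form: G = [I | P], H = [P^T | I]. Any single
# bit error yields a syndrome equal to the corresponding column of H.
P = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]])
G = np.hstack([np.eye(4, dtype=int), P])        # generator matrix (4 x 7)
H = np.hstack([P.T, np.eye(3, dtype=int)])      # parity-check matrix (3 x 7)

def encode(msg4):
    return (msg4 @ G) % 2

def decode(recv7):
    s = (H @ recv7) % 2                         # syndrome
    if s.any():                                 # nonzero: locate and flip the bit
        idx = np.where((H.T == s).all(axis=1))[0][0]
        recv7 = recv7.copy()
        recv7[idx] ^= 1
    return recv7[:4]                            # systematic: first 4 bits = message

msg = np.array([1, 0, 1, 1])
noisy = encode(msg).copy()
noisy[2] ^= 1                                   # flip one bit in the channel
assert (decode(noisy) == msg).all()             # single error corrected
```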


Joint Source-Channel Coding

- The source coding theorem states that for reliable data compression we need R > H.
- The channel coding theorem states that for reliable data transmission we need R < C.
- Is the condition H < C necessary and sufficient for sending a source over a channel?


Joint Source-Channel Coding

Consider a source modeled by a finite-alphabet stochastic process V^n = V_1, V_2, ..., V_n with entropy rate H(V) that satisfies the AEP.

- Achievability: the source can be sent reliably over a discrete memoryless channel with capacity C if H(V) < C.
- Converse: the source cannot be sent reliably over a discrete memoryless channel with capacity C if H(V) > C.


Joint Source-Channel Coding

- The source-channel separation theorem shows that it is possible to design the source code and the channel code separately and combine the results to achieve optimal performance.
- This optimality is asymptotic: for finite block lengths, the probability of error can be reduced by using joint source-channel coding.
- Separation, however, fails for some multiuser channels.


Next Steps

- Lossy source coding
- Multiple-user information theory
