
Page 1: Channel Coding Theorem (The Most Famous Theorem in IT)

Channel Capacity

Problem: finding the maximum number of distinguishable signals for n uses of a communication channel. This number grows exponentially with n, and the exponent is known as the channel capacity.

Page 2: Mathematical Model

The mathematical analog of a physical signaling system is shown.

Problem: two different input sequences may give rise to the same output sequence; such inputs are confusable.

We show that we can choose a "nonconfusable" subset of input sequences so that, with high probability, there is only one highly likely input that could have caused the particular output.

Page 3: Definitions

Definition (discrete channel): (X, p(y|x), Y), a system consisting of an input alphabet X and an output alphabet Y (finite sets) and a probability transition matrix p(y|x) that expresses the probability of observing the output symbol y given that we send the symbol x. The channel is said to be memoryless if the probability distribution of the output depends only on the input at that time and is conditionally independent of previous channel inputs or outputs.

Definition ("information" channel capacity): the information channel capacity of a discrete memoryless channel is

    C = max_{p(x)} I(X; Y),

where the maximum is taken over all possible input distributions p(x).
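
To make the definition concrete, here is a minimal Python sketch (our own illustration; the helper name `mutual_information` and the convention `P[x, y] = p(y|x)` are assumptions of this sketch, not notation from the slides) that evaluates I(X; Y) for a given input distribution. The capacity is then the maximum of this quantity over all p(x).

```python
import numpy as np

def mutual_information(p_x, P):
    """I(X; Y) in bits, for input distribution p_x and a channel
    matrix P with P[x, y] = p(y | x) (each row sums to 1)."""
    p_x = np.asarray(p_x, dtype=float)
    P = np.asarray(P, dtype=float)
    p_xy = p_x[:, None] * P             # joint pmf p(x, y)
    p_y = p_xy.sum(axis=0)              # output marginal p(y)
    prod = p_x[:, None] * p_y[None, :]  # product of the marginals
    mask = p_xy > 0                     # convention: 0 log 0 = 0
    return float((p_xy[mask] * np.log2(p_xy[mask] / prod[mask])).sum())
```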

Page 4: Examples of Channel Capacity

Noiseless binary channel: any transmitted bit is received without error.

One error-free bit can be transmitted per use of the channel, so the capacity is 1 bit.

Or, by the definition of C:

    C = max I(X; Y) = 1 bit,

achieved by p(x) = (1/2, 1/2).
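
Reusing the hypothetical `mutual_information` helper sketched above, a grid search over binary input distributions reproduces this claim:

```python
# Noiseless binary channel: identity transition matrix.
P = [[1.0, 0.0],
     [0.0, 1.0]]

best_I, best_q = max((mutual_information([q, 1 - q], P), q)
                     for q in [i / 100 for i in range(101)])
print(best_I, best_q)   # 1.0 bit, achieved at p(x) = (1/2, 1/2)
```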

Page 5: Examples of Channel Capacity

Noisy channel with nonoverlapping outputs: the input can be determined from the output.

    C = max I(X; Y) = 1 bit,

achieved by p(x) = (1/2, 1/2).

Page 6: Examples of Channel Capacity

Noisy typewriter: the channel input is either received unchanged at the output with probability 1/2 or transformed into the next letter with probability 1/2.

    C = max I(X; Y) = max (H(Y) − H(Y|X)) = max H(Y) − 1 = log 26 − 1 = log 13,

achieved by using p(x) distributed uniformly over all the inputs.
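
A quick numerical check, as a self-contained sketch (the circulant-matrix construction is our own): with a uniform input, I(X; Y) for the 26-letter noisy typewriter comes out to log2 13 ≈ 3.700 bits.

```python
import numpy as np

# Noisy typewriter: each letter stays put w.p. 1/2 or becomes
# the next letter (cyclically) w.p. 1/2.
P = np.zeros((26, 26))
for x in range(26):
    P[x, x] = 0.5
    P[x, (x + 1) % 26] = 0.5

p_x = np.full(26, 1 / 26)              # uniform input distribution
p_y = p_x @ P                          # output marginal (also uniform)
H_Y = -(p_y * np.log2(p_y)).sum()      # = log2 26
H_Y_given_X = 1.0                      # each row is a fair coin: 1 bit
print(H_Y - H_Y_given_X, np.log2(13))  # both ≈ 3.70044
```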

Page 7: Examples of Channel Capacity

Binary symmetric channel: this is a model of a channel with errors; every received bit is unreliable. Each bit is flipped with crossover probability p, so

    I(X; Y) = H(Y) − H(Y|X) = H(Y) − H(p) ≤ 1 − H(p).

Equality is achieved when the input distribution is uniform. Hence, the information capacity of a binary symmetric channel with parameter p is

    C = 1 − H(p) bits.
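
A numerical sanity check (our own sketch): for p = 0.1, evaluating I(X; Y) at the uniform input reproduces 1 − H(0.1).

```python
import numpy as np

def H2(q):
    """Binary entropy in bits."""
    return 0.0 if q in (0.0, 1.0) else -q*np.log2(q) - (1-q)*np.log2(1-q)

p = 0.1
P = np.array([[1-p, p],
              [p, 1-p]])               # BSC transition matrix
p_y = np.array([0.5, 0.5]) @ P         # uniform input -> uniform output
I = -(p_y * np.log2(p_y)).sum() - H2(p)   # H(Y) - H(Y|X)
print(I, 1 - H2(p))                    # both ≈ 0.53100 bits
```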

Page 8: Examples of Channel Capacity

Binary erasure channel: a fraction α of the bits are erased. The receiver knows which bits have been erased.

[Han Vinck; Essen:]

Let P(X = 0) = P0. Then

    I(X; Y) = H(X) − H(X|Y),
    H(X) = H(P0),
    H(X|Y) = α H(X) = α H(P0),

so I(X; Y) = (1 − α) H(P0), which is maximized by P0 = 1/2. Thus,

    C_erasure = 1 − α.
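
The same conclusion numerically (our own sketch): sweeping P0 shows I(X; Y) = (1 − α) H(P0) peaking at P0 = 1/2 with value 1 − α.

```python
import numpy as np

def H2(q):
    return 0.0 if q in (0.0, 1.0) else -q*np.log2(q) - (1-q)*np.log2(1-q)

alpha = 0.3                                # erasure probability
grid = np.linspace(0.01, 0.99, 99)
I = [(1 - alpha) * H2(q) for q in grid]    # I(X;Y) = (1-alpha) H(P0)
print(max(I), 1 - alpha)                   # both = 0.7, at P0 = 0.5
```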

Page 9: Properties of Channel Capacity

1. C ≥ 0, since I(X; Y) ≥ 0.
2. C ≤ log |X|, since C = max I(X; Y) ≤ max H(X) = log |X|.
3. C ≤ log |Y|, for the same reason.
4. I(X; Y) is a continuous function of p(x).
5. I(X; Y) is a concave function of p(x) (Theorem 2.7.4), so a local maximum is a global maximum. From properties 2 and 3, the maximum is finite, and we are justified in using the term maximum.
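
Concavity (property 5) is what makes capacity computable by simple iterative ascent. The classic algorithm for this is Blahut-Arimoto; the sketch below is a minimal version of it (the function name and convergence settings are our own choices), which converges to C for any DMC transition matrix.

```python
import numpy as np

def blahut_arimoto(P, tol=1e-10, max_iter=10_000):
    """Capacity (bits) of a DMC with transition matrix P[x, y] = p(y|x)."""
    P = np.asarray(P, dtype=float)
    p = np.full(P.shape[0], 1.0 / P.shape[0])   # start at the uniform input
    for _ in range(max_iter):
        q = p @ P                               # current output marginal
        with np.errstate(divide="ignore", invalid="ignore"):
            # D(P(.|x) || q): divergence of each channel row from q
            d = np.where(P > 0, P * np.log(P / q), 0.0).sum(axis=1)
        p_new = p * np.exp(d)
        p_new /= p_new.sum()                    # multiplicative update
        if np.abs(p_new - p).max() < tol:
            p = p_new
            break
        p = p_new
    q = p @ P
    with np.errstate(divide="ignore", invalid="ignore"):
        C = np.where(P > 0, p[:, None] * P * np.log2(P / q), 0.0).sum()
    return C, p

# BSC with p = 0.1: C should be 1 - H(0.1) ≈ 0.5310, at the uniform input.
print(blahut_arimoto([[0.9, 0.1], [0.1, 0.9]]))
```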

Page 10: Preview of the Theorem

(AEP again!)

For large block lengths, every channel looks like the noisy typewriter channel: the channel has a subset of inputs that produce essentially disjoint sequences at the output.

For each (typical) input n-sequence, there are approximately 2^{nH(Y|X)} possible Y sequences, all of them equally likely. We wish to ensure that no two X sequences produce the same Y output sequence; otherwise, we will not be able to decide which X sequence was sent.

The total number of possible (typical) Y sequences is ≈ 2^{nH(Y)}. This set has to be divided into sets of size 2^{nH(Y|X)} corresponding to the different input X sequences. The total number of disjoint sets is less than or equal to 2^{n(H(Y)−H(Y|X))} = 2^{nI(X;Y)}. Hence, we can send at most ≈ 2^{nI(X;Y)} distinguishable sequences of length n.

Page 11: Definitions

A message W is drawn from the index set {1, 2, . . . , M}.

Definition (nth extension of the discrete memoryless channel (DMC)): the channel (X^n, p(y^n|x^n), Y^n), where

    p(y_k | x^k, y^{k−1}) = p(y_k | x_k), k = 1, 2, . . . , n.

When the channel is used without feedback (i.e., p(x_k | x^{k−1}, y^{k−1}) = p(x_k | x^{k−1})), the channel transition function of the nth extension reduces to

    p(y^n | x^n) = ∏_{i=1}^{n} p(y_i | x_i).
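
The product form is exactly what makes a DMC trivial to simulate: each output symbol is drawn independently given its own input symbol. A minimal sketch (the function name and setup are our own):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_dmc(xn, P):
    """Pass input sequence xn through a DMC with P[x, y] = p(y|x).
    Memorylessness: each y_i depends only on x_i."""
    return np.array([rng.choice(P.shape[1], p=P[x]) for x in xn])

# Example: 10 uses of a BSC with crossover probability 0.1.
P = np.array([[0.9, 0.1], [0.1, 0.9]])
xn = rng.integers(0, 2, size=10)
print(xn, simulate_dmc(xn, P))
```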

Page 12: Definitions

Definition: An (M, n) code for the channel (X, p(y|x), Y) consists of the following:
1. An index set {1, 2, . . . , M}.
2. An encoding function X^n : {1, 2, . . . , M} → X^n, yielding codewords x^n(1), x^n(2), . . . , x^n(M). The set of codewords is called the codebook.
3. A decoding function g : Y^n → {1, 2, . . . , M}.

Definition (conditional probability of error): Let

    λ_i = Pr(g(Y^n) ≠ i | X^n = x^n(i))

be the conditional probability of error given that index i was sent.

Page 13: Definitions

Definition: The maximal probability of error λ^(n) for an (M, n) code is defined as

    λ^(n) = max_{i ∈ {1, 2, ..., M}} λ_i.

Definition: The (arithmetic) average probability of error P_e^(n) for an (M, n) code is defined as

    P_e^(n) = (1/M) ∑_{i=1}^{M} λ_i.
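
To make these definitions concrete, here is a small sketch (entirely our own example, not from the slides) of an (M, n) = (2, 3) repetition code on a BSC, computing λ_i, λ^(n), and P_e^(n) exactly by enumerating all output sequences.

```python
from itertools import product

p = 0.1                                   # BSC crossover probability
codebook = {1: (0, 0, 0), 2: (1, 1, 1)}   # encoder: index -> codeword

def g(yn):                                # decoder: majority vote
    return 2 if sum(yn) >= 2 else 1

def lam(i):
    """lambda_i = Pr(g(Y^n) != i | X^n = x^n(i))."""
    xn = codebook[i]
    total = 0.0
    for yn in product((0, 1), repeat=3):
        pr = 1.0
        for x, y in zip(xn, yn):
            pr *= (1 - p) if x == y else p
        if g(yn) != i:
            total += pr
    return total

lams = [lam(i) for i in (1, 2)]
print(lams, max(lams), sum(lams) / 2)
# lambda_i = 3 p^2 (1-p) + p^3 = 0.028 for both codewords, so the
# maximal and average error probabilities coincide here.
```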

Page 14: Definitions

Definition: The rate R of an (M, n) code is

    R = (log M) / n bits per transmission.

Definition: A rate R is said to be achievable if there exists a sequence of (2^{nR}, n) codes such that the maximal probability of error λ^(n) tends to 0 as n → ∞.

We write (2^{nR}, n) codes to mean (⌈2^{nR}⌉, n) codes; this simplifies the notation.

Definition: The (operational) capacity of a channel is the supremum of all achievable rates. Thus, rates less than capacity yield arbitrarily small probability of error for sufficiently large block lengths.

Page 15: Jointly Typical Sequences

Definition: The set A_ε^(n) of jointly typical sequences {(x^n, y^n)} with respect to the distribution p(x, y) is the set of n-sequences with empirical entropies ε-close to the true entropies:

    A_ε^(n) = { (x^n, y^n) ∈ X^n × Y^n :
        | −(1/n) log p(x^n) − H(X) | < ε,
        | −(1/n) log p(y^n) − H(Y) | < ε,
        | −(1/n) log p(x^n, y^n) − H(X, Y) | < ε },

where:

    p(x^n, y^n) = ∏_{i=1}^{n} p(x_i, y_i).
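
As a direct transcription of this definition into code (a sketch only; the function name and the dict-based pmfs are our own conventions, and the marginals and true entropies are assumed precomputed):

```python
import math

def is_jointly_typical(xn, yn, p_x, p_y, p_xy, H_X, H_Y, H_XY, eps):
    """Check the three empirical-entropy conditions defining A_eps^(n).
    p_x, p_y are pmf dicts over symbols; p_xy maps (x, y) pairs."""
    n = len(xn)
    lx = -sum(math.log2(p_x[x]) for x in xn) / n            # -(1/n) log p(x^n)
    ly = -sum(math.log2(p_y[y]) for y in yn) / n            # -(1/n) log p(y^n)
    lxy = -sum(math.log2(p_xy[(x, y)])
               for x, y in zip(xn, yn)) / n                 # -(1/n) log p(x^n, y^n)
    return abs(lx - H_X) < eps and abs(ly - H_Y) < eps and abs(lxy - H_XY) < eps
```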

Page 16: Jointly Typical Sequences

Theorem (Joint AEP): Let (X^n, Y^n) be sequences of length n drawn i.i.d. according to p(x^n, y^n) = ∏_{i=1}^{n} p(x_i, y_i). Then:
1. Pr((X^n, Y^n) ∈ A_ε^(n)) → 1 as n → ∞.
2. |A_ε^(n)| ≤ 2^{n(H(X,Y)+ε)}.
3. If (X̃^n, Ỹ^n) ~ p(x^n) p(y^n) [i.e., X̃^n and Ỹ^n are independent with the same marginals as p(x^n, y^n)], then

    Pr((X̃^n, Ỹ^n) ∈ A_ε^(n)) ≤ 2^{−n(I(X;Y)−3ε)}.

Page 17: Jointly Typical Sequences

There are about 2^{nH(X)} typical X sequences and about 2^{nH(Y)} typical Y sequences. However, since there are only 2^{nH(X,Y)} jointly typical sequences, not all pairs of typical X^n and typical Y^n are also jointly typical.

The probability that any randomly chosen pair is jointly typical is about 2^{−nI(X;Y)}. Hence, we can consider about 2^{nI(X;Y)} such pairs before we are likely to come across a jointly typical pair. This suggests that there are about 2^{nI(X;Y)} distinguishable signals X^n.
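
A small standalone experiment (our own, with the BSC-induced joint pmf as the assumed example) estimating that probability: independently drawn pairs land in A_ε^(n) at a rate that sits inside the joint-AEP bracket 2^{−n(I(X;Y)±3ε)}.

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps, trials = 20, 0.1, 500_000
p = 0.1                                    # BSC crossover, uniform input
H2 = lambda q: -q*np.log2(q) - (1-q)*np.log2(1-q)
I = 1 - H2(p)                              # I(X;Y) for this joint pmf
H_XY = 1 + H2(p)                           # H(X,Y) = H(X) + H(Y|X)

# Independent pair (X uniform, Y uniform): only the joint-entropy
# condition can fail, since -(1/n) log2 p(x^n) = 1 = H(X) exactly
# for every binary sequence under the uniform marginal.
X = rng.integers(0, 2, (trials, n))
Y = rng.integers(0, 2, (trials, n))
k = (X != Y).sum(axis=1)                   # disagreements per pair
emp = 1 + (k*np.log2(1/p) + (n-k)*np.log2(1/(1-p))) / n
rate = np.mean(np.abs(emp - H_XY) < eps)   # fraction jointly typical
print(rate, 2**(-n*(I - 3*eps)), 2**(-n*(I + 3*eps)))
```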

Page 18: Channel Coding Theorem

Theorem 7.7.1 (Channel coding theorem): For a discrete memoryless channel, all rates below capacity C are achievable. Specifically, for every rate R < C, there exists a sequence of (2^{nR}, n) codes with maximal probability of error λ^(n) → 0.

Conversely, any sequence of (2^{nR}, n) codes with λ^(n) → 0 must have R ≤ C.

Proof (achievability): Consider the following.

Page 19: Channel Coding Theorem (The most famous in IT) Channel Capacity; Problem: finding the maximum number of distinguishable signals for n uses of a communication

Proof, Achievability1. A random code C is generated according to p(x).2. The code C is then revealed to both sender and receiver. Both sender and receiver are also assumed to

know the channeltransition matrix p(y|x) for the channel.3. A message W is chosen according to a uniform

distribution

4. The w th codeword Xn(w), is sent over the channel.5. The receiver receives a sequence Yn according to thedistribution

6. The receiver guesses which message was sent.

Page 20: Channel Coding Theorem (The most famous in IT) Channel Capacity; Problem: finding the maximum number of distinguishable signals for n uses of a communication

Proof, AchievabilityThe receiver declares that the index Ŵ was sent if thefollowing conditions are satisfied:

Let E be the event {Ŵ ≠ W}.By the symmetry of the code construction, the

average probability of error averaged over all codes does not

dependon the particular index that was sent.Thus, we can assume without loss of generality

that the message W = 1 was sent.

Page 21: Channel Coding Theorem (The most famous in IT) Channel Capacity; Problem: finding the maximum number of distinguishable signals for n uses of a communication

Proof, AchievabilityDefine the following events:

Where Ei is the event that the ith codeword and Yn arejointly typical.Recall that Yn is the result of sending the first codewordXn(1) over the channel.Then an error occurs in the decoding scheme if either

E1C

occurs, or E2 ∪ E3 ∪ · · · ∪ E2nR occurs. Hence,letting P(E) denote Pr(E|W = 1) (note: these are equal):

Page 22: Channel Coding Theorem (The most famous in IT) Channel Capacity; Problem: finding the maximum number of distinguishable signals for n uses of a communication

Proof, Achievability

Now, by the joint AEP,

The probability that Xn(i) and Yn are jointly typical is 2-

n(I(x;y)-3ε) , (i ≠ 1), Thus:

Page 23: Channel Coding Theorem (The most famous in IT) Channel Capacity; Problem: finding the maximum number of distinguishable signals for n uses of a communication

Proof, Achievability

If n is sufficiently large and R < I(X; Y) − 3ε. Hence, if R < I(X; Y),

we can choose ε and n so that the average probability of error,

averaged over codebooks and codewords, is less than 2ε.To finish the proof, we need a series of reasoning and code

selections.
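
The random-coding argument can be watched in action. The sketch below is entirely our own experiment, and it substitutes minimum-Hamming-distance decoding for joint-typicality decoding (simpler to implement, and well matched to the BSC): random codebooks at a rate R below C ≈ 0.531 show an average error estimate that shrinks as n grows.

```python
import numpy as np

rng = np.random.default_rng(2)

def avg_error(n, R, p=0.1, trials=500):
    """Monte Carlo estimate of the average error probability of a
    random code at rate R on a BSC(p), with minimum-Hamming-distance
    decoding standing in for joint-typicality decoding."""
    M = max(2, round(2 ** (n * R)))            # codebook size ~ 2^{nR}
    errors = 0
    for _ in range(trials):
        code = rng.integers(0, 2, (M, n))      # codewords i.i.d. Bern(1/2)
        w = rng.integers(0, M)                 # uniform message index
        y = code[w] ^ (rng.random(n) < p)      # channel flips each bit w.p. p
        w_hat = (code != y).sum(axis=1).argmin()   # nearest codeword
        errors += (w_hat != w)
    return errors / trials

# R = 0.3 < C = 1 - H(0.1) ≈ 0.531: the estimate should fall as n grows.
for n in (8, 16, 24, 32):
    print(n, avg_error(n, R=0.3))
```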

Page 24: Channel Coding Theorem (The most famous in IT) Channel Capacity; Problem: finding the maximum number of distinguishable signals for n uses of a communication

Proof, The ConverseLemma 7.9.2 Let Yn be the result of passing Xn

through a discrete memoryless channel of capacity C. Then

Proof :

Page 25: Channel Coding Theorem (The most famous in IT) Channel Capacity; Problem: finding the maximum number of distinguishable signals for n uses of a communication

Zero-error Codes To obtain a strong bound, we arbitrarily assume that W is

uniformly distributed over {1, 2, . . . , 2nR}. Thus, H(W) = nR. We

can now write:

Hence, for any zero-error (2nR, n) code, for all n, R ≤ C.

Page 26: Channel Coding Theorem (The most famous in IT) Channel Capacity; Problem: finding the maximum number of distinguishable signals for n uses of a communication

Proof, The Converse

We have to show that any sequence of (2nR, n) codes with λ(n) → 0 must

have R ≤ C. If the maximal probability of error tends to zero, the average

probability Pe(n) of error goes to zero,

Dividing by n, we obtain:

Page 27: Channel Coding Theorem (The most famous in IT) Channel Capacity; Problem: finding the maximum number of distinguishable signals for n uses of a communication

Proof, The Converse

Now letting n→∞, we should have R ≤ C.We can rewrite this as:

if R > C, the probability of error is bounded away

from 0 for sufficiently large n .