Chapter 10: Information and Forward Error Correction



EE 23451 Digital Communications

Chapter 10: Information and Forward Error Correction

Dr. Rami A. Wahsheh, Communications Engineering Department

Chapter 10 Information and Forward Error Correction:

• Uncertainty, Information, and Entropy.

• Source-Coding Theorem.

• Lossless Data Compression.

• Discrete Memoryless Channels.

• Channel Capacity and Channel Coding Theorem.

• Capacity of a Gaussian Channel.

• Error Control Coding.

• Linear Block Codes and Convolutional Codes.

• Trellis-Coded Modulation and Turbo Codes. 


10.1 Introduction

The purpose of a communication system is to carry information-bearing baseband signals from one place to another over a communication channel.

• Information theory deals with mathematical modeling and analysis of a communication system rather than with physical sources and physical channels.

• Information theory provides answers to two fundamental questions (among others):

1. What is the irreducible complexity below which a signal cannot be compressed?

The answer to this question lies in the entropy of a source.

2. What is the ultimate transmission rate for reliable communication over a noisy channel?

The answer to this question lies in the capacity of a channel.

• Entropy is defined in terms of the probabilistic behavior of a source of information.

• Capacity is defined as the intrinsic ability of a channel to convey information. It is naturally related to the noise characteristics of the channel.

• If the entropy of the source is less than the capacity of the channel, then error-free communication over the channel can be achieved.

• The study of information theory provides the fundamental limits on the performance of a communication system by specifying the minimum number of bits per symbol required to fully represent the source and the maximum rate at which information transmission can take place over the channel.


The study of error-control coding provides practical methods of transmitting information from one end of the system at a rate and quality that are acceptable to a user at the other end.

• The goal of error-control coding is to approach the limits imposed by information theory but constrained by practical considerations.

• The two key system parameters available to the designer are transmitted signal power and channel bandwidth. These two parameters, together with the power spectral density of receiver noise, determine the signal energy per bit to noise power density ratio Eb/No.

• Eb/No uniquely determines the bit error rate for a particular modulation scheme.


• Practical considerations usually place a limit on the value that we can assign to Eb/No.

• We often arrive at a modulation scheme and find that it is not possible to provide acceptable data quality (i.e., a low enough error rate).

• For a fixed Eb/No, the one practical option available for changing data quality from problematic to acceptable is to use error-control coding.

• Another practical motivation for the use of coding is to reduce the required Eb/No for a fixed bit error rate. This reduction in Eb/No may, in turn, be exploited to reduce the required transmitted power or reduce the hardware costs by requiring a smaller antenna size in the case of radio communications.


Error control for data integrity may be exercised by means of forward error correction (FEC).

• The FEC encoder in the transmitter accepts message bits and adds redundancy according to a prescribed rule, thereby producing encoded data at a higher bit rate.

• The FEC decoder in the receiver exploits the redundancy to decide which message bits were actually transmitted.

• The combined goal of the channel encoder and decoder, working together, is to minimize the effect of channel noise.

• There are many different error-correcting codes (with roots in diverse mathematical disciplines) that we can use. In this chapter, four FEC strategies will be introduced: block codes, convolutional codes, trellis-coded modulation, and turbo codes.


• Forward error correction is not the only method of improving transmission quality; another major approach known as automatic-repeat-request (ARQ) is also widely used for solving the error-control problem.

• The philosophy of ARQ is quite different from that of FEC. Specifically, ARQ utilizes redundancy for the sole purpose of error detection. Upon detection, the receiver requests a repeat transmission, which necessitates the use of a return path (feedback channel).


10.2 Uncertainty, Information, and Entropy

Suppose that a probabilistic experiment involves the observation of the output emitted by a discrete source during every unit of time (signaling interval).

• The source output is modeled as a discrete random variable, S, which takes on symbols from a fixed finite alphabet ζ = {s0, s1, ..., sK-1}

• with probabilities P(S = sk) = pk, k = 0, 1, ..., K-1.

• This set of probabilities must satisfy the condition Σ_{k=0}^{K-1} pk = 1.

• We assume that the symbols emitted by the source during successive signaling intervals are statistically independent.

• A source having the properties just described is called a discrete memoryless source, memoryless in the sense that the symbol emitted at any time is independent of previous choices.

• Can we find a measure of how much information is produced by such a source? To answer this question, we note that the idea of information is closely related to that of uncertainty or surprise, as described next.

• Consider the event S = sk, describing the emission of symbol sk by the source with probability pk, as defined above.

• If the probability pk = 1 and pi = 0 for all i ≠ k, then there is no surprise and therefore no information when symbol sk is emitted, since we know what the message from the source must be.

• If, on the other hand, the source symbols occur with different probabilities, and the probability pk is low, then there is more surprise and therefore information when symbol sk is emitted by the source than when symbol si, i ≠ k, with higher probability is emitted.


Thus, the words uncertainty, surprise, and information are all related.

• Before the event S = sk occurs, there is an amount of uncertainty.

• When the event S = sk occurs, there is an amount of surprise. After the occurrence of the event S = sk, there is a gain in the amount of information, the essence of which may be viewed as the resolution of uncertainty.

• The amount of information is related to the inverse of the probability of occurrence.

• We define the amount of information gained after observing the event S = sk, which occurs with probability pk, as the logarithmic function

I(sk) = log2(1/pk) = -log2(pk)


• The definition in the above equation exhibits the following important properties, which are intuitively satisfying:

1. I(sk) = 0 for pk = 1.

• Obviously, if we are absolutely certain of the outcome of an event, even before it occurs, there is no information gained.

2. I(sk) ≥ 0 for 0 ≤ pk ≤ 1.

• That is to say, the occurrence of an event S = sk either provides some or no information, but never brings about a loss of information.

3. I(sk) > I(si) for pk < pi.

• That is, the less probable an event is, the more information we gain when it occurs.


• The resulting unit of information is called the bit (a contraction of binary digit).

• When pk = 1/2, we have I(sk) = 1 bit.

One bit is the amount of information that we gain when one of two possible and equally likely (i.e., equiprobable) events occurs.

• The amount of information I(sk) produced by the source during an arbitrary signaling interval depends on the symbol sk emitted by the source at that time.

• I(sk) is a discrete random variable that takes on the values I(s0), I(s1), ..., I(sK-1) with probabilities p0, p1, ..., pK-1, respectively.


The mean of I(sk) over the source alphabet ζ is given by

H(ζ) = E[I(sk)] = Σ_{k=0}^{K-1} pk I(sk) = Σ_{k=0}^{K-1} pk log2(1/pk)

• H(ζ) is called the entropy of a discrete memoryless source with source alphabet ζ.

• H(ζ) is a measure of the average information content per source symbol.

• H(ζ) depends only on the probabilities of the symbols in the alphabet ζ of the source. Thus, the symbol ζ in H(ζ) is not an argument of a function but rather a label for a source.

Some Properties of Entropy

• Consider a discrete memoryless source whose mathematical model is defined by the alphabet ζ and the symbol probabilities {pk} introduced above.

• The entropy H(ζ) of such a source is bounded as follows:

0 ≤ H(ζ) ≤ log2(K)

• where K is the number of symbols in the alphabet. The lower bound is attained when some pk = 1 (no uncertainty); the upper bound is attained when all K symbols are equiprobable (maximum uncertainty).
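As a quick numerical check of these definitions and of the entropy bound, here is a minimal sketch in Python; the symbol probabilities are invented for the illustration and are not taken from the text.

```python
import math

def information(p):
    """Amount of information I(s_k) = log2(1/p_k) gained by observing a symbol of probability p."""
    return math.log2(1.0 / p)

def entropy(probs):
    """Entropy H = sum_k p_k * log2(1/p_k) of a discrete memoryless source (terms with p_k = 0 contribute 0)."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Hypothetical source statistics, for illustration only
probs = [0.5, 0.25, 0.125, 0.125]
assert abs(sum(probs) - 1.0) < 1e-12          # the probabilities must sum to one

for k, p in enumerate(probs):
    print(f"I(s{k}) = {information(p):.3f} bits")

H = entropy(probs)
K = len(probs)
print(f"H = {H:.3f} bits/symbol; bound check: 0 <= {H:.3f} <= log2(K) = {math.log2(K):.3f}")
```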


Example 10.1: Entropy of Binary Memoryless Source

• For a binary memoryless source with symbol probabilities p0 and p1 = 1 - p0, the entropy is H(ζ) = -p0 log2(p0) - (1 - p0) log2(1 - p0). It equals zero when p0 is 0 or 1, and it attains its maximum value of 1 bit per symbol when p0 = 1/2 (the entropy function plotted in Figure 10.2).

Example 10.2: Second-Order Extension of a Discrete Memoryless Source


10.3 Source Coding Theorem

• An important problem in communications is the efficient representation of data generated by a discrete source. The process by which this representation is accomplished is called source encoding.

• The device that performs the representation is called a source encoder.

• For the source encoder to be efficient, we require knowledge of the statistics of the source.

• If some source symbols are known to be more probable than others, then we may exploit this feature in the generation of a source code by assigning short codewords to frequent source symbols, and long codewords to rare source symbols.

• We refer to such a source code as a variable-length code.


The Morse code is an example of a variable-length code.

• In the Morse code, the letters of the alphabet and numerals are encoded into streams of marks and spaces, denoted as dots "." and dashes "-", respectively.

• Since, in the English language, the letter E occurs more frequently than the letter Q, for example, the Morse code encodes E into a single dot ".", the shortest codeword in the code, and it encodes Q into the longest codeword in the code.

• Our primary interest is in the development of an efficient source encoder that satisfies two functional requirements:

1. The codewords produced by the encoder are in binary form.

2. The source code is uniquely decodable, so that the original source sequence can be reconstructed perfectly from the encoded binary sequence.


• The scheme shown in Figure 10.3 depicts a discrete memoryless source whose output sk is converted by the source encoder into a block of 0s and 1s, denoted by bk.

• We assume that the source has an alphabet with K different symbols, and the kth symbol sk occurs with probability pk, k = 0, 1, ..., K-1.

• Let the binary codeword assigned to symbol sk by the encoder have length Lk, measured in bits. We define the average codeword length, L̄, of the source encoder as

L̄ = Σ_{k=0}^{K-1} pk Lk


• In physical terms, the parameter L̄ represents the average number of bits per source symbol used in the source encoding process.

• Let Lmin denote the minimum possible value of L̄.

• We then define the coding efficiency of the source encoder as

η = Lmin / L̄

• The source encoder is said to be efficient when η approaches unity.

• But how is the minimum value Lmin determined? The answer to this fundamental question is embodied in Shannon's first theorem, the source-coding theorem, which may be stated as follows:

• Given a discrete memoryless source of entropy H(ζ), the average codeword length L̄ for any distortionless source encoding scheme is bounded as

L̄ ≥ H(ζ)

• Accordingly, the entropy H(ζ) represents Lmin, and the coding efficiency may be written as η = H(ζ)/L̄.
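To make these quantities concrete, here is a small sketch (the codeword lengths and probabilities are hypothetical, chosen so that the bound is met with equality): it computes the average codeword length L̄, takes H(ζ) as the value of Lmin implied by the source-coding theorem, and reports the coding efficiency η.

```python
import math

def entropy(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Hypothetical variable-length code: symbol probabilities and assigned codeword lengths
probs   = [0.5, 0.25, 0.125, 0.125]
lengths = [1, 2, 3, 3]

L_bar = sum(p * L for p, L in zip(probs, lengths))   # average codeword length
H = entropy(probs)                                   # lower bound on L_bar (source-coding theorem)
eta = H / L_bar                                      # coding efficiency

print(f"L_bar = {L_bar:.3f} bits/symbol, H = {H:.3f} bits/symbol, efficiency = {eta:.3f}")
```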

10.4 Lossless Data Compression

• A common characteristic of signals generated by physical sources is that, in their natural form, they contain a significant amount of information that is redundant, the transmission of which is therefore wasteful of primary communication resources.

• For efficient signal transmission, the redundant information should be removed from the signal prior to transmission.

• This operation is ordinarily performed on a signal in digital form, in which case we refer to it as lossless data compression.

• The code resulting from such an operation provides a representation of the source output that is not only efficient in terms of the average number of bits per symbol but also exact in the sense that the original data can be reconstructed with no loss of information.

• The entropy of the source establishes the fundamental limit on the removal of redundancy from the data. Basically, data compression is achieved by assigning short descriptions to the most frequent outcomes of the source output and longer descriptions to the less frequent ones.

• A type of source code known as a prefix code is not only decodable but also offers the possibility of realizing an average codeword length that can be made arbitrarily close to the source entropy.


Prefix Coding

Consider a discrete memoryless source with source alphabet {s0, s1, ..., sK-1} and source statistics {p0, p1, ..., pK-1}.

• The code has to be uniquely decodable. This restriction ensures that for each finite sequence of symbols emitted by the source, the corresponding sequence of codewords is different from the sequence of codewords corresponding to any other source sequence.

• To define the prefix condition, let the codeword assigned to source symbol sk be denoted by (mk1, mk2, ..., mkn), where the individual elements mk1, mk2, ..., mkn are 0s and 1s, and n is the codeword length.

• The initial part of the codeword is represented by the elements mk1, ..., mki for some i ≤ n.

• Any sequence made up of the initial part of the codeword is called a prefix of the codeword. A prefix code is defined as a code in which no codeword is the prefix of any other codeword.


• To illustrate the meaning of a prefix code, consider the three source codes described in Table 10.2.

• Code I is not a prefix code since the bit 0, the codeword for s0, is a prefix of 00, the codeword for s2.

• Likewise, the bit 1, the codeword for s1, is a prefix of 11, the codeword for s3.

• Similarly, we may show that code III is not a prefix code, but code II is.


To decode a sequence of codewords generated from a prefix source code, the source decoder simply starts at the beginning of the sequence and decodes one codeword at a time.

• It sets up what is equivalent to a decision tree, which is a graphical portrayal of the codewords in the particular source code.

• For example, Figure 10.4 depicts the decision tree corresponding to code II in Table 10.2.

• The tree has an initial state and four terminal states corresponding to source symbols s0, s1, s2, and s3.

• The decoder always starts at the initial state.

• The first received bit moves the decoder to the terminal state s0 if it is 0, or else to a second decision point if it is 1.

• In the latter case, the second bit moves the decoder one step further down the tree, either to terminal state s1 if it is 0, or else to a third decision point if it is 1, and so on.

• Once a terminal state emits its symbol, the decoder is reset to its initial state.

• Note also that each bit in the received encoded sequence is examined only once. For example, the encoded sequence 1011111000... is readily decoded as the source sequence s1 s3 s2 s0 s0 ...
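The tree walk described above can be sketched in a few lines of Python. The codeword table below is code II as implied by the decoding example (s0 = 0, s1 = 10, s2 = 110, s3 = 111); treat it as an assumed reconstruction, since Table 10.2 itself is not reproduced here.

```python
def decode_prefix(bits, codebook):
    """Decode a bit string with a prefix code, symbol by symbol.
    Because no codeword is a prefix of another, each bit is examined only once."""
    inverse = {code: sym for sym, code in codebook.items()}
    symbols, current = [], ""
    for b in bits:
        current += b
        if current in inverse:          # a complete codeword has been recognized
            symbols.append(inverse[current])
            current = ""                # reset to the initial state of the decision tree
    return symbols

# Code II of Table 10.2 as implied by the decoding example above
code_II = {"s0": "0", "s1": "10", "s2": "110", "s3": "111"}
print(decode_prefix("1011111000", code_II))   # -> ['s1', 's3', 's2', 's0', 's0']
```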


A prefix code has the important property that it is always uniquely decodable.

• If a prefix code has been constructed for a discrete memoryless source with source alphabet {s0, s1, ..., sK-1} and source statistics {p0, p1, ..., pK-1}, and the codeword for symbol sk has length Lk, k = 0, 1, ..., K-1, then the codeword lengths of the code satisfy a certain inequality known as the Kraft-McMillan inequality.

• In mathematical terms, we may state that

Σ_{k=0}^{K-1} 2^-Lk ≤ 1     (10.18)

• where the factor 2 refers to the radix (number of symbols) in the binary alphabet.

• If the codeword lengths of a code for a discrete memoryless source satisfy the Kraft-McMillan inequality, then a prefix code with these codeword lengths can be constructed.
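A minimal check of the Kraft-McMillan inequality in Python (the two sets of codeword lengths are hypothetical, one admissible and one not):

```python
def kraft_sum(lengths, radix=2):
    """Kraft-McMillan sum: sum_k radix**(-L_k). A prefix code with these lengths exists iff the sum is <= 1."""
    return sum(radix ** (-L) for L in lengths)

for lengths in ([1, 2, 3, 3], [1, 2, 2, 3]):
    s = kraft_sum(lengths)
    print(lengths, f"sum = {s:.3f}", "-> prefix code possible" if s <= 1 else "-> impossible")
```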

• Prefix codes are distinguished from other uniquely decodable codes by the fact that the end of a codeword is always recognizable. Hence, the decoding of a prefix code can be accomplished as soon as the binary sequence representing a source symbol is fully received. For this reason, prefix codes are also referred to as instantaneous codes.

• Given a discrete memoryless source of entropy H(ζ), the average codeword length L̄ of a prefix code is bounded as follows:

H(ζ) ≤ L̄ < H(ζ) + 1     (10.19)

• The left-hand bound of (10.19) is satisfied with equality under the condition that symbol sk is emitted by the source with probability

pk = 2^-Lk

• where Lk is the length of the codeword assigned to source symbol sk. We then have

Σ_{k=0}^{K-1} 2^-Lk = Σ_{k=0}^{K-1} pk = 1


• Under this condition, the Kraft-McMillan inequality of Eq. (10.18) implies that we can construct a prefix code such that the length of the codeword assigned to source symbol sk is Lk.

• For such a code, the average codeword length is

L̄ = Σ_{k=0}^{K-1} Lk / 2^Lk

• and the corresponding entropy of the source is

H(ζ) = Σ_{k=0}^{K-1} (1/2^Lk) log2(2^Lk) = Σ_{k=0}^{K-1} Lk / 2^Lk

• so that, in this special case, the prefix code is matched to the source in that L̄ = H(ζ).

Prefix Code: Huffman Coding

• The basic idea behind Huffman coding is to assign to each symbol of an alphabet a sequence of bits roughly equal in length to the amount of information conveyed by the symbol in question.

• The end result is a source code whose average codeword length approaches the fundamental limit set by the entropy of a discrete memoryless source, namely, H(ζ).

• The essence of the algorithm used to synthesize the Huffman code is to replace the prescribed set of source statistics of a discrete memoryless source with a simpler one.

• This reduction process is continued in a step-by-step manner until we are left with a final set of only two source statistics (symbols), for which (0,1) is an optimal code.

• Starting from this trivial code, we then work backward and thereby construct the Huffman code for the given source.


• Specifically, the Huffman encoding algorithm proceeds as follows:

1. The source symbols are listed in order of decreasing probability. The two source symbols of lowest probability are assigned a 0 and a 1. This part of the step is referred to as a splitting stage.

2. These two source symbols are regarded as being combined into a new source symbol with probability equal to the sum of the two original probabilities. (The list of source symbols, and therefore source statistics, is thereby reduced in size by one.) The probability of the new symbol is placed in the list in accordance with its value.

3. The procedure is repeated until we are left with a final list of source statistics (symbols) of only two, for which a 0 and a 1 are assigned.

• The code for each (original) source symbol is found by working backward and tracing the sequence of 0s and 1s assigned to that symbol as well as its successors; a small implementation sketch is given below.
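The algorithm above translates directly into a short program. The sketch below uses a priority queue instead of an explicitly re-sorted list, which is equivalent to the step-by-step reduction just described; the source statistics in it are illustrative assumptions, since the numbers of Example 10.3 are not reproduced here.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a binary Huffman code for {symbol: probability}.
    Repeatedly merge the two least probable entries, assigning them a 0 and a 1."""
    tick = count()                      # tie-breaker so heapq never compares the code dicts
    heap = [(p, next(tick), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)   # lowest probability  -> prepend '0'
        p1, _, code1 = heapq.heappop(heap)   # next lowest         -> prepend '1'
        merged = {s: "0" + c for s, c in code0.items()}
        merged.update({s: "1" + c for s, c in code1.items()})
        heapq.heappush(heap, (p0 + p1, next(tick), merged))
    return heap[0][2]

# Hypothetical source statistics for illustration
stats = {"s0": 0.4, "s1": 0.2, "s2": 0.2, "s3": 0.1, "s4": 0.1}
code = huffman_code(stats)
L_bar = sum(stats[s] * len(c) for s, c in code.items())
print(code, f"average length = {L_bar:.2f} bits/symbol")
```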

Example 10.3: Huffman Algorithm


We may proceed by placing the probability of the new symbol as high as possible, as in Example 10.3. Alternatively, we may place it as low as possible.

• (It is presumed that whichever way the placement is made, high or low, it is consistently adhered to throughout the encoding process.) But this time, noticeable differences arise in that the codewords in the resulting source code can have different lengths. Nevertheless, the average codeword length remains the same.

• As a measure of the variability in codeword lengths of a source code, we define the variance of the codeword lengths about the average codeword length L̄, over the ensemble of source symbols, as

σ^2 = Σ_{k=0}^{K-1} pk (Lk - L̄)^2

• where p0, p1, ..., pK-1 are the source statistics, and Lk is the length of the codeword assigned to source symbol sk.
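A small numerical sketch of this variance measure (the two sets of codeword lengths are hypothetical stand-ins for the "moved as high as possible" and "moved as low as possible" codes discussed next; both happen to have the same average length):

```python
def codeword_stats(probs, lengths):
    """Average codeword length and its variance over the source symbols."""
    L_bar = sum(p * L for p, L in zip(probs, lengths))
    var = sum(p * (L - L_bar) ** 2 for p, L in zip(probs, lengths))
    return L_bar, var

probs = [0.4, 0.2, 0.2, 0.1, 0.1]
for lengths in ([2, 2, 2, 3, 3], [1, 2, 3, 4, 4]):      # two valid Huffman length assignments
    L_bar, var = codeword_stats(probs, lengths)
    print(lengths, f"L_bar = {L_bar:.2f}, variance = {var:.2f}")
```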

• It is usually found that when a combined symbol is moved as high as possible, the resulting Huffman code has a significantly smaller variance σ^2 than when it is moved as low as possible. On this basis, it is reasonable to choose the former Huffman code over the latter.

• In Example 10.3, a combined symbol was moved as high as possible. In Example 10.4, presented next, a combined symbol is moved as low as possible. Thus, by comparing the results of these two examples, we are able to appreciate the subtle differences and similarities between the two Huffman codes.


Example 10.4: Nonuniqueness of the Huffman Algorithm


10.6 Discrete Memoryless Channels

• Up to this point in the chapter, we have been preoccupied with discrete memoryless sources responsible for information generation. We next consider the issue of information transmission, with particular emphasis on reliability. We start the discussion by considering a discrete memoryless channel, the counterpart of a discrete memoryless source.

• A discrete memoryless channel is a statistical model with an input X and an output Y that is a noisy version of X; both X and Y are random variables.

• Every unit of time, the channel accepts an input symbol X selected from an alphabet X and, in response, it emits an output symbol Y from an alphabet Y.

• The channel is said to be discrete when both of the alphabets X and Y have finite sizes.

• It is said to be memoryless when the current output symbol depends only on the current input symbol and not any of the previous ones.


Figure 10.8 depicts a view of a discrete memoryless channel. The channel is described in terms of an input alphabet

X = {x0, x1, ..., xJ-1}

• an output alphabet

Y = {y0, y1, ..., yK-1}

• and a set of transition probabilities

p(yk | xj) = P(Y = yk | X = xj)   for all j and k

• Also, the input alphabet X and output alphabet Y need not have the same size.

• For example, in channel coding, the size K of the output alphabet Y may be larger than the size J of the input alphabet X; thus, K ≥ J.

• On the other hand, we may have a situation in which the channel emits the same symbol when either one of two input symbols is sent, in which case we have K ≤ J.

• A convenient way of describing a discrete memoryless channel is to arrange the various transition probabilities of the channel in the form of a matrix as follows:

P = [p(yk | xj)],   j = 0, 1, ..., J-1;  k = 0, 1, ..., K-1

• The J-by-K matrix P is called the channel matrix.


• Note that each row of the channel matrix P corresponds to a fixed channel input, whereas each column of the matrix corresponds to a fixed channel output.

• A fundamental property of the channel matrix P, as defined here, is that the sum of the elements along any row of the matrix is always equal to one; that is,

Σ_{k=0}^{K-1} p(yk | xj) = 1   for all j

• Suppose now that the inputs to a discrete memoryless channel are selected according to the probability distribution {p(xj), j = 0, 1, ..., J-1}. In other words, the event that the channel input X = xj occurs with probability p(xj) = P(X = xj), j = 0, 1, ..., J-1.

• Having specified the random variable X denoting the channel input, we may now specify the second random variable Y denoting the channel output.

• The joint probability distribution of the random variables X and Y is given by

p(xj, yk) = P(X = xj, Y = yk) = p(yk | xj) p(xj)

• The marginal probability distribution of the output random variable Y is obtained by averaging out the dependence of p(xj, yk) on xj, as shown by

p(yk) = Σ_{j=0}^{J-1} p(yk | xj) p(xj),   k = 0, 1, ..., K-1
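These relations are easy to verify numerically. The sketch below uses a made-up 2-input, 3-output channel matrix and input distribution to form the joint distribution p(xj, yk) and the output marginal p(yk):

```python
# Rows of the channel matrix are fixed inputs x_j, columns are outputs y_k; each row sums to 1.
P = [[0.8, 0.1, 0.1],     # p(y_k | x_0)
     [0.1, 0.2, 0.7]]     # p(y_k | x_1)
p_x = [0.6, 0.4]          # hypothetical input distribution {p(x_j)}

# Joint distribution p(x_j, y_k) = p(y_k | x_j) * p(x_j)
joint = [[P[j][k] * p_x[j] for k in range(len(P[0]))] for j in range(len(P))]

# Marginal output distribution p(y_k) = sum_j p(y_k | x_j) * p(x_j)
p_y = [sum(joint[j][k] for j in range(len(P))) for k in range(len(P[0]))]

print("row sums:", [round(sum(row), 6) for row in P])   # each should equal 1
print("p(y):", p_y)                                      # also sums to 1
```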


Example 10.5: Binary Symmetric Channel

• The binary symmetric channel has J = K = 2. Each of the two input symbols (0 and 1) is received correctly with probability 1 - p and flipped with probability p, so the channel matrix is

P = [ 1-p   p ;   p   1-p ]

10.7 Channel Capacity

• Of practical interest in many communication applications is the number of bits that may be reliably transmitted per second through a given communications channel.

• We think of the channel output Y (selected from alphabet Y) as a noisy version of the channel input X (selected from alphabet X).

• The entropy H(X) is a measure of the prior uncertainty about X.

• How can we measure the uncertainty about X after observing Y? To answer this question, we extend the ideas developed in Section 10.2 by defining the conditional entropy of X, selected from alphabet X, given that Y = yk:

H(X | Y = yk) = Σ_{j=0}^{J-1} p(xj | yk) log2(1/p(xj | yk))

• The conditional entropy H(X|Y) is the mean of H(X | Y = yk) over the output alphabet:

H(X|Y) = Σ_{k=0}^{K-1} p(yk) H(X | Y = yk)


• The entropy H(X) represents our uncertainty about the channel input before observing the channel output.

• The conditional entropy H(X|Y) represents our uncertainty about the channel input after observing the channel output.

• The difference H(X) - H(X|Y) must represent our uncertainty about the channel input that is resolved by observing the channel output.

• This important quantity is called the mutual information of the channel.

• Denoting the mutual information of the channel by I(X; Y), we may thus write

I(X; Y) = H(X) - H(X|Y)
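As a sketch (again with an assumed channel matrix and input distribution), the mutual information can be computed directly from this definition:

```python
import math

def H(probs):
    """Entropy of a probability vector, in bits."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

def mutual_information(P, p_x):
    """I(X;Y) = H(X) - H(X|Y) for channel matrix P (rows p(y_k|x_j)) and input distribution p_x."""
    J, K = len(P), len(P[0])
    p_y = [sum(P[j][k] * p_x[j] for j in range(J)) for k in range(K)]
    # H(X|Y) = sum_k p(y_k) * H(X | Y = y_k), with p(x_j|y_k) = p(y_k|x_j) p(x_j) / p(y_k)
    H_X_given_Y = 0.0
    for k in range(K):
        if p_y[k] > 0:
            posterior = [P[j][k] * p_x[j] / p_y[k] for j in range(J)]
            H_X_given_Y += p_y[k] * H(posterior)
    return H(p_x) - H_X_given_Y

# Binary symmetric channel with crossover probability 0.1 and equiprobable inputs
P_bsc = [[0.9, 0.1], [0.1, 0.9]]
print(f"I(X;Y) = {mutual_information(P_bsc, [0.5, 0.5]):.4f} bits")  # about 0.531 bits
```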

• Mutual information has a number of properties: it is symmetric, I(X; Y) = I(Y; X); it is always nonnegative, I(X; Y) ≥ 0; and it may equivalently be expressed as I(X; Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X, Y).

• The relationship between the source entropy H(X), the conditional entropy H(X|Y), and the mutual information I(X; Y) is illustrated conceptually in Figure 10.10.


Consider a discrete memoryless channel with input alphabet X, output alphabet Y, and transition probabilities p(yk | xj).

• It is necessary for us to know the input probability distribution {p(xj), j = 0, 1, ..., J-1} so that we may calculate the mutual information I(X; Y).

• The mutual information of a channel therefore depends not only on the channel but also on the way in which the channel is used.

• The input probability distribution {p(xj)} is obviously independent of the channel.

• We can then maximize the average mutual information I(X; Y) of the channel with respect to {p(xj)}.

• The channel capacity of a discrete memoryless channel is defined as the maximum average mutual information I(X; Y) in any single use of the channel (i.e., signaling interval), where the maximization is over all possible input probability distributions {p(xj)} on X:

C = max over {p(xj)} of I(X; Y)

• The channel capacity is commonly denoted by C.

• The channel capacity C is measured in bits per channel use.

• Note that the channel capacity C is a function only of the transition probabilities p(yk | xj), which define the channel.

• The calculation of C involves maximization of the average mutual information I(X; Y) over J variables [i.e., the input probabilities p(x0), ..., p(xJ-1)] subject to two constraints: p(xj) ≥ 0 for all j, and Σ_{j=0}^{J-1} p(xj) = 1.


Example 10.6: Binary Symmetric Channel

• For the binary symmetric channel of Example 10.5, the mutual information is maximized by equiprobable inputs, and the resulting channel capacity is

C = 1 + p log2(p) + (1 - p) log2(1 - p) = 1 - H(p) bits per channel use

• where H(p) is the entropy function of the transition probability p.
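A minimal numerical sketch of this capacity expression, tabulating C = 1 - H(p) for a few transition probabilities; note the symmetry about p = 1/2 discussed next:

```python
import math

def bsc_capacity(p):
    """Capacity of a binary symmetric channel, C = 1 - H(p) bits per channel use."""
    if p in (0.0, 1.0):
        return 1.0
    return 1.0 + p * math.log2(p) + (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"p = {p:4.2f}  ->  C = {bsc_capacity(p):.4f} bits per channel use")
```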

• The channel capacity C varies with the probability of error (transition probability) p as shown in Figure 10.11, which is symmetric about p = 1/2.

• Comparing the curve in this figure with that in Figure 10.2, we may make the following observations: when the channel is noise-free (p = 0), the capacity C attains its maximum value of one bit per channel use, exactly the information in each channel input, while the entropy function H(p) attains its minimum value of zero; when p = 1/2, the capacity C attains its minimum value of zero while H(p) attains its maximum value of one, and the channel is said to be useless.


10.8 Channel Coding Theorem

The inevitable presence of noise in a channel causes discrepancies (errors) between the output and input data sequences of a digital communication system.

• For a relatively noisy channel, the probability of error may have a value higher than 10^-2, which means that fewer than 99 out of 100 transmitted bits are received correctly.

• A probability of error equal to 10^-6 or even lower is often a necessary requirement. To achieve such a high level of performance, we may have to resort to the use of channel coding.

The design goal of channel coding is to increase the resistance of a digital communication system to channel noise.

• Channel coding consists of mapping the incoming data sequence into a channel input sequence, and inverse mapping the channel output sequence into an output data sequence, in such a way that the overall effect of channel noise on the system is minimized.


• The source encoding (before channel encoding) and source decoding (after channel decoding) are not included in Figure 10.12.

• The channel encoder and channel decoder in Figure 10.12 are both under the designer's control and should be designed to optimize the overall effectiveness of the communication system.

• The approach taken is to introduce redundancy in the channel encoder so as to reconstruct the original source sequence as accurately as possible.

• We may view channel coding as the dual of source coding in that the former introduces controlled redundancy to improve reliability, whereas the latter reduces redundancy to improve efficiency.


For the purpose of our present discussion of channel coding, it suffices to confine our attention to block codes. In this class of codes, the message sequence is subdivided into sequential blocks each k bits long, and each k-bit block is mapped into an n-bit block, where n > k.

• The number of redundant bits added by the encoder to each transmitted block is n - k bits.

• The ratio k/n is called the code rate. Using r to denote the code rate, we have

r = k/n

• The accurate reconstruction of the original source sequence at the destination requires that the average probability of symbol error be arbitrarily low.

• This raises the following important question: Does there exist a sophisticated channel coding scheme such that the probability that a message bit will be in error is less than any positive number ε (i.e., as small as we want it), and yet the channel coding scheme is efficient in that the code rate need not be too small? The answer to this fundamental question is an emphatic "yes."

• The answer to the question is provided by Shannon's second theorem in terms of the channel capacity C. The theorem specifies the channel capacity C as a fundamental limit on the rate at which the transmission of reliable error-free messages can take place over a discrete memoryless channel.

• Up until this point, time has not played an important role in our discussion of channel capacity. Suppose then the discrete memoryless source in Figure 10.12 has the source alphabet ζ and entropy H(ζ) bits per source symbol.


Assume that the source emits symbols once every Ts seconds.

• The average information rate of the source is H(ζ)/Ts bits per second.

• The decoder delivers decoded symbols to the destination from the source alphabet ζ and at the same source rate of one symbol every Ts seconds.

• The discrete memoryless channel has a channel capacity equal to C bits per use of the channel.

• We assume that the channel is capable of being used once every Tc seconds.

• The channel capacity per unit time is C/Tc bits per second, which represents the maximum rate of information transfer over the channel.


• Shannon's second theorem is known as the channel coding theorem.

• The channel coding theorem for a discrete memoryless channel is stated in two parts as follows:

(i) If H(ζ)/Ts ≤ C/Tc, there exists a coding scheme for which the source output can be transmitted over the channel and reconstructed with an arbitrarily small probability of error.

(ii) Conversely, if H(ζ)/Ts > C/Tc, it is not possible to transmit information over the channel and reconstruct it with an arbitrarily small probability of error.


Application of the Channel Coding Theorem to Binary Symmetric Channels

• Consider a discrete memoryless source that emits equally likely binary symbols (0s and 1s) once every Ts seconds.

• The source entropy is equal to one bit per source symbol (see Example 10.1), so the information rate of the source is (1/Ts) bits per second.

• The source sequence is applied to a channel encoder with code rate r.

• The channel encoder produces a symbol once every Tc seconds.

• The encoded symbol transmission rate is (1/Tc) symbols per second.

• The channel encoder engages a binary symmetric channel once every Tc seconds.

• The channel capacity per unit time is (C/Tc) bits per second, where C is determined by the prescribed channel transition probability p in accordance with C = 1 - H(p) (Example 10.6).

• The channel coding theorem [part (i)] implies that if

1/Ts ≤ C/Tc

• the probability of error can be made arbitrarily low by the use of a suitable channel encoding scheme.

• But the ratio Tc/Ts equals the code rate of the channel encoder:

r = Tc/Ts

• That is, for r ≤ C, there exists a code (with code rate less than or equal to C) capable of achieving an arbitrarily low probability of error.


Example 10.7: Repetition Code

• In a repetition code, each message bit is transmitted an odd number of times, n, and the decoder takes a majority vote over each received block of n bits; the code rate is r = 1/n, and a decoding error occurs only when more than half of the n transmitted copies are corrupted by the channel.
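Because the worked numbers of this example are not reproduced here, the following sketch illustrates the idea on a binary symmetric channel with an assumed transition probability: it computes the post-decoding bit error probability of an n-fold repetition code, which falls as n grows while the code rate 1/n shrinks.

```python
from math import comb

def repetition_error_prob(n, p):
    """Probability that the majority vote over n repetitions is wrong,
    i.e. that more than n//2 of the n transmitted copies are flipped (BSC crossover p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n // 2 + 1, n + 1))

p = 0.01                                  # assumed BSC transition probability
for n in (1, 3, 5, 7, 9):
    print(f"n = {n}: rate r = 1/{n}, decoded bit error probability = {repetition_error_prob(n, p):.2e}")
```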


10.9 Capacity of a Gaussian Channel

• The idea of average mutual information is used to formulate the information capacity theorem for band-limited, power-limited Gaussian channels.

• Consider a zero-mean stationary process X(t) that is band-limited to B Hertz.

• Let Xk, k = 1, 2, ..., K, denote the continuous random variables obtained by uniform sampling of the process X(t) at the Nyquist rate of 2B samples per second.

• These samples are transmitted in T seconds over a noisy channel, also band-limited to B Hertz.

• The number of samples, K, is given by

K = 2BT

• We refer to Xk as a sample of the transmitted signal.

• The channel output is perturbed by additive white Gaussian noise of zero mean and power spectral density No/2. The noise is band-limited to B Hertz.


Let the continuous random variables Yk, k = 1, 2, ..., K, denote samples of the received signal, as shown by

Yk = Xk + Nk,   k = 1, 2, ..., K

• The noise sample Nk is Gaussian with zero mean and variance given by

σ^2 = No B

• We assume that the samples Yk, k = 1, 2, ..., K, are statistically independent.

• A channel for which the noise and the received signal are as described in the above two equations is called a discrete-time, memoryless Gaussian channel (modeled as in Figure 10.14).

• The transmitter is power limited; it is therefore reasonable to define the constraint as

E[Xk^2] = P,   k = 1, 2, ..., K

• where P is the average transmitted power.

• The power-limited Gaussian channel described herein is of not only theoretical but also practical importance in that it models many communication channels, including radio and satellite links.

• The information capacity of the channel is defined as the maximum of the mutual information between the channel input Xk and the channel output Yk over all distributions on the input Xk that satisfy the power constraint E[Xk^2] = P.

• Let I(Xk; Yk) denote the average mutual information between Xk and Yk. We may then define the information capacity of the channel as

C = max I(Xk; Yk)   subject to E[Xk^2] = P

• where the maximization is performed with respect to fXk(x), the probability density function of Xk.


Performing this optimization is beyond the scope of this text, but the result is

C = (1/2) log2(1 + P/(No B)) bits per channel use

• With the channel used K times for the transmission of K samples of the process X(t) in T seconds, we find that the information capacity per unit time is (K/T) times the result given in the above equation.

• The number K equals 2BT. Accordingly, we may express the information capacity per unit time as

C = B log2(1 + P/(No B)) bits per second

• Shannon's third (and most famous) theorem, the information capacity theorem, states that the information capacity of a channel of bandwidth B Hertz, perturbed by additive white Gaussian noise of power spectral density No/2 and limited in bandwidth to B, is given by the above expression, where P is the average transmitted power.

• The theorem implies that, for given average transmitted power P and channel bandwidth B, we can transmit information at the rate C bits per second, as defined in the above equation, with arbitrarily small probability of error by employing sufficiently complex encoding systems.

• It is not possible to transmit at a rate higher than C bits per second by any encoding system without a definite probability of error.

• The channel capacity theorem defines the fundamental limit on the rate of error-free transmission for a power-limited, band-limited Gaussian channel.

• To approach this limit, the transmitted signal must have statistical properties approximating those of white Gaussian noise.
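A small sketch of the information capacity formula (the bandwidth, noise density, and SNR values below are arbitrary illustrations, not taken from the text):

```python
import math

def shannon_capacity(B, P, N0):
    """Information capacity C = B * log2(1 + P / (N0 * B)) in bits per second
    for a band-limited, power-limited AWGN channel."""
    return B * math.log2(1.0 + P / (N0 * B))

B  = 3.0e3       # bandwidth in Hz (telephone-like channel, for illustration)
N0 = 1.0e-9      # noise power spectral density in W/Hz (assumed)
for snr_db in (10, 20, 30):
    P = (10 ** (snr_db / 10)) * N0 * B      # transmit power giving the stated SNR
    print(f"SNR = {snr_db} dB -> C = {shannon_capacity(B, P, N0)/1e3:.1f} kb/s")
```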


We introduce the notion of an ideal system, defined as one that transmits data at a bit rate Rb equal to the information capacity C.

• The average transmitted power may be expressed as

P = Eb Rb

• where Eb is the transmitted energy per bit.

• The ideal system is defined by the equation Rb = C, so that the average transmitted power may be written as P = Eb C.

• The signal energy-per-bit to noise power spectral density ratio Eb/No may then be defined in terms of the bandwidth efficiency C/B for the ideal system as

Eb/No = (2^(C/B) - 1) / (C/B)
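A short sketch of this relationship for the ideal (Rb = C) system, sweeping the bandwidth efficiency C/B and reporting the corresponding Eb/No; as C/B approaches zero the value approaches the Shannon limit of ln 2, about -1.6 dB:

```python
import math

def ebno_ideal(eta):
    """Eb/No required by the ideal system as a function of bandwidth efficiency eta = C/B:
    Eb/No = (2**eta - 1) / eta."""
    return (2.0 ** eta - 1.0) / eta

for eta in (0.1, 0.5, 1.0, 2.0, 4.0, 8.0):
    ratio = ebno_ideal(eta)
    print(f"C/B = {eta:4.1f} -> Eb/No = {ratio:8.3f} ({10 * math.log10(ratio):6.2f} dB)")
```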

• A plot of bandwidth efficiency Rb/B versus Eb/No is called the bandwidth-efficiency diagram.

• A generic form of this diagram is displayed in Figure 10.15, where the curve labeled capacity boundary corresponds to the ideal system for which Rb = C.

• Based on Figure 10.15, we can make the following observations:


Example 10.8: M-ary PSK and M-ary FSK


10.10 Error Control Coding

The channel coding theorem states that if a discrete memoryless channel has capacity C and a source generates information at a rate less than C, then there exists a coding technique such that the output of the source may be transmitted over the channel with an arbitrarily low probability of error.

• The channel coding theorem specifies the channel capacity C as a fundamental limit on the rate at which the transmission of reliable (error-free) messages can take place over a discrete memoryless channel.

• The issue that matters is not the signal-to-noise ratio, so long as it is large enough, but how the channel input is encoded.

• Figure 10.17a shows one model of how this encoding (and corresponding decoding) could be included in a digital communication system, an approach known as forward error correction (FEC).

(Source: web.mit.edu/6.02/www/f2010/handouts/lectures/L8.pdf)

• The discrete source generates information in the form of binary symbols.

• The channel encoder in the transmitter accepts message bits and adds redundancy according to a prescribed rule, thereby producing encoded data at a higher bit rate.

• The channel decoder in the receiver exploits the redundancy to decide which message bits were actually transmitted.

• The combined goal of the channel encoder and decoder is to minimize the effect of channel noise. That is, the number of errors between the channel encoder input (derived from the source) and the channel decoder output (delivered to the user) is minimized.

• The addition of redundancy in the coded messages implies the need for increased transmission bandwidth.

• The use of error-control coding adds complexity to the system, especially for the implementation of decoding operations in the receiver.

• The design trade-offs in the use of error-control coding to achieve acceptable error performance include considerations of bandwidth and system complexity.


• In the model depicted in Figure 10.17a, the operations of channel coding and modulation are performed separately. When, however, bandwidth efficiency is of major concern, the most effective method of implementing forward error-correction coding is to combine it with modulation as a single function, as shown in Figure 10.17b. In such an approach, coding is redefined as a process of imposing certain patterns on the transmitted signal.


• The most unsatisfactory feature of the channel coding theorem is its nonconstructive nature.

• The channel coding theorem asserts the existence of good codes but does not tell us how to find them.

• The error-control coding techniques provide different methods of achieving this important system requirement.

• The waveform channel is said to be memoryless if the detector output in a given interval depends only on the signal transmitted in that interval, and not on any previous transmission.

• Under this condition, we may model the combination of the modulator, the waveform channel, and the detector as a discrete memoryless channel.

• The discrete memoryless channel is completely described by the set of transition probabilities p(j|i), where i denotes a modulator input symbol, j denotes a demodulator output symbol, and p(j|i) denotes the probability of receiving symbol j, given that symbol i was sent.


• The simplest discrete memoryless channel results from the use of binary input and binary output symbols.

• When binary coding is used, the modulator has only the binary symbols 0 and 1 as inputs.

• Likewise, the decoder has only binary inputs if binary quantization of the demodulator output is used, that is, a hard decision is made on the demodulator output as to which symbol was actually transmitted.

• In this situation, we have a binary symmetric channel (BSC) with a transition probability diagram as shown in Figure 10.18.

• The binary symmetric channel, assuming channel noise modeled as an additive white Gaussian noise (AWGN) channel, is completely described by the transition probability p.


• The use of hard decisions prior to decoding causes an irreversible loss of information in the receiver.

• To reduce this loss, soft-decision coding is used.

• This is achieved by including a multilevel quantizer at the demodulator output, as illustrated in Figure 10.19 for the case of binary PSK signals.

• The input-output characteristic of the quantizer is shown in Figure 10.20a.

• The modulator has only the binary symbols 0 and 1 as inputs, but the demodulator output now has an alphabet with Q symbols.

• Assuming the use of the quantizer as described in Figure 10.20a, we have Q = 8. Such a channel is called a binary input, Q-ary output discrete memoryless channel.


• The corresponding channel transition probability diagram is shown in Figure 10.20b.

• The form of this distribution, and consequently the decoder performance, depends on the location of the representation levels of the quantizer, which in turn depends on the signal level and noise variance.

• The demodulator must incorporate automatic gain control if an effective multilevel quantizer is to be realized.

• The use of soft decisions complicates the implementation of the decoder. Nevertheless, soft-decision decoding offers significant improvement in performance over hard-decision decoding.

• Soft-decision decoding requires a stream of 'soft bits', where we get not only the 1 or 0 decision but also an indication of how certain we are that the decision is correct. With a 3-bit (Q = 8) quantizer, the eight levels can be read as: 000 (definitely 0), 001 (probably 0), 010 (maybe 0), 011 (guess 0), 100 (guess 1), 101 (maybe 1), 110 (probably 1), 111 (definitely 1).
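A toy illustration of such a 3-bit (Q = 8) soft-decision quantizer follows. The threshold spacing used here is an assumption made only for the sketch, since the actual placement of the representation levels depends on the signal level and noise variance, as noted above:

```python
def soft_quantize(sample, amplitude=1.0):
    """Map a received matched-filter sample to one of Q = 8 soft levels, 0..7.
    Level 0 means 'definitely 0' (most negative), level 7 means 'definitely 1' (most positive)."""
    # Seven thresholds, uniformly spaced across +/- 0.75 of the nominal amplitude (assumed design)
    thresholds = [amplitude * t for t in (-0.75, -0.5, -0.25, 0.0, 0.25, 0.5, 0.75)]
    level = sum(sample > t for t in thresholds)
    return level, format(level, "03b")

for s in (-1.2, -0.3, 0.05, 0.6, 1.1):
    level, bits = soft_quantize(s)
    print(f"sample {s:+.2f} -> level {level} (soft bits {bits})")
```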

10.11 Linear Block Codes

• A code is said to be linear if any two codewords in the code can be added in modulo-2 arithmetic to produce a third codeword in the code.


• Modulo-2 addition is performed using an exclusive-OR (XOR) operation on the corresponding binary digits of each operand. The following table describes the XOR operation:

0 ⊕ 0 = 0,   0 ⊕ 1 = 1,   1 ⊕ 0 = 1,   1 ⊕ 1 = 0

• Consider an (n,k) linear block code in which k bits of the n code bits are always identical to the message sequence to be transmitted.

• The n-k bits in the remaining portion are computed from the message bits in accordance with a prescribed encoding rule that determines the mathematical structure of the code.

• These n-k bits are referred to as generalized parity check bits or simply parity bits.

• Block codes in which the message bits are transmitted in unaltered form are called systematic codes.

• For applications requiring both error detection and error correction, the use of systematic block codes simplifies implementation of the decoder.


• Let m0, m1, ..., mk-1 constitute a block of k arbitrary message bits.

• We have 2^k distinct message blocks.

• Let this sequence of message bits be applied to a linear block encoder, producing an n-bit codeword whose elements are denoted by c0, c1, ..., cn-1.

• Let b0, b1, ..., bn-k-1 denote the (n-k) parity bits in the codeword.

• For the code to possess a systematic structure, a codeword is divided into two parts, one of which is occupied by the message bits and the other by the parity bits.

• We have the option of sending the message bits of a codeword before the parity bits, or vice versa.

• The former option is illustrated in Figure 10.21, and its use is assumed in the sequel.

• According to the representation of Figure 10.21, the (n-k) left-most bits of a codeword are identical to the corresponding parity bits, and the k right-most bits of the codeword are identical to the corresponding message bits.

• The (n-k) parity bits are linear sums of the k message bits, as shown by the generalized relation

bi = m0 p0i + m1 p1i + ... + mk-1 p(k-1)i,   i = 0, 1, ..., n-k-1

• where + refers to modulo-2 addition, and the coefficients are defined as follows: pji = 1 if the parity bit bi depends on the message bit mj, and pji = 0 otherwise.

• The coefficients pji are chosen in such a way that the rows of the generator matrix are linearly independent and the parity equations are unique.


The system of Eqs. (10.62) and (10.63) defines the mathematical structure of the (n,k) linear block code.

• This system of equations may be rewritten in a compact form using matrix notation. To proceed with this reformulation, we define the 1-by-k message vector m, the 1-by-(n-k) parity vector b, and the 1-by-n code vector c as follows:

m = [m0, m1, ..., mk-1]
b = [b0, b1, ..., bn-k-1]
c = [c0, c1, ..., cn-1]

• All three vectors are row vectors.

• The use of row vectors is adopted in this chapter for the sake of being consistent with the notation commonly used in the coding literature.


• We may thus rewrite the set of simultaneous equations defining the parity bits in the compact matrix form

      b = mP

10.11 Linear Block Codes

• where P is the k-by-(n-k) coefficient matrix defined by P = [pij], with each pij equal to 0 or 1.     (10.69)

• From the foregoing definitions, c may be expressed as a partitioned row vector in terms of the vectors b and m as follows:

      c = [b | m] = [mP | m] = m[P | Ik]


• where Ik is the k-by-k identity matrix.

10.11 Linear Block Codes

• Define the k-by-n generator matrix

      G = [P | Ik]

  so that the codeword is given compactly by

      c = mG     (10.74)

• The generator matrix G of the above equation is said to be in the echelon canonical form in that its k rows are linearly independent; that is, it is not possible to express any row of the matrix G as a linear combination of the remaining rows.


• The full set of codewords, referred to simply as the code, is generated in accordance with the above equation by letting the message vector m range through the set of all 2^k binary k-tuples (1-by-k vectors) [a tuple is an ordered list of elements].

• The sum of any two codewords is another codeword. This basic property of linear block codes is called closure.

• To prove the closure's validity, consider a pair of code vectors ci and cj corresponding to a pair of message vectors mi and mj, respectively.

• Using the above equation, we may express the sum of ci and cj as

      ci + cj = miG + mjG = (mi + mj)G

10.11 Linear Block Codes

• The modulo-2 sum of mi and mj represents a new message vector. Correspondingly, the modulo-2 sum of ci and cj represents a new code vector.
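• The following minimal sketch (in Python) illustrates the relation c = mG and the closure property for a small hypothetical (6,3) systematic code; the coefficient matrix P used here is an arbitrary example, not one taken from the text:

    import itertools
    import numpy as np

    # Hypothetical (6,3) systematic linear block code: n = 6, k = 3.
    P = np.array([[1, 1, 0],      # k-by-(n-k) coefficient matrix (example choice)
                  [0, 1, 1],
                  [1, 0, 1]])
    k = P.shape[0]
    G = np.hstack([P, np.eye(k, dtype=int)])    # G = [P | Ik], k-by-n

    def encode(m):
        # c = mG in modulo-2 arithmetic; equivalently c = [b | m] with b = mP.
        return (np.array(m) @ G) % 2

    # Generate the full code by letting m range over all 2^k message tuples.
    code = {tuple(encode(m)) for m in itertools.product([0, 1], repeat=k)}
    print(len(code))                            # 8 codewords

    # Closure: the modulo-2 sum of any two codewords is again a codeword.
    ci, cj = encode([1, 0, 1]), encode([0, 1, 1])
    print(tuple((ci + cj) % 2) in code)         # True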


There is another way of expressing the relationship between the message bits and parity-check bits of a linear block code. Let H denote an (n-k)-by-n matrix, defined as

      H = [In-k | P^T]     (10.75)

10.11 Linear Block Codes

• where P^T is an (n-k)-by-k matrix, representing the transpose of the coefficient matrix P, and In-k is the (n-k)-by-(n-k) identity matrix.

• We may perform the following multiplication of partitioned matrices:

      HG^T = [In-k | P^T] [P | Ik]^T = In-k P^T + P^T Ik = P^T + P^T

• where we have used the fact that multiplication of a rectangular matrix by an identity matrix of compatible dimensions leaves the matrix unchanged.


• In modulo-2 arithmetic, we have P^T + P^T = 0, where 0 denotes an (n-k)-by-k null matrix (i.e., a matrix that has zeros for all of its elements). Hence, HG^T = 0.

10.11 Linear Block Codes

• Equivalently, we have GH^T = 0. Postmultiplying c = mG by H^T, it follows that every codeword satisfies

      cH^T = 0     (10.77)

• The matrix H is called the parity-check matrix of the code, and the set of equations specified by the above equation are called parity-check equations.

• The generator equation (10.74) and the parity-check detector equation (10.77) are basic to the description and operation of a linear block code. These two equations are depicted in the form of block diagrams in Figure 10.22a and b, respectively.
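• Continuing the hypothetical (6,3) example from the earlier sketch, a few lines suffice to check the relations HG^T = 0 and cH^T = 0 numerically:

    import numpy as np

    P = np.array([[1, 1, 0],
                  [0, 1, 1],
                  [1, 0, 1]])
    k, n_k = P.shape
    G = np.hstack([P, np.eye(k, dtype=int)])        # generator matrix [P | Ik]
    H = np.hstack([np.eye(n_k, dtype=int), P.T])    # parity-check matrix [In-k | P^T]

    print((H @ G.T) % 2)            # (n-k)-by-k all-zero matrix, since P^T + P^T = 0
    c = (np.array([1, 1, 0]) @ G) % 2               # an arbitrary codeword c = mG
    print((c @ H.T) % 2)            # all-zero syndrome: cH^T = 0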


Example 10.9 Repetition Codes
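• The worked example itself is not reproduced in this text. As a hedged illustration, the sketch below assumes the standard binary (n,1) repetition code, in which a single message bit is repeated n times (so G = [1 1 ... 1]) and decoding is by majority vote:

    # Sketch of the (n,1) repetition code with majority-vote decoding.
    def rep_encode(bit, n):
        return [bit] * n

    def rep_decode(received):
        # Majority vote corrects up to floor((n-1)/2) bit errors.
        return 1 if sum(received) > len(received) // 2 else 0

    r = rep_encode(1, 5)
    r[0] ^= 1             # introduce two channel errors
    r[3] ^= 1
    print(rep_decode(r))  # 1 (still decoded correctly)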


The generator matrix G is used in the encoding operation at the transmitter.

• On the other hand, the parity-check matrix H is used in the decoding operation at the receiver.

• Let r denote the 1-by-n received vector that results from sending the code vector c over a noisy channel.

• We express the vector r as the sum of the original code vector c and a vector e, as shown by

      r = c + e

Syndrome Decoding

• The vector e is called the error vector or error pattern.

• The ith element of e equals 0 if the corresponding element of r is the same as that of c.

• On the other hand, the ith element of e equals 1 if the corresponding element of r is different from that of c, in which case an error is said to have occurred in the ith location.


• That is, for i = 1, 2, ..., n, we have ei = 1 if ri ≠ ci, and ei = 0 otherwise.

Syndrome Decoding

• The receiver has the task of decoding the code vector c fromthe received vector r.

• The algorithm commonly used to perform this decoding operation starts with the computation of a 1-by-(n-k) vector called the error-syndrome vector or simply the syndrome.

• The importance of the syndrome lies in the fact that it depends only upon the error pattern.

• Given a 1-by-n received vector r, the corresponding syndrome is formally defined as

      s = rH^T     (10.81)


The syndrome has the following important properties:

• Property 1: The syndrome depends only on the error pattern, and not on the transmitted codeword. Since cH^T = 0, we have

      s = (c + e)H^T = cH^T + eH^T = eH^T

• Property 2: All error patterns that differ at most by a codeword have the same syndrome. That is, for a given error pattern e, the 2^k distinct error patterns

      ei = e + ci,   i = 0, 1, ..., 2^k - 1     (10.83)

  all produce the same syndrome s. This set of 2^k error patterns is called a coset of the code; there are 2^(n-k) possible cosets, one for each value of the syndrome.

Syndrome Decoding


• We may put Properties 1 and 2 in perspective by expanding Eq. (10.81). Specifically, with the matrix H having the systematic form given in Eq. (10.75), where the matrix P is itself defined by Eq. (10.69), we find from Eq. (10.81) that the (n-k) elements of the syndrome s are linear combinations of the n elements of the error pattern e, as shown by

      si = ei + en-k p0i + en-k+1 p1i + ... + en-1 pk-1,i,   i = 0, 1, ..., n-k-1     (10.84)

Syndrome Decoding


• This set of (n-k) linear equations clearly shows that the syndrome contains information about the error pattern and may therefore be used for error detection.

• It should be noted that the set of equations is underdetermined, in that we have more unknowns than equations.

• Accordingly, there is no unique solution for the error pattern. Rather, there are 2^k error patterns that satisfy Eq. (10.84) and therefore result in the same syndrome, in accordance with Property 2 and Eq. (10.83); the true error pattern is just one of the 2^k possible solutions.

• In other words, the information contained in the syndrome s about the error pattern e is not enough for the decoder to compute the exact value of the transmitted code vector.

• Knowledge of the syndrome s reduces the search for the true error pattern e from 2^n to 2^k possibilities. In particular, the decoder has the task of making the best selection from the coset corresponding to s.

Syndrome Decoding
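• A minimal sketch of syndrome decoding for the hypothetical (6,3) code used in the earlier sketches; the coset-leader table below contains only the all-zero pattern and the single-bit error patterns, which is what this particular example code can correct:

    import numpy as np

    # Same hypothetical (6,3) code as before.
    P = np.array([[1, 1, 0],
                  [0, 1, 1],
                  [1, 0, 1]])
    k, n_k = P.shape
    n = k + n_k
    G = np.hstack([P, np.eye(k, dtype=int)])
    H = np.hstack([np.eye(n_k, dtype=int), P.T])

    # Syndrome table: map the syndrome of each single-bit error pattern to
    # that pattern (the coset leader of smallest Hamming weight).
    table = {tuple(np.zeros(n_k, dtype=int)): np.zeros(n, dtype=int)}
    for i in range(n):
        e = np.zeros(n, dtype=int)
        e[i] = 1
        table[tuple((e @ H.T) % 2)] = e

    def decode(r):
        s = tuple((np.array(r) @ H.T) % 2)       # syndrome s = rH^T
        e_hat = table.get(s, np.zeros(n, dtype=int))
        return (np.array(r) + e_hat) % 2         # corrected codeword estimate

    c = (np.array([1, 0, 1]) @ G) % 2
    r = c.copy()
    r[4] ^= 1                                    # single-bit channel error
    print(np.array_equal(decode(r), c))          # True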


Consider a pair of code vectors c1 and c2 that have the same number of elements.

• The Hamming distance d(c1,c2) between such a pair of code vectors is defined as the number of locations in which their respective elements differ.

• The Hamming weight w(c) of a code vector c is defined as the number of nonzero elements in the code vector.

• Equivalently, we may state that the Hamming weight of a code vector is the distance between the code vector and the all-zero code vector.

• The minimum distance dmin of a linear block code is defined as the smallest Hamming distance between any pair of code vectors in the code. That is, the minimum distance is the same as the smallest Hamming weight of the difference between any pair of code vectors in the code.

Minimum Distance Considerations


• From the closure property of linear block codes, the sum (or difference) of two code vectors is another code vector.

• Accordingly, we may state that the minimum distance of a linear block code is the smallest Hamming weight of the nonzero code vectors in the code.
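• A short sketch of these definitions, reusing the hypothetical (6,3) code from the earlier sketches:

    import itertools
    import numpy as np

    def hamming_distance(c1, c2):
        return sum(x != y for x, y in zip(c1, c2))

    def hamming_weight(c):
        return sum(c)

    # Minimum distance = smallest Hamming weight of the nonzero codewords.
    P = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 1]])
    G = np.hstack([P, np.eye(3, dtype=int)])
    codewords = [tuple((np.array(m) @ G) % 2)
                 for m in itertools.product([0, 1], repeat=3)]
    d_min = min(hamming_weight(c) for c in codewords if any(c))
    print(d_min)   # 3 for this particular example code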

• The minimum distance dmin is related to the structure of the parity-check matrix H of the code in a fundamental way.

• From Eq. (10.77) we know that a linear block code is defined by the set of all code vectors for which cH^T = 0, where H^T is the transpose of the parity-check matrix H.

• Let the matrix H be expressed in terms of its columns as follows:

      H = [h1  h2  ...  hn]

Minimum Distance Considerations


For a code vector c to satisfy the condition cH^T = 0, the vector c must have 1s in such positions that the corresponding rows of H^T sum to the zero vector 0.

• By definition, the number of 1s in a code vector is the Hamming weight of the code vector.

• Moreover, the smallest Hamming weight of the nonzero code vectors in a linear block code equals the minimum distance of the code.

• The minimum distance of a linear block code is defined by the minimum number of rows of the matrix H^T whose sum is equal to the zero vector.

Minimum Distance Considerations


• The minimum distance of a linear block code, dmin, is an important parameter of the code. It determines the error-correcting capability of the code.

• Suppose an (n,k) linear block code is required to detect and correct all error patterns (over a binary symmetric channel) whose Hamming weight is less than or equal to t. That is, if a code vector ci in the code is transmitted and the received vector is r = ci + e, we require that the decoder output c^ = ci whenever the error pattern e has a Hamming weight w(e) ≤ t.

• We assume that the 2^k code vectors in the code are transmitted with equal probability.

• The best strategy for the decoder then is to pick the code vector closest to the received vector r, that is, the one for which the Hamming distance d(ci,r) is the smallest.

• With such a strategy, the decoder will be able to detect and correct all error patterns of Hamming weight w(e) ≤ t, provided that the minimum distance of the code is equal to or greater than 2t+1.

Minimum Distance Considerations


• We may demonstrate the validity of this requirement by adopting a geometric interpretation of the problem. In particular, the 1-by-n code vectors and the 1-by-n received vector are represented as points in an n-dimensional space.

• Suppose that we construct two spheres, each of radius t, around the points that represent code vectors ci and cj.

• Let these two spheres be disjoint, as depicted in Figure 10.23a. For this condition to be satisfied we require that d(ci,cj) ≥ 2t+1.

• If the code vector ci is transmitted and the Hamming distance d(ci,r) ≤ t, it is clear that the decoder will pick ci, as it is the code vector closest to the received vector r.

• If, on the other hand, the Hamming distance d(ci,cj) ≤ 2t, the two spheres around ci and cj intersect, as depicted in Figure 10.23b.

Minimum Distance Considerations


• Here we see that if ci is transmitted, there exists a received vector r such that the Hamming distance d(ci,r) ≤ t and yet r is as close to cj as it is to ci.

• Clearly, there is now the possibility of the decoder picking the vector cj, which is wrong.

• We thus conclude that an (n,k) linear block code has the power to correct all error patterns of weight t or less if, and only if,

      d(ci,cj) ≥ 2t + 1   for all ci and cj in the code.

Minimum Distance Considerations

• By definition, however, the smallest distance between any pair of code vectors in a code is the minimum distance of the code, dmin. We may therefore state that an (n,k) linear block code of minimum distance dmin can correct up to t errors if, and only if,

      t ≤ [ (dmin - 1)/2 ]

• where [ ] denotes the largest integer less than or equal to the enclosed quantity.

• The above equation gives the error-correcting capability of a linear block code a quantitative meaning.


Example 10.10 Hamming Codes
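• The worked example itself is not reproduced in this text. As a hedged illustration, the sketch below builds the familiar (7,4) Hamming code (m = 3 parity bits, n = 2^m - 1 = 7, k = n - m = 4, dmin = 3); the particular coefficient matrix P used is one common choice and may differ from the one in the original example:

    import itertools
    import numpy as np

    # (7,4) Hamming code: n = 7, k = 4, n-k = 3 parity bits.
    # This P is one common choice of coefficient matrix (an assumption here).
    P = np.array([[1, 1, 0],
                  [0, 1, 1],
                  [1, 1, 1],
                  [1, 0, 1]])
    G = np.hstack([P, np.eye(4, dtype=int)])      # [P | I4]
    H = np.hstack([np.eye(3, dtype=int), P.T])    # [I3 | P^T]

    # The columns of H are the 7 distinct nonzero 3-tuples, which is why every
    # single-bit error produces a distinct nonzero syndrome.
    codewords = [(np.array(m) @ G) % 2
                 for m in itertools.product([0, 1], repeat=4)]
    d_min = min(int(c.sum()) for c in codewords if c.any())
    print(d_min)                                  # 3, so the code corrects t = 1 error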


• The set of linear block codes is large. One important subclass of linear block codes is known as cyclic codes, as they are characterized by the fact that any cyclic shift of a codeword is also a codeword. Important examples of cyclic codes are:

• Hamming codes, of which we have already given an example.

• Maximal length codes, which have very good autocorrelation properties and find many applications outside of forward error correction.

• Cyclic redundancy check (CRC) codes, which add parity bits to a transmission with the primary purpose of allowing the receiver to reliably determine if any errors occurred in the transmission. Thus, these are error-detection codes.

• Bose-Chaudhuri-Hocquenghem (BCH) codes, which are a large family of cyclic codes. The BCH codes offer flexibility in the choice of code parameters, namely, block length and code rate.

• Reed-Solomon (RS) codes, which are an important subclass of nonbinary BCH codes. The encoder for an RS code differs from a binary encoder in that it operates on multiple bits rather than individual bits. An RS (n,k) code always satisfies the condition n-k = 2t; this property makes the class of RS codes very powerful in an error-correcting sense.

• A detailed treatment of these different cyclic coding techniques is beyond the scope of our present discussion.

Cyclic Codes


• In block coding, the encoder accepts a k-bit message block and generates an n-bit codeword. Codewords are produced on a block-by-block basis.

• Provision must be made in the encoder to buffer an entire message block before generating the associated codeword.

• There are applications where the message bits come in serially rather than in large blocks, in which case the use of a buffer may be undesirable.

• In such situations, the use of convolutional coding may be the preferred method.

• A convolutional encoder operates on the incoming message sequence continuously in a serial manner.

• The encoder of a binary convolutional code with rate 1/n, measured in bits per symbol, may be viewed as a finite-state machine that consists of an M-stage shift register with prescribed connections to n modulo-2 adders, and a multiplexer that serializes the outputs of the adders.

10.12 Convolutional Codes


• An L-bit message sequence produces a coded output sequence of length n(L+M) bits. The code rate is therefore given by

      r = L / (n(L+M))   bits per symbol

• Typically, we have L >> M. Hence, the code rate simplifies to

      r ≈ 1/n   bits per symbol

• The constraint length of a convolutional code, expressed in terms of message bits, is defined as the number of shifts over which a single message bit can influence the encoder output.

• In an encoder with an M-stage shift register, the memory of the encoder equals M message bits, and K = M+1 shifts are required for a message bit to enter the shift register and finally come out.

• The constraint length of the encoder is K.

10.12 Convolutional Codes


• Figure 10.26a shows a convolutional encoder with n=2 and K=3.

• The code rate of this encoder is 1/2. The encoder of Figure 10.26a operates on the incoming message sequence, one bit at a time.

• We may generate a binary convolutional code with rate k/n by using k separate shift registers with prescribed connections to n modulo-2 adders, an input multiplexer and an output multiplexer.

• An example of such an encoder is shown in Figure 10.26b, where k=2, n=3, and the two shift registers have K=2 each. The code rate is 2/3.

• In this second example, the encoder processes the incoming message sequence two bits at a time.
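• A minimal sketch (in Python) of a rate-1/2, K = 3 convolutional encoder of the kind described for Figure 10.26a. Since the figure is not reproduced here, the adder connections (taps 111 and 101) are an assumption, chosen because they are the generators most commonly quoted for this configuration:

    def conv_encode(message, taps=((1, 1, 1), (1, 0, 1))):
        # Rate-1/2, K = 3 convolutional encoder (assumed taps 111 and 101).
        # The register is flushed with K-1 = 2 zeros, so an L-bit message
        # produces n(L + M) = 2(L + 2) output bits.
        state = [0, 0]                      # M = K - 1 = 2 memory elements
        out = []
        for bit in list(message) + [0, 0]:  # append M zeros to return to state 00
            window = [bit] + state          # current bit plus stored bits
            for g in taps:                  # one modulo-2 adder per output path
                out.append(sum(b & t for b, t in zip(window, g)) % 2)
            state = [bit] + state[:-1]      # shift the register
        return out

    print(conv_encode([1, 0, 1, 1]))        # 12 output bits = 2 * (4 + 2)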

10.12 Convolutional Codes


• The convolutional codes generated by the encoders of Figure 10.26 are nonsystematic codes.

• Unlike block coding, the use of nonsystematic codes is ordinarily preferred over systematic codes in convolutional coding.

• Each path connecting the output to the input of a convolutional encoder may be characterized in terms of its impulse response, defined as the response of that path to a symbol 1 applied to its input, with each flip-flop in the encoder set initially in the zero state.

• Equivalently, we may characterize each path in terms of a generator polynomial, defined as the unit-delay transform of the impulse response.

• To be specific, let the generator sequence (g0^(i), g1^(i), g2^(i), ..., gM^(i)) denote the impulse response of the ith path, where the coefficients g0^(i), g1^(i), g2^(i), ..., gM^(i) equal 0 or 1.

10.12 Convolutional Codes


Correspondingly, the generator polynomial of the ith path is defined by

      g^(i)(D) = g0^(i) + g1^(i) D + g2^(i) D^2 + ... + gM^(i) D^M

10.12 Convolutional Codes

• where D denotes the unit-delay variable and + corresponds to modulo-2 addition.

• The complete convolutional encoder is described by the set of generator polynomials {g^(1)(D), g^(2)(D), ..., g^(n)(D)}.

• Traditionally, different variables are used for the description of convolutional and cyclic codes, with D being commonly used for convolutional codes and X for cyclic codes.
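• For the taps assumed in the earlier encoder sketch, the impulse responses (1,1,1) and (1,0,1) correspond to the generator polynomials g^(1)(D) = 1 + D + D^2 and g^(2)(D) = 1 + D^2. Each encoder path then simply multiplies the message polynomial by g^(i)(D) in modulo-2 arithmetic, which is a discrete convolution:

    import numpy as np

    def path_output(message, g):
        # Output of one encoder path: message polynomial times g(D), modulo 2.
        return np.convolve(message, g) % 2

    m = [1, 0, 1, 1]
    print(path_output(m, [1, 1, 1]))   # path with g(D) = 1 + D + D^2
    print(path_output(m, [1, 0, 1]))   # path with g(D) = 1 + D^2

• Interleaving the two path outputs reproduces the output sequence generated by the shift-register sketch given earlier, since the multiplexer serializes the adder outputs.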


• The structural properties of a convolutional encoder may be portrayed in graphical form as a trellis diagram.

• A trellis brings out explicitly the fact that the associated convolutional encoder is a finite-state machine.

• We define the state of a convolutional encoder of rate 1/n as the (K-1) message bits stored in the encoder's shift register.

• At time j, the portion of the message sequence containing the most recent K bits is written as (mj-K+1, ..., mj-1, mj), where mj is the current bit.

• The (K-1)-bit state of the encoder at time j is therefore written simply as (mj-1, ..., mj-K+2, mj-K+1).

10.12 Convolutional Codes


• In the case of the simple convolutional encoder of Figure 10.26a we have K-1 = 2.

• The state of this encoder can assume any one of four possible values, as described in Table 10.6.

• The trellis contains (L+K) levels, where L is the length of the incoming message sequence, and K is the constraint length of the code.

• The levels of the trellis are labeled as j = 0, 1, ..., L+K-1 in Figure 10.27 for K=3.

10.12 Convolutional Codes


• Level j is also referred to as depth j; both terms are used interchangeably.

• The first (K-1) levels correspond to the encoder's departure from the initial state a, and the last (K-1) levels correspond to the encoder's return to the state a.

• Clearly, not all the states can be reached in these two portions of the trellis.

• However, in the central portion of the trellis, for which the level j lies in the range K-1 ≤ j ≤ L, all the states of the encoder are reachable.

• Note also that the central portion of the trellis exhibits a fixed periodic structure.
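• A small sketch of the finite-state-machine view for the assumed rate-1/2, K = 3 encoder: with K-1 = 2 there are four states, and each input bit selects one of two branches, each labelled by the corresponding pair of output bits:

    # State-transition table of the assumed rate-1/2, K = 3 encoder (taps 111, 101).
    # A state is the K-1 = 2 most recent message bits (mj-1, mj-2).
    taps = ((1, 1, 1), (1, 0, 1))

    def step(state, bit):
        window = (bit,) + state
        output = tuple(sum(b & t for b, t in zip(window, g)) % 2 for g in taps)
        next_state = (bit,) + state[:-1]
        return next_state, output

    for state in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        for bit in (0, 1):
            next_state, output = step(state, bit)
            print(state, bit, "->", next_state, output)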

10.12 Convolutional Codes


• Let m denote a message vector, and c denote the corresponding code vector applied by the encoder to the input of a discrete memoryless channel.

• Let r denote the received vector, which may differ from the transmitted code vector due to channel noise.

• Given the received vector r, the decoder is required to make an estimate m^ of the message vector.

• Since there is a one-to-one correspondence between the message vector m and the code vector c, the decoder may equivalently produce an estimate c^ of the code vector.

• We may then put m^=m if and only if c^=c.

• The object of the decoder is to minimize the probability of decoding error.

• For a binary symmetric channel the optimum decoding rule is: Choose the estimate c^ that has the minimum Hamming distance from the received vector r.

• This is often referred to as a minimum distance decoder and it is intuitively appealing.

• This decoding strategy is also known to be optimum in a likelihood sense; thus, it is also referred to as a maximum likelihood decoder.

Decoding of Convolutional Codes


• Recall the trellis description of a convolutional code that was provided in the previous section.

• A codeword represents one path through the trellis that outputs a symbol on each transition between nodes.

• The equivalence between maximum likelihood decoding and minimum distance decoding for a binary symmetric channel implies that we may decode a convolutional code by choosing a path in the code trellis whose coded sequence differs from the received sequence in the fewest number of places. That is, we limit our choice to the possible paths in the trellis representation of the code.

Viterbi Algorithm
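• A minimal hard-decision Viterbi decoder sketch for the assumed rate-1/2, K = 3 encoder used in the earlier sketches (Hamming distance as the branch metric, trellis terminated in the all-zero state by the two flushing bits):

    def viterbi_decode(received, taps=((1, 1, 1), (1, 0, 1))):
        # Hard-decision Viterbi decoding of the rate-1/2, K = 3 code sketched earlier.
        n = len(taps)

        def branch(state, bit):
            window = (bit,) + state
            out = tuple(sum(b & t for b, t in zip(window, g)) % 2 for g in taps)
            return (bit,) + state[:-1], out

        # Path metric and survivor path for each state; start in state (0, 0).
        metrics = {(0, 0): 0}
        paths = {(0, 0): []}
        for j in range(0, len(received), n):
            r = received[j:j + n]
            new_metrics, new_paths = {}, {}
            for state, metric in metrics.items():
                for bit in (0, 1):
                    nxt, out = branch(state, bit)
                    m = metric + sum(a != b for a, b in zip(out, r))
                    if nxt not in new_metrics or m < new_metrics[nxt]:
                        new_metrics[nxt] = m
                        new_paths[nxt] = paths[state] + [bit]
            metrics, paths = new_metrics, new_paths
        # The encoder was flushed with K-1 = 2 zeros, so pick the survivor ending
        # in state (0, 0) and drop the two flushing bits.
        return paths[(0, 0)][:-2]

    coded = [1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1]   # encoder output for message 1 0 1 1
    coded[3] ^= 1                                  # introduce one channel error
    print(viterbi_decode(coded))                   # [1, 0, 1, 1]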


The Viterbi algorithm may also be applied to the decoding of convolutional codes over other channels, such as the Gaussian channel.

• For the Gaussian channel, distance is measured in terms of the geometric distance between the transmitted symbol and the received estimate of that symbol, rather than the Hamming distance.

• The Viterbi algorithm is commonly used in digital communication systems.

• In fact, many digital signal processors include special instructions to assist Viterbi decoding.

Viterbi Algorithm


• The bit error rate performance of a convolutional code depends not only on the decoding algorithm used but also on the distance properties of the code.

• In this context, the most important single measure of a convolutional code's ability to combat channel noise is the free distance, denoted by dfree.

• The free distance of a convolutional code is defined as the minimum Hamming distance between any two codewords in the code.

• Similar to a block code, a convolutional code with free distance dfree can correct t errors if and only if dfree is greater than 2t.

• Investigations have shown that the free distance of systematic convolutional codes is usually smaller than that of nonsystematic convolutional codes, as indicated in Table 10.7.

Free Distance and Asymptotic Coding Gain of a Convolutional Code


• A bound on the bit error rate for convolutional codes may be obtained analytically; the details of this evaluation are, however, beyond the scope of our present discussion.

• Here we simply summarize an asymptotic result for the binary-input additive white Gaussian noise (AWGN) channel, assuming the use of binary phase-shift keying (PSK) with coherent detection.

• For the case of a memoryless binary-input AWGN channel with no output quantization, theory shows that for large values of Eb/No the bit error rate for binary PSK with convolutional coding is dominated by the exponential factor exp(-dfree r Eb/No), where the parameters are as previously defined.

• Accordingly, in this case, we find that the asymptotic coding gain, that is, the advantage over uncoded transmission at high SNR, is defined by

      Ga = 10 log10(r dfree)   dB
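• As a numerical illustration (the free distance value dfree = 5 used below is the value commonly quoted for the rate-1/2, K = 3 code assumed in the earlier sketches, not a value taken from Table 10.7):

    import math

    r, d_free = 1 / 2, 5               # rate-1/2 code with an assumed d_free of 5
    G_a = 10 * math.log10(r * d_free)  # asymptotic coding gain (soft decisions), in dB
    print(round(G_a, 2))               # about 3.98 dB

• With hard decisions, the approximately 2 dB penalty noted below would reduce this figure accordingly.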

Free Distance and Asymptotic Coding Gain of a Convolutional Code


As mentioned above, this result assumes an unquantized demodulator output.

• If hard decisions are made on the channel outputs before decoding, then both theory and practice show an approximate 2 dB loss in performance.

• The improvement without quantization, however, is attained at the cost of increased decoder complexity due to the requirement for accepting analog inputs.

• The asymptotic coding gain for a binary-input AWGN channel is approximated to within about 0.25 dB by a binary-input Q-ary output discrete memoryless channel with the number of representation levels Q = 8.

• This means that we may avoid the need for an analog decoder by using a soft-decision decoder that performs finite output quantization (typically, Q = 8), and yet realize a performance close to the optimum.

Free Distance and Asymptotic Coding Gain of a Convolutional Code