Information Engineering 2: Coding Theory (NT 4)
TRANSCRIPT
Prof. Dr.-Ing. Andreas Czylwik, Fachgebiet Nachrichtentechnische Systeme (NTS), Universität Duisburg-Essen
Information Engineering 2: Coding Theory (NT 4), SS 2004
Organization
Lecture: 2 hours per week. Transparencies will be available at: http://nts.uni-duisburg.de
Exercise: 1 hour per week, Dipl.-Ing. Batu Chalise
Written examination
New research areas in the Department of Communication Systems (Nachrichtentechnische Systeme); subjects for master theses
Textbooks
Textbooks for the lecture:
M. Bossert: Channel coding for telecommunications, John Wiley
R. Blahut: Theory and practice of error control codes, Addison-Wesley
J. H. van Lint: Introduction to coding theory
B. Friedrichs: Kanalcodierung, Springer-Verlag
H. Schneider-Obermann: Kanalcodierung, Vieweg-Verlag

Textbooks in the field of digital communications incl. coding theory:
S. Benedetto, E. Biglieri, V. Castellani: Digital transmission theory, Prentice-Hall
J. G. Proakis: Digital communications, McGraw-Hill
S. Haykin: Communication systems, John Wiley
Contents
Introduction
Information theory
Channel coding in digital communication systems
Algebraic foundations for coding
Block codes
Convolutional codes
Coding techniques
Outlook
Introduction

Block diagram of a digital communication system
Source → Source encoder → Channel encoder → Modulator → Transmission channel (with noise) → Demodulator → Channel decoder → Source decoder → Sink
Figure labels: digital source, digital sink, discrete channel
Source coding
Compression of the communication signal to a minimum number of symbols without loss of information (reduction of redundancy)
Further compression when a loss of information is tolerated (e.g. video or audio transmission)
Coding for encryption
Security against unwanted listening
Information recovery only with a secret key
Error control coding: deliberate addition of redundancy to improve transmission quality
FEC (forward error correction), simplified model:
Channel quality determines the residual error rate after decoding
Data rate does not depend on channel quality
Data source → Channel encoder → Channel → Channel decoder → Data sink
Channel coding for error detection (CRC, cyclic redundancy check), used for automatic repeat request (ARQ) methods:
Reverse channel necessary
Adaptive insertion of redundancy (additional redundancy only in case of errors)
Residual error probability does not depend on channel quality
Net data rate (throughput) depends on channel quality
Channel encoder → Channel → Channel decoder (error detection), with ARQ control at both ends; error information is returned via the backward channel
Applications: secure data transmission via waveguides or radio channels (especially mobile radio channels), secure data storage
Father of information theory, Claude E. Shannon: using channel coding, the error probability can be reduced to an arbitrarily small value if the data rate is smaller than the channel capacity. (Shannon does not present a construction method.)
Basic idea of channel coding: insertion of redundancy
Goal: error detection or error correction
Block encoding process:
Input vector: u = (u1, ..., uk), length k
Output vector: x = (x1, ..., xn), length n
Code rate: RC = k/n (1)
Code cube with n = 3, k = 3
Uncoded transmission: RC = 1
Smallest distance between code words: dmin = 1
Error detection or error correction is not possible
Code cube with n = 3, k = 2
Coded transmission: RC = 2/3
Smallest distance between code words: dmin = 2
Detection of a single error is possible; error correction is not possible
Code cube with n = 3, k = 1
Coded transmission: RC = 1/3
Smallest distance between code words: dmin = 3
Detection of two errors or correction of a single error is possible
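The code-cube observations can be checked with a short sketch (Python; the helper names are mine, not part of the slides): for the n = 3, k = 1 repetition code the minimum Hamming distance is 3, so floor((dmin − 1)/2) = 1 error is correctable.

```python
from itertools import combinations

def hamming_distance(a, b):
    # Number of positions in which two words of equal length differ
    return sum(s != t for s, t in zip(a, b))

def minimum_distance(code):
    # Smallest pairwise Hamming distance dmin of a code
    return min(hamming_distance(a, b) for a, b in combinations(code, 2))

# n = 3, k = 1 repetition code from this slide: code words 000 and 111
code = ["000", "111"]
d_min = minimum_distance(code)
print(d_min)             # 3
print((d_min - 1) // 2)  # number of correctable errors: 1
```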
Information theory
Information theory: mathematical description of information and of its transmission
Central questions:
Quantitative calculation of the information content of messages
Calculation of the capacity of transmission channels
Analysis and optimization of source and channel coding
Messages from the point of view of information theory (classification diagram): irrelevant vs. relevant, error vs. information, redundant vs. non-redundant
Information content of a message: entropy
Qualitative ordering of messages: a message is more important if it is more difficult to predict (importance increases as the probability decreases)
Example:
Tomorrow, the sun rises.
Tomorrow, there will be bad weather.
Tomorrow, there will be a heavy thunderstorm, so that the electric power network will break down.
Messages from a digital source: sequence of symbols
Number of binary decisions needed to select a message (symbol) from a source: H0
Total number of symbols: N
H0 = ld(N) bit/symbol (2)
with ld(x) = logarithm to base 2 (logarithmus dualis); unit: bit = binary digit
Example: English text as a source
26 · 2 letters, 14 special characters including "space": ' ' " " ( ) - . , ; : ! ?
In total: 66 alphanumeric characters
H0 = ld(66) = ln(66) / ln(2) = 6.044 bit/character
Example: one page of English text with 40 lines and 70 characters per line
Number of different pages: N = 66^(40·70)
Number of binary decisions to select a page: H0 = ld(66^(40·70)) = 40 · 70 · ld(66) = 16.92 kbit/page
The number of binary decisions H0 does not take into account the probabilities of the symbols!
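The two numbers above can be reproduced directly (a Python sketch; `ld` is my helper for the logarithmus dualis):

```python
import math

def ld(x):
    # Logarithmus dualis: logarithm to base 2
    return math.log2(x)

# 66 alphanumeric characters: decision content per character, eq. (2)
H0_char = ld(66)
print(round(H0_char, 3))         # 6.044 bit/character

# One page with 40 lines of 70 characters: H0 = 40 * 70 * ld(66)
H0_page = 40 * 70 * H0_char
print(round(H0_page / 1000, 2))  # 16.92 kbit/page
```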
Source alphabet: X = {x1, ..., xN}
Probabilities of the symbols: p(x1), ..., p(xN)
Desired properties of the information content I = f(p):
I(xi) ≥ 0 for 0 ≤ p(xi) ≤ 1
I(xi) → 0 for p(xi) → 1
I(xi) > I(xj) for p(xi) < p(xj)
For two subsequent statistically independent symbols xi and xj with p(xi,xj) = p(xi) · p(xj): I(xi,xj) = I(xi) + I(xj)
General solution: I(xi) = −k · logb(p(xi)) (3)
Definition of the information content:
I(xi) = ld(1 / p(xi)) bit/symbol (4)
Entropy H(X) = average information content of a source:
H(X) = Σ(i=1..N) p(xi) · I(xi) = Σ(i=1..N) p(xi) · ld(1 / p(xi)) bit/symbol (5)
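Equation (5) translates into a one-line function (a Python sketch with my own helper name):

```python
import math

def entropy(probs):
    # H(X) = sum over i of p(x_i) * ld(1 / p(x_i)) in bit/symbol, eq. (5)
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Equiprobable symbols give the maximum H(X) = ld(N):
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 bit/symbol
# A skewed source carries less information per symbol:
print(round(entropy([0.9, 0.1]), 3))
```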
Example: binary source with X = {x1, x2}
Probabilities: p(x1) = p, p(x2) = 1 − p
Entropy: H(X) = −p ld(p) − (1 − p) ld(1 − p) (6)
Shannon function: H(X) = S(p)
[Plot: S(p) in bit versus p from 0 to 1, with maximum S(0.5) = 1 bit]
The entropy becomes maximum for equiprobable symbols ⇒ H(X) ≤ H0 = ld(N)
Proof:
H(X) − ld(N) = Σ(i=1..N) p(xi) ld(1 / p(xi)) − Σ(i=1..N) p(xi) ld(N) = Σ(i=1..N) p(xi) ld(1 / (p(xi) · N)) (7)
Using the relation:
ln(x) ≤ x − 1 (8)
[Plot: ln(x) lies below the line x − 1, touching it at x = 1]
with ln(x) ≤ x − 1 ⇒ ld(x) ≤ (x − 1) / ln(2) (9)
H(X) − ld(N) ≤ (1 / ln 2) · Σ(i=1..N) p(xi) · (1 / (p(xi) · N) − 1) = (1 / ln 2) · (Σ(i=1..N) 1/N − Σ(i=1..N) p(xi)) = (1 / ln 2) · (1 − 1) = 0 ■
Redundancy of a source: RS = H0 − H(X)
Transmission via a binary channel: design of binary coding schemes for a discrete source
Tasks of source coding:
Assignment of binary code words of length L(xi) to all symbols xi (e.g. x1 → 1001, x2 → 011, ..., xN → 010111)
Minimizing the average length of the code words:
⟨L⟩ = ⟨L(xi)⟩ = Σ(i=1..N) p(xi) · L(xi) (10)
Examples of binary coding of symbols:
ASCII code: fixed code word length L(xi) = 8 (block code)
Morse code (alphabet with dots, dashes and pauses for the separation of code words): more frequently occurring characters are assigned shorter code words
Prefix condition for codes with variable length: no code word equals the prefix of any other, longer code word
Example of a code without the prefix condition:
x1 → 0, x2 → 01, x3 → 10, x4 → 100
Unique decoding of a bit sequence is not possible!
Possible decoding results for the sequence 010010: x1x3x2x1, x2x1x1x3, x1x3x1x3, x2x1x2x1, x1x4x3
Example of a code with the prefix condition:
x1 → 0, x2 → 10, x3 → 110, x4 → 111
The code is uniquely and instantaneously decodable!
Decoding of the sequence 010010110111100: x1x2x1x2x3x4x2x1
Synchronization: begin of the sequence
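The instantaneous decodability can be demonstrated with a few lines (a Python sketch; the function name and dictionary layout are mine): as soon as the collected bits form a code word, a symbol is emitted.

```python
def prefix_decode(bits, code):
    # Instantaneous decoding of a prefix code: emit a symbol as soon
    # as the collected bits match a complete code word
    inverse = {word: sym for sym, word in code.items()}
    out, word = [], ""
    for b in bits:
        word += b
        if word in inverse:
            out.append(inverse[word])
            word = ""
    return out

code = {"x1": "0", "x2": "10", "x3": "110", "x4": "111"}
print(prefix_decode("010010110111100", code))
# ['x1', 'x2', 'x1', 'x2', 'x3', 'x4', 'x2', 'x1']
```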
Decoding with a binary tree (branches labeled 0 and 1):
x1 → 0, x2 → 10, x3 → 110, x4 → 111
Level 1: L(x1) = 1; level 2: L(x2) = 2; level 3: L(x3) = L(x4) = 3
Kraft inequality: a binary code with the code word lengths L(x1), L(x2), ..., L(xN) fulfilling the prefix condition exists only if:
Σ(i=1..N) 2^(−L(xi)) ≤ 1 (11)
Proof:
Depth of the tree structure = maximum code word length Lmax = max(L(x1), L(x2), ..., L(xN))
A code word at level L(xi) eliminates 2^(Lmax − L(xi)) of all possible code words at level Lmax
Sum over all eliminated code words ≤ maximum number at level Lmax:
Σ(i=1..N) 2^(Lmax − L(xi)) ≤ 2^Lmax ■
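Checking (11) numerically takes one line (a Python sketch; `kraft_sum` is my name for the left-hand side):

```python
def kraft_sum(lengths):
    # Left-hand side of the Kraft inequality (11) for binary codes
    return sum(2.0 ** (-L) for L in lengths)

# Prefix code from the earlier example (0, 10, 110, 111): sum = 1,
# so a prefix code with these lengths exists and the tree is full.
print(kraft_sum([1, 2, 3, 3]))   # 1.0
# Lengths 1, 2, 2, 2 violate the inequality: no prefix code exists.
print(kraft_sum([1, 2, 2, 2]))   # 1.25
```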
The equality holds if all ends of the code tree are used by code words.
General relation: ⟨L⟩ ≥ H(X) (12)
Example for the special case that all probabilities are powers of 2: p(xi) = (1/2)^Ki (13)
Assignment of code words corresponding to the relation L(xi) = Ki:

xi   p(xi)   code word
x1   1/2     1
x2   1/4     00
x3   1/8     010
x4   1/16    0110
x5   1/16    0111

Special case: ⟨L⟩ = H(X) = 15/8
Shannon's coding theorem: for any source, a binary coding can be found with:
H(X) ≤ ⟨L⟩ ≤ H(X) + 1 (14)
Proof:
Left-hand side: average length of code words ≥ average information content (entropy)
Right-hand side: selection of code words with
I(xi) ≤ L(xi) ≤ I(xi) + 1 (15)
Multiplication with p(xi) and summing over all i ⇒ (14)
Proof that a code with the prefix condition (11) exists:
Left-hand side of (15): I(xi) = ld(1 / p(xi)) ≤ L(xi) ⇒ p(xi) ≥ 2^(−L(xi))
Inserting in (11): Σ(i=1..N) 2^(−L(xi)) ≤ Σ(i=1..N) p(xi) = 1 ■

Shannon coding
Code word lengths corresponding to (15): I(xi) ≤ L(xi) ≤ I(xi) + 1
Accumulated probabilities: Pi = Σ(j=1..i−1) p(xj)
Code words are the fractional binary digits of Pi:
Example: 0.90 = 1·2^−1 + 1·2^−2 + 1·2^−3 + 0·2^−4 + 0·2^−5 + 1·2^−6 + 1·2^−7 + 0·2^−8 + 0·2^−9 + 1·2^−10 + ...

i   p(xi)   I(xi)   L(xi)   Pi     code
1   0.22    2.18    3       0.00   000
2   0.19    2.40    3       0.22   001
3   0.15    2.74    3       0.41   011
4   0.12    3.06    4       0.56   1000
5   0.08    3.64    4       0.68   1010
6   0.07    3.84    4       0.76   1100
7   0.07    3.84    4       0.83   1101
8   0.06    4.06    5       0.90   11100
9   0.04    4.64    5       0.96   11110

⟨L⟩ = 3.54 bit
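The whole table can be generated from the probabilities alone (a Python sketch under the assumption that the symbols are already ordered by decreasing probability; function name mine):

```python
import math

def shannon_code(probs):
    # Shannon coding: L(x_i) = ceil(ld(1 / p_i)); the code word is the
    # first L(x_i) fractional binary digits of the accumulated
    # probability P_i. Assumes probs is sorted in decreasing order.
    codes, P = [], 0.0
    for p in probs:
        L = math.ceil(-math.log2(p))
        bits, frac = "", P
        for _ in range(L):                 # binary expansion of P_i
            frac *= 2
            bits += "1" if frac >= 1 else "0"
            frac -= int(frac)
        codes.append(bits)
        P += p
    return codes

probs = [0.22, 0.19, 0.15, 0.12, 0.08, 0.07, 0.07, 0.06, 0.04]
codes = shannon_code(probs)
print(codes[0], codes[3], codes[8])        # 000 1000 11110
avg = sum(p * len(c) for p, c in zip(probs, codes))
print(round(avg, 2))                       # 3.54 bit
```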
Tree representation of a Shannon code:
Disadvantage: not all ends of the tree are used for code words
Lengths of code words can be reduced ⇒ the code is not optimum
Redundancy of a code: RC = ⟨L⟩ − H(X)
Redundancy of a source: RQ = H0 − H(X)

Huffman coding
Recursive procedure; starting point: symbols with the smallest probabilities
Same code word lengths for the two symbols with the smallest probabilities
Huffman coding minimizes the average code word length
Algorithm:
Step 1: Order the symbols according to their probability
Step 2: Assign 0 and 1 to the two symbols with the lowest probabilities
Step 3: Combine the two symbols with the lowest probabilities, xN−1 and xN, into a new symbol with probability p(xN−1) + p(xN)
Step 4: Repeat steps 1-3 until only one symbol is left
Example
Example with p(x1) = 0.22, p(x2) = 0.19, p(x3) = 0.15, p(x4) = 0.12, p(x5) = 0.08, p(x6) = 0.07, p(x7) = 0.07, p(x8) = 0.06, p(x9) = 0.04.
Successive combination of the two least probable symbols:
0.06 + 0.04 = 0.10, 0.07 + 0.07 = 0.14, 0.08 + 0.10 = 0.18, 0.12 + 0.14 = 0.26, 0.15 + 0.18 = 0.33, 0.19 + 0.22 = 0.41, 0.26 + 0.33 = 0.59, 0.41 + 0.59 = 1.00
Resulting code words:
x1 → 10, x2 → 11, x3 → 001, x4 → 011, x5 → 0001, x6 → 0100, x7 → 0101, x8 → 00000, x9 → 00001
Tree structure of the Huffman code example
Average code word length: ⟨L⟩ = 3.01 bit
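The merging steps above map naturally onto a priority queue (a Python sketch of Huffman coding, not the exact tree layout of the slides; different tie-breaking may yield other code words of the same, minimal average length):

```python
import heapq
from itertools import count

def huffman_code(probs):
    # Steps 1-4 above: repeatedly combine the two symbols with the
    # smallest probabilities; the counter only breaks ties in the heap
    tiebreak = count()
    heap = [(p, next(tiebreak), i) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, n0 = heapq.heappop(heap)   # two least probable entries
        p1, _, n1 = heapq.heappop(heap)
        heapq.heappush(heap, (p0 + p1, next(tiebreak), (n0, n1)))
    codes = {}
    def assign(node, prefix):
        if isinstance(node, tuple):       # inner node: branch with 0 / 1
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:                             # leaf: symbol index
            codes[node] = prefix or "0"
    assign(heap[0][2], "")
    return [codes[i] for i in range(len(probs))]

probs = [0.22, 0.19, 0.15, 0.12, 0.08, 0.07, 0.07, 0.06, 0.04]
codes = huffman_code(probs)
avg = sum(p * len(c) for p, c in zip(probs, codes))
print(round(avg, 2))   # 3.01 bit, as on the slide
```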
Discrete source without memory
Joint entropy of sequences of symbols
Two independent symbols: p(xi, yk) = p(xi) · p(yk) (16)
H(X,Y) = −Σ(i) Σ(k) p(xi,yk) · ld(p(xi,yk))
= −Σ(i) Σ(k) p(xi) p(yk) · [ld(p(xi)) + ld(p(yk))]
= −Σ(i) p(xi) ld(p(xi)) · Σ(k) p(yk) − Σ(k) p(yk) ld(p(yk)) · Σ(i) p(xi)
= H(X) + H(Y) (17)
M independent symbols from the same source:
H(X1, X2, ..., XM) = M · H(X) (18)
More efficient coding by coding of symbol sequences
Shannon coding theorem applied to blocks of M symbols:
H(X1,...,XM) ≤ ⟨LM(X1,...,XM)⟩ ≤ H(X1,...,XM) + 1
M · H(X) ≤ M · ⟨L⟩ ≤ M · H(X) + 1
H(X) ≤ ⟨L⟩ ≤ H(X) + 1/M (19)
Disadvantage of coding of symbol sequences: rapidly increasing computational effort (exponential increase of the number of combined symbols)
Example: coding of symbol sequences
Binary source with X = {x1, x2}
Probabilities: p(x1) = 0.2, p(x2) = 0.8
Entropy: H(X) = 0.7219 bit/symbol
Coding of single symbols:

symbol   p(xi)   code   L(xi)   p(xi)·L(xi)
x1       0.2     0      1       0.2
x2       0.8     1      1       0.8
                                Σ = 1

Average code word length: ⟨L⟩ = 1 bit/symbol
Coding of pairs of symbols:

pair of symbols   p(xi)   code   L(xi)   p(xi)·L(xi)
x1x1              0.04    101    3       0.12
x1x2              0.16    11     2       0.32
x2x1              0.16    100    3       0.48
x2x2              0.64    0      1       0.64
                                         Σ = 1.56

Average code word length: ⟨L⟩ = 0.78 bit/symbol
Coding of triplets of symbols:

triplet of symbols   p(xi)   code    L(xi)   p(xi)·L(xi)
x1x1x1               0.008   11111   5       0.040
x1x1x2               0.032   11100   5       0.160
x1x2x1               0.032   11101   5       0.160
x1x2x2               0.128   100     3       0.384
x2x1x1               0.032   11110   5       0.160
x2x1x2               0.128   101     3       0.384
x2x2x1               0.128   110     3       0.384
x2x2x2               0.512   0       1       0.512
                                             Σ = 2.184

Average code word length: ⟨L⟩ = 0.728 bit/symbol
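The convergence of the average length per symbol toward H(X) = 0.7219 bit can be checked with the code word lengths from the three tables (a Python sketch; the function and dictionary names are mine):

```python
import math

def entropy(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

p = {"1": 0.2, "2": 0.8}                 # p(x1) = 0.2, p(x2) = 0.8
print(round(entropy(p.values()), 4))     # H(X) = 0.7219 bit/symbol

def avg_per_symbol(lengths, group_size):
    # Average code word length per source symbol: eq. (10) applied to
    # blocks of group_size symbols, divided by the block length
    total = 0.0
    for block, L in lengths.items():
        prob = 1.0
        for sym in block:
            prob *= p[sym]
        total += prob * L
    return total / group_size

# Code word lengths taken from the three tables above
singles  = {"1": 1, "2": 1}
pairs    = {"11": 3, "12": 2, "21": 3, "22": 1}
triplets = {"111": 5, "112": 5, "121": 5, "122": 3,
            "211": 5, "212": 3, "221": 3, "222": 1}
for lengths, m in [(singles, 1), (pairs, 2), (triplets, 3)]:
    print(round(avg_per_symbol(lengths, m), 3))
# 1.0, 0.78, 0.728: approaches H(X) as the block length M grows
```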
Discrete source with memory
Real sources: correlation between individual symbols
Two correlated symbols: p(xi, yk) = p(xi) · p(yk|xi) = p(yk) · p(xi|yk)
H(X,Y) = −Σ(i) Σ(k) p(xi,yk) · ld(p(xi,yk))
= −Σ(i) Σ(k) p(xi) p(yk|xi) · [ld(p(xi)) + ld(p(yk|xi))]
= −Σ(i) p(xi) ld(p(xi)) · Σ(k) p(yk|xi) − Σ(i) Σ(k) p(xi,yk) ld(p(yk|xi))
= H(X) + H(Y|X) (20)
H(Y|X) = conditional entropy:
H(Y|X) = −Σ(i) Σ(k) p(xi,yk) · ld(p(yk|xi)) (21)
Entropy of a symbol ≥ conditional entropy:
H(Y) ≥ H(Y|X) (22)
Proof:
H(Y) = −Σ(k) p(yk) ld(p(yk)) = −Σ(i) Σ(k) p(xi,yk) ld(p(yk))
H(Y|X) − H(Y) = Σ(i) Σ(k) p(xi,yk) · ld(p(yk) / p(yk|xi))
Using the inequality ln(x) ≤ x − 1 ⇒ ld(x) ≤ (x − 1) / ln(2):
H(Y|X) − H(Y) ≤ (1 / ln 2) · Σ(i) Σ(k) p(xi,yk) · (p(yk) / p(yk|xi) − 1)
with p(xi,yk) = p(xi) · p(yk|xi):
H(Y|X) − H(Y) ≤ (1 / ln 2) · (Σ(i) Σ(k) p(xi) p(yk) − Σ(i) Σ(k) p(xi,yk)) = (1 / ln 2) · (1 − 1) = 0 ■
Entropy of a source with memory < entropy of a source without memory
⇒ very efficient source coding is possible for sequences of symbols with memory
General description of a discrete source with memory as a Markov source
Markov processes:
Sequence of random variables z0, z1, z2, ..., zn (n = time axis)
If zi and zj are statistically independent:
f(zn | zn−1, zn−2, ..., z0) = f(zn) (23)
If zi and zj are statistically dependent (Markov process of m-th order):
f(zn | zn−1, zn−2, ..., z0) = f(zn | zn−1, ..., zn−m) (24)
Often: first-order Markov process (m = 1):
f(zn | zn−1, zn−2, ..., z0) = f(zn | zn−1) (25)
If the zi take values from a limited number of discrete values, zi ∈ {x1, ..., xN} ⇒ Markov chain
Complete description of a Markov chain by transition probabilities:
p(zn = xj | zn−1 = xi1, ..., zn−m = xim) (26)
Homogeneous Markov chain: the transition probabilities do not depend on time:
p(zn = xj | zn−1 = xi1, ..., zn−m = xim) = p(zk = xj | zk−1 = xi1, ..., zk−m = xim) (27)
Stationary Markov chain: the steady state does not depend on the initial probabilities:
lim(n→∞) p(zn = xj | z0 = xi) = lim(n→∞) p(zn = xj) = p(xj) = wj (28)
Homogeneous and stationary Markov chain of first order (m = 1):
p(zn = xj | zn−1 = xi) = pij (29)
Transition matrix:
P = ( p11  p12  ...  p1N
      p21  p22  ...  p2N
      ...
      pN1  pN2  ...  pNN )   (30)
Properties of the transition matrix (row sums):
Σ(j=1..N) pij = 1 (31)
Probability vector:
w = (w1 w2 ... wN) = (p(x1) p(x2) ... p(xN)) (32)
Calculation of w utilizing the steady-state condition:
w = w P (33)
Stationary Markov source: first-order Markov chain
Entropy of a stationary Markov source = steady-state entropy:
H∞(Z) = lim(n→∞) H(zn | zn−1, zn−2, ..., z0) (34)
= H(zn | zn−1) (35)
= Σ(i=1..N) wi · H(zn | zn−1 = xi) (36)
with H(zn | zn−1 = xi) = −Σ(j=1..N) p(zn = xj | zn−1 = xi) · ld(p(zn = xj | zn−1 = xi))
⇒ H∞(Z) = −Σ(i=1..N) Σ(j=1..N) wi · p(zn = xj | zn−1 = xi) · ld(p(zn = xj | zn−1 = xi)) (37)
⇒ Entropy H∞(Z) = conditional entropy, since the symbols from the past are already known:
H∞(Z) = H(zn | zn−1) ≤ H(zn) ≤ H0(zn) (38)
Coding of a Markov source:
Take the memory into account, e.g. Huffman coding that takes into account the instantaneous state of the source
Fundamental problem of variable-length source codes: catastrophic error propagation
Example of a Markov source:
States = transmitted symbols, zn ∈ {x1, x2, x3}
Transition probabilities p(zn = xj | zn−1 = xi) = pij:

zn−1 \ zn   x1    x2    x3
x1          0.2   0.4   0.4
x2          0.3   0.5   0.2
x3          0.6   0.1   0.3

[State diagram with transitions between x1, x2, x3 labeled by these probabilities]
Calculation of the steady-state probabilities with w = w P and Σ(i) wi = 1:
w1 = 0.2 w1 + 0.3 w2 + 0.6 w3
w2 = 0.4 w1 + 0.5 w2 + 0.1 w3
w3 = 0.4 w1 + 0.2 w2 + 0.3 w3
1 = w1 + w2 + w3
The three steady-state equations are linearly dependent, so the normalization condition is needed.
Solution: w1 = 33/93 ≈ 0.3548, w2 = 32/93 ≈ 0.3441, w3 = 28/93 ≈ 0.3011
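Instead of solving the linear system by hand, the steady state can also be reached by simply iterating w ← w P (a power-method sketch in Python, not the method used on the slides):

```python
# Steady-state probabilities w = w P by iterating the chain; the
# result matches the solution w = (33/93, 32/93, 28/93).
P = [[0.2, 0.4, 0.4],
     [0.3, 0.5, 0.2],
     [0.6, 0.1, 0.3]]

w = [1 / 3, 1 / 3, 1 / 3]                # any initial probability vector
for _ in range(200):                     # iterate w <- w P
    w = [sum(w[i] * P[i][j] for i in range(3)) for j in range(3)]

print([round(x, 4) for x in w])          # [0.3548, 0.3441, 0.3011]
```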
Calculation of the entropy with (35):
H∞(Z) = Σ(i=1..N) wi · H(zn | zn−1 = xi) = −Σ(i=1..N) Σ(j=1..N) wi · pij · ld(pij) (39)
Inserting numbers:
H(zn | zn−1 = x1) = 0.2 ld(1/0.2) + 0.4 ld(1/0.4) + 0.4 ld(1/0.4) ≈ 1.5219 bit/symbol
H(zn | zn−1 = x2) = 0.3 ld(1/0.3) + 0.5 ld(1/0.5) + 0.2 ld(1/0.2) ≈ 1.4855 bit/symbol
H(zn | zn−1 = x3) = 0.6 ld(1/0.6) + 0.1 ld(1/0.1) + 0.3 ld(1/0.3) ≈ 1.2955 bit/symbol
Entropy:
H∞(Z) = w1 H(zn|zn−1=x1) + w2 H(zn|zn−1=x2) + w3 H(zn|zn−1=x3) ≈ 1.441 bit/symbol (40)
For comparison:
H1(Z) = lim(n→∞) H(zn) = Σ(i=1..N) wi · ld(1/wi) ≈ 1.5814 bit/symbol
H0 = ld(3) ≈ 1.5850 bit/symbol
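All three values can be reproduced from the transition matrix and the steady-state probabilities (a Python sketch; `entropy` is my helper for eq. (5)):

```python
import math

def entropy(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

P = [[0.2, 0.4, 0.4],
     [0.3, 0.5, 0.2],
     [0.6, 0.1, 0.3]]
w = [33 / 93, 32 / 93, 28 / 93]    # steady-state probabilities

# Eq. (39): weighted sum of the per-state conditional entropies
H_inf = sum(w[i] * entropy(P[i]) for i in range(3))
print(round(H_inf, 3))             # 1.441 bit/symbol

# For comparison: memoryless entropy and decision content
print(round(entropy(w), 4))        # 1.5814 bit/symbol
print(round(math.log2(3), 4))      # 1.585 bit/symbol
```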
State-dependent Huffman coding for the example: a different code for each state zn−1 (transition probabilities pij as above):

zn−1 \ zn   x1   x2   x3   ⟨L⟩|zn−1
x1          11   10   0    1.6
x2          10   0    11   1.5
x3          0    11   10   1.4

Average code word length:
⟨L⟩ = Σ(i=1..N) Σ(j=1..N) wi · pij · L(zn = xj | zn−1 = xi) ≈ 1.5054 bit/symbol (41)
Source coding without knowledge of statistical parameters
Run-length coding: replace sequences of repeated symbols by the count and one single symbol
Example:
Source sequence: aaaabbbccccccccdddddeeeeeaaaaaaabddddd...
Encoded sequence: 4a3b8c5d5e7a1b5d...
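Run-length coding is a one-liner with `itertools.groupby` (a Python sketch reproducing the slide's example; the function name is mine):

```python
from itertools import groupby

def rle_encode(s):
    # Replace each run of repeated symbols by its count and the symbol
    return "".join(f"{len(list(g))}{sym}" for sym, g in groupby(s))

src = "aaaabbbccccccccdddddeeeeeaaaaaaabddddd"
print(rle_encode(src))   # 4a3b8c5d5e7a1b5d
```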
Encoding with dictionaries
Idea: repetitions within the data sequence are replaced by (shorter) references to a dictionary
Static dictionary: little compression for most data sources, poor matching of the dictionary to particular data
Semi-adaptive dictionary: dictionary tailored to the message to be encoded; the dictionary has to be transmitted via the channel; two passes over the data:
» one to build up the dictionary
» one to encode the data
Adaptive dictionary: a single pass for building up the dictionary and encoding; Lempel-Ziv algorithm: *.zip, *.gzip files
Lempel-Ziv algorithm (LZ77)
A sliding window consists of a search buffer (already encoded symbols) and a look-ahead buffer:
..... a b c b a a b c b d f e e a a b a a c c d d d c d .....
Search for the longest match between the symbols of the look-ahead buffer and a sequence within the search buffer
Fixed-length code words contain:
position of the match (counting from 0)
length of the match
next symbol in the look-ahead buffer
Parameters:
symbol alphabet: {x0, x1, x2, ..., xα−1}
input sequence: S = {S1, S2, S3, S4, ...}
length of the look-ahead buffer: LS
window size: n
Code words: Ci = {pi, li, Si}
position of the match: pi
length of the match: li
next symbol: Si
Length of the code words (same alphabet for data as well as code words):
LC = logα(n − LS) + logα(LS) + 1
Example:
symbol alphabet: {0, 1, 2}
input sequence: S = {0010102102102120210212001120 ...}
length of the look-ahead buffer: LS = 9
window size: n = 18
(position pi and length li are written as two ternary digits each, so every code word consists of 5 symbols)
Step 1: code word C1 = {22 02 1}
Step 2: code word C2 = {21 10 2}
Step 3: code word C3 = {20 21 2}
Step 4: code word C4 = {02 22 0}
Number of encoded source symbols after 4 steps: 3 + 4 + 8 + 9 = 24
Number of code word symbols after 4 steps: 4 × 5 = 20
Decoding (position and length are read back from the two ternary digits of each code word):
Step 1: C1 = {22 02 1} → 001
Step 2: C2 = {21 10 2} → 0102
Step 3: C3 = {20 21 2} → 10210212
Step 4: C4 = {02 22 0} → 021021200
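The encode/decode principle can be sketched without the fixed-length packing (a simplified Python sketch of LZ77 triples; buffer sizes and function names are mine, and the slides additionally pack each triple into a base-α code word):

```python
def lz77_encode(data, window=18, lookahead=9):
    # Simplified LZ77: emit (position in the search buffer, match
    # length, next symbol) triples; matches may overlap the current
    # position, as in the original algorithm
    i, out = 0, []
    while i < len(data):
        start = max(0, i - (window - lookahead))   # search buffer begin
        best_pos, best_len = 0, 0
        for j in range(start, i):
            k = 0
            while (i + k < len(data) - 1 and k < lookahead - 1
                   and data[j + k] == data[i + k]):
                k += 1
            if k > best_len:
                best_pos, best_len = j - start, k
        out.append((best_pos, best_len, data[i + best_len]))
        i += best_len + 1
    return out

def lz77_decode(triples, window=18, lookahead=9):
    out = []
    for pos, length, sym in triples:
        start = max(0, len(out) - (window - lookahead))
        for k in range(length):
            out.append(out[start + pos + k])        # copy the match
        out.append(sym)                             # append next symbol
    return "".join(out)

seq = "abcbababca"
triples = lz77_encode(seq)
print(triples)
print(lz77_decode(triples) == seq)   # True
```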
Transmission of information via a discrete memoryless channel
Noiseless channel: information at the output = information at the input ⇒ transmitted information = entropy of the source
Noisy channel: information at the output < information at the input ⇒ transmitted information < entropy of the source
Definition: average mutual information = the information actually transmitted
Discrete memoryless channel (DMC)

Input signal: X ∈ {x_1, x_2, ..., x_N_X}
Output signal: Y ∈ {y_1, y_2, ..., y_N_Y}

(diagram omitted: transition graph from the inputs x_1 ... x_N_X to the outputs y_1 ... y_N_Y with transition probabilities p_11, p_12, p_21, p_22, ..., p_N_X N_Y)
Transition matrix:

P = [ p_ij ] = | p_11    p_12    ...  p_1N_Y   |
               | p_21    p_22    ...  p_2N_Y   |
               | ...     ...     ...  ...      |
               | p_N_X1  p_N_X2  ...  p_N_XN_Y |     (42a)

with p_ij = P(Y = y_j | X = x_i) and Σ_j=1..N_Y p_ij = 1     (42b)

Input probabilities:  p_X = ( p(x_1), p(x_2), ..., p(x_N_X) )
Output probabilities: p_Y = ( p(y_1), p(y_2), ..., p(y_N_Y) )

Relation between input and output probabilities:
p_Y = p_X · P     (42c)
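The relation p_Y = p_X · P can be checked with a few lines of code; a minimal Python sketch with an illustrative binary symmetric channel (the numbers are not from the lecture):

```python
# Relation (42c): p_Y = p_X · P for a discrete memoryless channel.

def output_probabilities(p_x, P):
    """p(y_j) = sum_i p(x_i) * p_ij for a row-stochastic transition matrix P."""
    return [sum(p_x[i] * P[i][j] for i in range(len(p_x)))
            for j in range(len(P[0]))]

# Illustrative binary symmetric channel with p_err = 0.1:
P = [[0.9, 0.1],
     [0.1, 0.9]]
p_x = [0.7, 0.3]
p_y = output_probabilities(p_x, P)  # -> [0.66, 0.34]
```

Note that p_Y is automatically a probability vector, since each row of P sums to 1 (eq. (42b)).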
Chain of two channels: X → P_1 → Y → P_2 → Z

Output probabilities:
p_Z = (p_X · P_1) · P_2 = p_X · (P_1 · P_2)     (45)

Resulting transition matrix:
P = P_1 · P_2     (46)
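Equations (45)/(46) say that a chain of channels behaves like a single channel with transition matrix P_1 · P_2. A small sketch (illustrative error probabilities): two chained BSCs again form a BSC, with cross-over probability p_1 + p_2 − 2·p_1·p_2:

```python
# Chain of two DMCs, eq. (46): P = P1 · P2.

def mat_mul(A, B):
    """Product of two row-stochastic transition matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

P1 = [[0.9, 0.1], [0.1, 0.9]]  # BSC with p_err = 0.1 (illustrative)
P2 = [[0.8, 0.2], [0.2, 0.8]]  # BSC with p_err = 0.2 (illustrative)
P = mat_mul(P1, P2)            # overall channel X -> Z
# overall cross-over probability: 0.1 + 0.2 - 2*0.1*0.2 = 0.26
```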
Binary channel:

P = | p_11  p_12 |
    | p_21  p_22 |     (47)

Error probability of a binary channel:
p(error) = p(x_1) · p(y_2|x_1) + p(x_2) · p(y_1|x_2)
         = p(x_1) · p_12 + p(x_2) · p_21     (48)

Probabilities at input and output:
( p(y_1), p(y_2) ) = ( p(x_1), p(x_2) ) · P
Binary symmetric channel (BSC)

P = | 1 − p_err   p_err     |
    | p_err       1 − p_err |     (49)

Error probability:
p(error) = p(x_1) · p_err + p(x_2) · p_err = [ p(x_1) + p(x_2) ] · p_err = p_err     (50)

(diagram omitted: x_1 → y_1 and x_2 → y_2 with probability 1 − p_err, cross transitions x_1 → y_2 and x_2 → y_1 with probability p_err)
Example for mutual information – qualitative consideration:
Transmission of 1000 binary statistically independent and equiprobable symbols (p(0) = p(1) = 0.5)
Binary symmetric channel with p_err = 0.01
Average number of correctly received symbols: 990
But: T(X,Y) < 0.99 bit/symbol
Reason: the exact position of the errors is not known
Definitions of entropy at a discrete memoryless channel:

Input entropy = average information content of the input symbols:
H(X) = Σ_i=1..N_X p(x_i) · ld( 1/p(x_i) )     (51a)

Output entropy = average information content of the output symbols:
H(Y) = Σ_j=1..N_Y p(y_j) · ld( 1/p(y_j) )     (51b)

Joint entropy = average information content (uncertainty) of the whole transmission system:
H(X,Y) = Σ_i=1..N_X Σ_j=1..N_Y p(x_i, y_j) · ld( 1/p(x_i, y_j) )     (51c)
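Definitions (51a)-(51c) translate directly into code ("ld" is the base-2 logarithm); a minimal sketch:

```python
from math import log2

def entropy(p):
    """H = sum_i p_i * ld(1/p_i); terms with p_i = 0 contribute nothing."""
    return sum(pi * log2(1.0 / pi) for pi in p if pi > 0)

def joint_entropy(p_xy):
    """H(X,Y), eq. (51c), from a joint probability matrix p(x_i, y_j)."""
    return entropy([p for row in p_xy for p in row])

H_X = entropy([0.5, 0.5])             # 1 bit for a fair binary source
H_XY = joint_entropy([[0.25, 0.25],
                      [0.25, 0.25]])  # 2 bit for independent uniform X, Y
```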
Conditional entropy H(Y|X) = average information content at the output for known input symbols = entropy of the irrelevance:
H(Y|X) = Σ_i=1..N_X Σ_j=1..N_Y p(x_i, y_j) · ld( 1/p(y_j|x_i) )     (51d)

Conditional entropy H(X|Y) = average information content at the input for known output symbols = entropy of the information that is lost in the channel = entropy of the equivocation:
H(X|Y) = Σ_i=1..N_X Σ_j=1..N_Y p(x_i, y_j) · ld( 1/p(x_i|y_j) )     (51e)
Relations between different types of entropy:
H(X,Y) = H(Y,X) = H(X) + H(Y|X) = H(Y) + H(X|Y)     (53)
H(X|Y) ≤ H(X)     (54)
H(Y|X) ≤ H(Y)     (55)

Average information flow (average mutual information):
T(X,Y) = H(X) − H(X|Y)     (56a)
       = H(Y) − H(Y|X)     (56b)
       = H(X) + H(Y) − H(X,Y)     (56c)
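Relation (56c) can be checked numerically. A sketch for a BSC with p_err = 0.25 and equiprobable input (illustrative values); the result must equal 1 − S(p_err):

```python
from math import log2

def H(probs):
    """Entropy in bit of a probability list."""
    return sum(p * log2(1.0 / p) for p in probs if p > 0)

p_err = 0.25
# joint probabilities p(x_i, y_j) = p(x_i) * p(y_j|x_i) with uniform input:
p_xy = [[0.5 * (1 - p_err), 0.5 * p_err],
        [0.5 * p_err, 0.5 * (1 - p_err)]]
p_x = [sum(row) for row in p_xy]
p_y = [p_xy[0][j] + p_xy[1][j] for j in range(2)]

# eq. (56c): T(X,Y) = H(X) + H(Y) - H(X,Y)
T = H(p_x) + H(p_y) - H([p for row in p_xy for p in row])
```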
Information flow

(diagram omitted: source → source encoder → transmission channel → source decoder → sink; the entropies H(U), H(X), H(Y) and H(Û) appear at the respective interfaces; the channel removes the equivocation H(X|Y) and adds the irrelevance H(Y|X); the information flow through the channel is T(X,Y))
Examples – information flow:

Ideal noiseless channel:
p_ij = 1 for i = j, p_ij = 0 for i ≠ j     (57)
Entropies: H(X|Y) = 0, H(Y|X) = 0, H(X,Y) = H(X) = H(Y)     (58)
T(X,Y) = H(X) = H(Y)

Useless, very noisy channel:
p(x_i, y_j) = p(x_i) · p(y_j) = p(y_j|x_i) · p(x_i)  ⇒  p(y_j|x_i) = p(y_j)  ⇒  p_ij = p_kj     (59)
Entropies: H(X|Y) = H(X), H(Y|X) = H(Y), H(X,Y) = H(X) + H(Y)
T(X,Y) = 0     (60)
Example for mutual information – quantitative consideration:
Transmission of 1000 binary statistically independent and equiprobable symbols (p(0) = p(1) = 0.5)
Binary symmetric channel with p_err = 0.01

T(X,Y) = H(Y) − H(Y|X)
       = Σ_j p(y_j) · ld( 1/p(y_j) ) − Σ_i Σ_j p(x_i) · p(y_j|x_i) · ld( 1/p(y_j|x_i) )
       = 1 − [ p(0) + p(1) ] · [ (1 − p_err) · ld( 1/(1 − p_err) ) + p_err · ld( 1/p_err ) ]
       = 1 − S(p_err)
       ≅ 0.9192 bit/symbol     (61)
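The value in (61) is easy to verify numerically; a minimal sketch using the binary entropy (Shannon) function S(p):

```python
from math import log2

def S(p):
    """Binary entropy function S(p) in bit."""
    return p * log2(1.0 / p) + (1 - p) * log2(1.0 / (1 - p))

p_err = 0.01
T = 1 - S(p_err)   # transmitted information per symbol, approx. 0.9192 bit
# 1000 transmitted symbols therefore carry only about 919 bit of information,
# although on average 990 of them are received correctly.
```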
Channel capacity
The average mutual information (average information flow) depends on the probability of the source symbols.
Definition of channel capacity:

C = (1/∆T) · max over p(x_i) of T(X,Y)     (62)

Channel capacity = maximum average mutual information flow
∆T = symbol period
Unit for channel capacity: bit/s
C depends on properties of the channel – it does not depend on the source!
Definition of the average information flow:
Average information flow = entropy / time:
H′(X) = H(X) / ∆T     (63)

Average mutual information flow = average mutual information / time:
T′(X,Y) = T(X,Y) / ∆T     (64)

Decisions for selecting symbols / time:
H0′(X) = H0(X) / ∆T     (65)
Example: binary symmetric channel – BSC (with p_1 = p(x_1)):

T(X,Y) = H(Y) − H(Y|X)
       = (p_1 + p_err − 2·p_1·p_err) · ld( 1/(p_1 + p_err − 2·p_1·p_err) )
       + (1 − p_1 − p_err + 2·p_1·p_err) · ld( 1/(1 − p_1 − p_err + 2·p_1·p_err) )
       − [ p_err · ld( 1/p_err ) + (1 − p_err) · ld( 1/(1 − p_err) ) ]     (66)
Average mutual information

(plot omitted: T(X,Y) in bit/symbol versus p(x_1) for p_err = 0, 0.1, 0.2, 0.3, 0.5; all curves reach their maximum at p_1 = p(x_1) = 0.5)
Channel capacity

C · ∆T = max over p(x_i) of T(X,Y)
       = 1 − [ p_err · ld( 1/p_err ) + (1 − p_err) · ld( 1/(1 − p_err) ) ]
       = 1 − S(p_err)     (67)

(plot omitted: C · ∆T / bit versus p_err; the capacity equals 1 bit at p_err = 0 and p_err = 1 and drops to 0 at p_err = 0.5)
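A brute-force sketch of definition (62) for the BSC: maximising T(X,Y) from (66) over the input probability p_1 on a grid reproduces C · ∆T = 1 − S(p_err), with the maximum at p_1 = 0.5:

```python
from math import log2

def S(p):
    """Binary entropy function, safe at p = 0 and p = 1."""
    return sum(q * log2(1.0 / q) for q in (p, 1 - p) if q > 0)

def T_bsc(p1, p_err):
    """Average mutual information of a BSC: H(Y) - S(p_err)."""
    p_y1 = p1 * (1 - p_err) + (1 - p1) * p_err
    return S(p_y1) - S(p_err)

p_err = 0.1
grid = [i / 1000 for i in range(1, 1000)]
best_p1 = max(grid, key=lambda p1: T_bsc(p1, p_err))   # 0.5
C_dT = T_bsc(best_p1, p_err)                           # 1 - S(0.1)
```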
Example: binary erasure channel – BEC

P = | 1 − p_err   p_err   0         |
    | 0           p_err   1 − p_err |     (68)

C · ∆T = 1 − p_err     (69)

(plot omitted: C · ∆T / bit versus p_err, falling linearly from 1 at p_err = 0 to 0 at p_err = 1)
Theorem of channel capacity (Shannon 1948)
For any ε > 0 and any information flow of a source R smaller than the channel capacity C (R < C), a binary block code of length n (n sufficiently large) can be found, so that the residual error probability after decoding in the receiver is smaller than ε.
Reverse statement: even with the largest coding effort, the residual error probability cannot decrease below a certain limit for R > C.
Proof of the theorem uses random block codes (random coding argument):
Proof holds for an average over all codes
All known codes are bad codes
No rule for the construction of codes
Channel capacity: optimum for infinite code word length ⇒ infinite time delay, infinite complexity
More practical theorem related to the theorem of channel capacity by definition of Gallager's error exponent for a DMC with N_X input symbols:

E_G(R_C) = max over 0 ≤ s ≤ 1 and over p(x_i) of
           [ −ld Σ_j=1..N_Y ( Σ_i=1..N_X p(x_i) · p(y_j|x_i)^(1/(1+s)) )^(1+s) − s · R_C ]     (70)

There always exists an (n,k) block code with R_C = k/n · ld N_X < C · ∆T, so that the word error probability is bounded by:

P_w < 2^(−n · E_G(R_C))     (71)
Gallager's error exponent
Properties:
E_G(R_C) > 0 for R_C < C·∆T
E_G(R_C) = 0 for R_C ≥ C·∆T

Definition of R_0 (computational cut-off rate):
R_0 = E_G(R_C = 0)     (72)

(plot omitted: E_G(R_C) versus R_C, starting at R_0 for R_C = 0 and reaching 0 at R_C = C·∆T)
Computational cut-off rate R_0:
The maximum of E_G(R_C = 0) lies at s = 1:

R_0 = E_G(R_C = 0) = max over p(x_i) of [ −ld Σ_j=1..N_Y ( Σ_i=1..N_X p(x_i) · √p(y_j|x_i) )² ]     (73)

Comparison for s = 1:

E_G(R_C) ≥ max over p(x_i) of [ −ld Σ_j=1..N_Y ( Σ_i=1..N_X p(x_i) · √p(y_j|x_i) )² ] − R_C = R_0 − R_C     (74)
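For a symmetric channel the maximising input distribution in (73) is the uniform one, so R_0 of the BSC can be sketched directly; the result agrees with the closed form 1 − ld(1 + 2·√(p_err·(1 − p_err))):

```python
from math import log2, sqrt

def R0_bsc(p_err):
    """Eq. (73) for a BSC, evaluated at the uniform input distribution."""
    P = [[1 - p_err, p_err],
         [p_err, 1 - p_err]]
    p_x = [0.5, 0.5]
    total = sum(sum(p_x[i] * sqrt(P[i][j]) for i in range(2)) ** 2
                for j in range(2))
    return -log2(total)

r0 = R0_bsc(0.1)   # approx. 0.322 bit/symbol
```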
R_0-theorem
There always exists an (n,k) block code with R_C = k/n · ld N_X < C · ∆T, so that the word error probability (for maximum-likelihood decoding) is bounded by:

P_w < 2^(−n · (R_0 − R_C))     (75)

No rules for the construction of good codes
Range of values for the code rate:
0 ≤ R_C ≤ R_0: P_w is bounded by (75)
R_0 ≤ R_C ≤ C · ∆T: an upper bound for P_w is difficult to calculate
R_C > C · ∆T: P_w cannot become arbitrarily small
Comparison between channel capacity and computational cut-off rate R_0:

(plot omitted: C · ∆T and R_0 in bit/symbol versus p_err for 0 ≤ p_err ≤ 0.5; R_0 lies below C · ∆T)
Information Engineering 2: Channel coding in digital communication systems
Goals: principles and examples for block codes; design of code words

Binary code words
Redundant codes
Code C = set of all code words
Code word c = (c_0, c_1, ..., c_n−1) with c ∈ C
Encoding is a memoryless assignment:
information word u = (u_0, u_1, ..., u_k−1) → code word c = (c_0, c_1, ..., c_n−1)
k information bits → n code bits, n ≥ k
Identical codes: codes with the same code words
Equivalent codes: codes which become identical after a permutation of bits

General notation: (n, k, d_min)_q block code
q = number of symbols (size of alphabet)

Code rate:
R_C = k/n ≤ 1     (76)

Number of code words: N = q^k (= 2^k for binary codes)     (77)
Systematic codes:
Code word c = (u, p)

u_0 u_1 u_2 u_3 ... u_k−1
↓   ↓   ↓   ↓       ↓
c_0 c_1 c_2 c_3 ... c_k−1 c_k c_k+1 ... c_n−1     (78)

m = n − k parity-check bits (c_k, ..., c_n−1)

Non-systematic codes: information and parity-check bits cannot be separated
Linear block codes can always be converted into equivalent systematic codes
Addition and multiplication in a binary number system (modulo 2):

⊕ | 0 1      ⊗ | 0 1
0 | 0 1      0 | 0 0
1 | 1 0      1 | 0 1

Two binary vectors x and y with the same length
Hamming distance: d_H(x,y) = number of different bits for x and y

Example: d_H( 0 1 1 1 0 1 0 1, 1 0 1 0 0 1 0 1 ) = 3
(Hamming) weight of a vector x:

w_H(x) = Σ_i=0..n−1 x_i = number of bits different from 0     (79)

Example: w_H(0 1 1 1 0 1 0 1) = 5

Hamming distance: d_H(x,y) = w_H(x + y)     (80)

Example: x     = ( 0 1 1 1 0 1 0 1 )
         y     = ( 1 0 1 0 0 1 0 1 )
         x + y = ( 1 1 0 1 0 0 0 0 )
         w_H(x + y) = 3
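Relations (79)/(80) in code: for binary vectors, modulo-2 addition is XOR, so the Hamming distance is the weight of the component-wise sum. A minimal sketch reproducing the slide examples:

```python
def w_H(x):
    """Hamming weight, eq. (79): number of components different from 0."""
    return sum(1 for xi in x if xi != 0)

def d_H(x, y):
    """Hamming distance, eq. (80): weight of the component-wise mod-2 sum."""
    return w_H([xi ^ yi for xi, yi in zip(x, y)])

x = [0, 1, 1, 1, 0, 1, 0, 1]
y = [1, 0, 1, 0, 0, 1, 0, 1]
# w_H(x) = 5, d_H(x, y) = 3
```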
Transmission via a binary channel:
y = x + e, e = error vector     (81)

Repetition code
k = 1 information bit → n − 1 repetitions
2^k = 2 code words: c_1 = (0 0 0 ... 0) and c_2 = (1 1 1 ... 1)
The repetition code is a systematic code
Simple decoding by majority vote (especially if n is odd)
(n − 1)/2 errors can be corrected, n − 1 errors can be detected
Example for a repetition code:
n = 5 ⇒ R_C = 1/5
u_1 = (0) → c_1 = (0 0 0 0 0) and u_2 = (1) → c_2 = (1 1 1 1 1)

Transmission via a noisy channel:
e_1 = (0 1 0 0 1) and x_1 = (1 1 1 1 1): y_1 = x_1 + e_1 = (1 0 1 1 0) → û_1 = (1)
e_2 = (1 1 0 1 0) and x_2 = (0 0 0 0 0): y_2 = x_2 + e_2 = (1 1 0 1 0) → û_2 = (1)

Two errors can be corrected, four detected; in the second transmission three errors occurred, so the majority decision fails (û_2 ≠ u_2)
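Majority-vote decoding for the n = 5 repetition code can be sketched in a few lines; it reproduces both transmissions of the example (the second one, with three errors, is decoded wrongly):

```python
def decode_repetition(y):
    """Majority decision: decide 1 iff more than half of the bits are 1."""
    return 1 if sum(y) > len(y) / 2 else 0

y1 = [1, 0, 1, 1, 0]   # x1 = (1 1 1 1 1) with two errors
y2 = [1, 1, 0, 1, 0]   # x2 = (0 0 0 0 0) with three errors
u1_hat = decode_repetition(y1)   # 1 -> errors corrected
u2_hat = decode_repetition(y2)   # 1 -> decoding failure (u2 was 0)
```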
Parity check code
k information bits, a single parity check bit (m = 1) → n = k + 1
2^k code words

c = (u, p) with p = u_0 + u_1 + u_2 + ... + u_k−1     (82)

(number of ones in code words c is even)

The parity check code is a systematic code.
No error can be corrected; an odd number of errors can be detected.
Parity check:
s_0 = y_0 + y_1 + y_2 + ... + y_n−1 = 0 → no error
s_0 = y_0 + y_1 + y_2 + ... + y_n−1 = 1 → error
Example: k = 3

code word   information bits   parity check bit
c_0         0 0 0              0
c_1         0 0 1              1
c_2         0 1 0              1
c_3         0 1 1              0
c_4         1 0 0              1
c_5         1 0 1              0
c_6         1 1 0              0
c_7         1 1 1              1

y_1 = (0 1 1 0) → no error
y_2 = (1 1 1 0) → error
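Encoding and parity check (eq. (82)) in a short sketch; every code word has an even number of ones, and the syndrome s_0 flags the second received word of the example:

```python
def encode_parity(u):
    """Append a single parity bit so the total number of ones is even."""
    return u + [sum(u) % 2]

def syndrome(y):
    """s0 = y0 + y1 + ... + y_{n-1} mod 2: 0 -> no error, 1 -> error."""
    return sum(y) % 2

codewords = [encode_parity([a, b, c])
             for a in (0, 1) for b in (0, 1) for c in (0, 1)]
s1 = syndrome([0, 1, 1, 0])   # 0 -> no error detected
s2 = syndrome([1, 1, 1, 0])   # 1 -> error detected
```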
Hamming code
A single error in a code word can be corrected.
Example: (7,4) Hamming code

u_0 u_1 u_2 u_3
↓   ↓   ↓   ↓
c_0 c_1 c_2 c_3 c_4 c_5 c_6

Parity check bits:
c_4 = c_0 + c_1 + c_2     (83a)
c_5 = c_0 + c_1 + c_3     (83b)
c_6 = c_0 + c_2 + c_3     (83c)
Matrix representation of block codes:
x = u G     (84)
G = generator matrix (k × n matrix)

Systematic block codes:
G = [ I_k  P ]     (85)
I_k = identity matrix (k × k), P = parity bit matrix, k × (n − k)

Example: (7,4) Hamming code:

G = | 1 0 0 0 | 1 1 1 |
    | 0 1 0 0 | 1 1 0 |
    | 0 0 1 0 | 1 0 1 |
    | 0 0 0 1 | 0 1 1 |     (86)
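Encoding via the generator matrix, x = u G mod 2 (eq. (84)), can be sketched directly; the resulting parity bits satisfy (83a)-(83c):

```python
# Generator matrix of the (7,4) Hamming code, eq. (86):
G = [[1, 0, 0, 0, 1, 1, 1],
     [0, 1, 0, 0, 1, 1, 0],
     [0, 0, 1, 0, 1, 0, 1],
     [0, 0, 0, 1, 0, 1, 1]]

def encode(u):
    """c_j = sum_i u_i * G[i][j] mod 2 (eq. (84))."""
    return [sum(u[i] * G[i][j] for i in range(4)) % 2 for j in range(7)]

c = encode([1, 0, 1, 1])   # -> [1, 0, 1, 1, 0, 0, 1]
```

Because G is systematic, the first four code bits reproduce the information word; the last three are the parity bits.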