9/13/2005   J. Liang: SFU ENSC 424

ENSC 424 - Multimedia Communications Engineering
Huffman Coding (1)

Jie Liang
Engineering Science
Simon Fraser University
[email protected]




Outline

• Entropy Coding
• Prefix code
• Kraft-McMillan inequality
• Huffman Encoding
• Minimum Variance Huffman Coding
• Extended Huffman Coding


Entropy Coding

• Design the mapping from source symbols to codewords
• Lossless mapping
• Goal: minimize the average codeword length
  ◦ Approach the entropy of the source


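Example: Morse Code
• Represents English characters and numbers by different combinations of dots and dashes (codewords)
• Examples: E (.), I (..), A (.-), T (-), O (---), S (...), Z (--..)
• Letters have to be separated by a space, or by a pause when transmitting over radio.
  ◦ SOS: ... (pause) --- (pause) ...
• Problem: without the separators the code is not uniquely decodable! For example, ".." could be I or EE.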
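The ambiguity can be checked by brute force. The sketch below is only an illustration, not part of the original slides: the function name `parses` and the seven-letter table are assumptions made here. It lists every way a dot/dash string without pauses can be split into Morse codewords.

```python
# Partial Morse table: only the letters used as examples on the slide.
MORSE = {"E": ".", "I": "..", "A": ".-", "T": "-",
         "O": "---", "S": "...", "Z": "--.."}

def parses(signal, code=MORSE):
    """Return every way to split `signal` into codewords of `code`."""
    if not signal:
        return [""]  # exactly one way to parse the empty string
    results = []
    for letter, word in code.items():
        if signal.startswith(word):
            # Consume one codeword, then parse the remainder recursively.
            results += [letter + rest
                        for rest in parses(signal[len(word):], code)]
    return results

print(sorted(parses("..")))           # ['EE', 'I']
print("SOS" in parses("...---..."))   # True, but it is not the only parse
```

Because "." (E) is a prefix of ".." (I), the decoder cannot tell where one letter ends unless a pause is transmitted.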


Entropy Coding: Prefix-free Code
• No codeword is a prefix of another one.
• Can be uniquely decoded.
• Also called prefix code
• Example: 0, 10, 110, 111
• Binary code tree:

[Figure: binary code tree for the code 0, 10, 110, 111. The two branches leaving each node are labeled 0 and 1; the root node, an internal node, and a leaf node are marked.]

• In a prefix-free code, every codeword is a leaf node of the tree.
• How to express the requirement mathematically?
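A minimal sketch of a prefix-free test, assuming codewords are given as strings of "0" and "1" (the helper name `is_prefix_free` is made up for this illustration). It relies on the fact that, after lexicographic sorting, a codeword that is a prefix of any other codeword is also a prefix of its immediate successor.

```python
def is_prefix_free(codewords):
    """True if no codeword is a prefix of another (duplicates count as prefixes)."""
    words = sorted(codewords)
    # Only neighbouring pairs need to be compared after sorting.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

print(is_prefix_free(["0", "10", "110", "111"]))        # True
print(is_prefix_free(["0", "10", "11", "110", "111"]))  # False: 11 is a prefix of 110
```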


Kraft-McMillan Inequality
• Let C be a code with N codewords of lengths $l_i$, i = 1, …, N. If C is uniquely decodable, then

$$\sum_{i=1}^{N} 2^{-l_i} \le 1$$

• If a set of lengths $l_i$ satisfies the inequality above, then there exists a prefix-free code with codeword lengths $l_i$, i = 1, …, N.
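Both statements can be tried out numerically. The sketch below is an illustration under assumptions, not the lecture's own method: `kraft_sum` evaluates the left-hand side exactly, and `prefix_code_from_lengths` uses one standard construction (assign codewords in order of increasing length, counting upward in binary) to build a prefix-free code from any lengths that satisfy the inequality. Both function names are invented here.

```python
from fractions import Fraction

def kraft_sum(lengths):
    """Exact value of sum_i 2^(-l_i)."""
    return sum(Fraction(1, 2 ** l) for l in lengths)

def prefix_code_from_lengths(lengths):
    """Build a prefix-free code with the given codeword lengths, if possible."""
    if kraft_sum(lengths) > 1:
        raise ValueError("lengths violate the Kraft-McMillan inequality")
    codes, value, prev = [], 0, 0
    for l in sorted(lengths):
        value <<= (l - prev)                  # extend the counter to l bits
        codes.append(format(value, "0{}b".format(l)))
        value += 1                            # next unused codeword of this length
        prev = l
    return codes

print(kraft_sum([1, 2, 3, 3]))                 # 1
print(kraft_sum([1, 2, 2, 3, 3]))              # 5/4
print(prefix_code_from_lengths([3, 1, 3, 2]))  # ['0', '10', '110', '111']
```

The lengths 1, 2, 3, 3 meet the bound with equality and reproduce the example code 0, 10, 110, 111; the lengths 1, 2, 2, 3, 3 exceed the bound, so no uniquely decodable code has them.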


Kraft-McMillan Inequality

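$$\sum_{i=1}^{N} 2^{-l_i} \le 1 \iff \sum_{i=1}^{N} 2^{L-l_i} \le 2^{L}$$

• To see this, expand the binary code tree to depth $L = \max_i(l_i)$.
• Number of nodes in the last level: $2^L$
• Each codeword has a sub-tree; its number of offspring in the last level: $2^{L-l_i}$
• K-M inequality: the number of L-th level offspring of all codewords is at most $2^L$.

[Figure: the code tree for 0, 10, 110, 111 expanded to depth L = 3. A second tree adds the codeword 11, giving 0, 10, 11, 110, 111, which leads to more than 2^L offspring: 4 + 2 + 2 + 1 + 1 = 10 > 8.]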
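The counting argument can be reproduced directly. This is a small sketch with an invented helper name, `offspring_at_depth`, that returns the total number of level-L offspring of all codewords next to the number of nodes available at that level.

```python
def offspring_at_depth(codewords):
    """Return (total level-L offspring of all codewords, 2**L), L = longest length."""
    L = max(len(c) for c in codewords)
    total = sum(2 ** (L - len(c)) for c in codewords)
    return total, 2 ** L

print(offspring_at_depth(["0", "10", "110", "111"]))        # (8, 8)
print(offspring_at_depth(["0", "10", "11", "110", "111"]))  # (10, 8)
```

In the second call the sub-tree of 11 overlaps the leaves 110 and 111, so the offspring are counted twice and the total exceeds 2^L.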




Huffman Coding

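• A procedure to construct an optimal prefix-free code
• Result of David Huffman's term paper as a PhD student at MIT (published in 1952)
• Shannon → Fano → Huffman (1925-1999)
• Observations:
  ◦ Assign short codes to frequent symbols.
  ◦ In an optimum prefix-free code, the two codewords that occur least frequently will have the same length. Otherwise the longer of the two could be truncated to the length of the shorter one: the code would stay prefix-free and its average length would drop.

[Figure: codewords a and b in a code tree, with the longer one truncated to match the other.]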


Huffman Code Design
• Another property of Huffman coding:
  ◦ The codewords of the two lowest-probability symbols differ only in the last bit.
• Requirement:
  ◦ The source probability distribution (not available in most cases)
• Procedure:
  1. Sort the probabilities of all source symbols in descending order.
  2. Merge the last two into a new symbol and add their probabilities.
  3. Repeat Steps 1 and 2 until only one symbol (the root) is left.
  4. Code assignment: traverse the tree from the root to each leaf node, assigning 0 to the top branch and 1 to the bottom branch.
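The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `huffman_code`, the symbol names, and the sample probabilities are assumptions, exact fractions are used so that ties compare reliably, and a newly merged symbol is placed below existing symbols of equal probability (one of several valid tie-breaking choices). Bits are prepended at each merge, which gives the same result as labeling the finished tree from the root.

```python
from fractions import Fraction

def huffman_code(probs):
    """probs: dict symbol -> probability. Returns dict symbol -> codeword."""
    # Each node is (probability, symbols below this node).
    nodes = [(p, [s]) for s, p in probs.items()]
    codes = {s: "" for s in probs}
    while len(nodes) > 1:
        # Step 1: sort in descending order (stable, so ties keep list order).
        nodes.sort(key=lambda n: n[0], reverse=True)
        # Step 2: merge the last two entries.
        (p_top, top), (p_bot, bot) = nodes[-2], nodes[-1]
        # Step 4, done on the fly: 0 for the top branch, 1 for the bottom.
        for s in top:
            codes[s] = "0" + codes[s]
        for s in bot:
            codes[s] = "1" + codes[s]
        nodes[-2:] = [(p_top + p_bot, top + bot)]
        # Step 3: loop until only the root is left.
    return codes

# Sample input (assumed for illustration).
probs = {"a1": Fraction(2, 10), "a2": Fraction(4, 10), "a3": Fraction(2, 10),
         "a4": Fraction(1, 10), "a5": Fraction(1, 10)}
codes = huffman_code(probs)
print(codes)
# {'a1': '01', 'a2': '1', 'a3': '000', 'a4': '0010', 'a5': '0011'}
```

Sorting the whole list at every step costs more than a priority queue would, but it mirrors the sort-and-merge procedure literally.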


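Example 3.2.1
• Source alphabet A = {a1, a2, a3, a4, a5}
• Probability distribution: {0.2, 0.4, 0.2, 0.1, 0.1}

Sort and merge:
  1. Sort: a2 (0.4), a1 (0.2), a3 (0.2), a4 (0.1), a5 (0.1). Merge a4 and a5: 0.2
  2. Sort: 0.4, 0.2, 0.2, 0.2. Merge a3 and (a4, a5): 0.4
  3. Sort: 0.4, 0.4, 0.2. Merge (a3, a4, a5) and a1: 0.6
  4. Sort: 0.6, 0.4. Merge: 1 (root)

Assign code (from the root back to the leaves):
  1. (a1, a3, a4, a5): 0, a2: 1
  2. (a3, a4, a5): 00, a1: 01
  3. a3: 000, (a4, a5): 001
  4. a4: 0010, a5: 0011

Resulting code: a1 = 01, a2 = 1, a3 = 000, a4 = 0010, a5 = 0011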


Huffman code is prefix-free

[Figure: binary code tree whose leaves are the codewords 1, 01, 000, 0010, 0011.]

• All codewords are leaf nodes
• No code is a prefix of any other code (prefix-free).


Average Codeword Length vs Entropy

• Source alphabet A = {a, b, c, d, e}
• Probability distribution: {0.2, 0.4, 0.2, 0.1, 0.1}
• Code: {01, 1, 000, 0010, 0011}
• Entropy:
  H(S) = - (0.2*log2(0.2)*2 + 0.4*log2(0.4) + 0.1*log2(0.1)*2) = 2.122 bits / symbol
• Average Huffman codeword length:
  L = 0.2*2 + 0.4*1 + 0.2*3 + 0.1*4 + 0.1*4 = 2.2 bits / symbol
• In general: H(S) ≤ L < H(S) + 1
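The two numbers above can be recomputed with a short script. This is just a check of the arithmetic; the variable names are chosen here for illustration.

```python
import math

probs = [0.2, 0.4, 0.2, 0.1, 0.1]
code = ["01", "1", "000", "0010", "0011"]

# Entropy of the source in bits per symbol.
entropy = -sum(p * math.log2(p) for p in probs)
# Average codeword length: each length weighted by its symbol probability.
avg_len = sum(p * len(c) for p, c in zip(probs, code))

print(f"H(S) = {entropy:.3f} bits / symbol")  # H(S) = 2.122 bits / symbol
print(f"L    = {avg_len:.1f} bits / symbol")  # L    = 2.2 bits / symbol
print(entropy <= avg_len < entropy + 1)       # True
```

The gap L - H(S) of about 0.078 bits / symbol is the redundancy of this code.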


Huffman Code is not unique

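• Multiple ordering choices for tied probabilities
• Two choices for each split: 0, 1 or 1, 0

[Figure: four versions of the last two merging steps (nodes of probability 0.4, 0.2, and 0.4, then 0.6 and 0.4). One assigns the codes 1, 00, 01; the others flip the branch labels to give 0, 10, 11 and reorder the tied nodes, labeled a, b, c in one panel and b, a, c in another.]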


Minimum Variance Huffman Code
• Put the combined symbol as high as possible in the sorted list
• Prevents an unbalanced tree:
  ◦ Reduces the memory requirement for decoding
  ◦ (revisited later)
• Repeat the previous example
• Compute the average codeword length
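The effect of the tie-breaking rule can be seen by running the sort-and-merge procedure both ways on the five-symbol source with probabilities 0.2, 0.4, 0.2, 0.1, 0.1. The sketch below is an illustration under assumptions: the function names and the `combined_high` flag are invented here, and exact fractions are used so that equal probabilities compare as equal.

```python
from fractions import Fraction

def huffman(probs, combined_high):
    """Sort-and-merge Huffman coding; `combined_high` places each merged
    symbol above (True) or below (False) symbols of equal probability."""
    nodes = [(p, [s]) for s, p in probs.items()]
    codes = {s: "" for s in probs}
    while len(nodes) > 1:
        nodes.sort(key=lambda n: n[0], reverse=True)  # stable sort
        (p_top, top), (p_bot, bot) = nodes[-2], nodes[-1]
        for s in top:
            codes[s] = "0" + codes[s]
        for s in bot:
            codes[s] = "1" + codes[s]
        merged = (p_top + p_bot, top + bot)
        rest = nodes[:-2]
        # The stable sort keeps this relative order among tied entries.
        nodes = [merged] + rest if combined_high else rest + [merged]
    return codes

def mean_and_variance(probs, codes):
    mean = sum(p * len(codes[s]) for s, p in probs.items())
    var = sum(p * (len(codes[s]) - mean) ** 2 for s, p in probs.items())
    return mean, var

probs = {"a1": Fraction(2, 10), "a2": Fraction(4, 10), "a3": Fraction(2, 10),
         "a4": Fraction(1, 10), "a5": Fraction(1, 10)}

low = huffman(probs, combined_high=False)
high = huffman(probs, combined_high=True)
low_mean, low_var = mean_and_variance(probs, low)
high_mean, high_var = mean_and_variance(probs, high)

print(low, float(low_mean), float(low_var))     # lengths 2,1,3,4,4: 2.2 1.36
print(high, float(high_mean), float(high_var))  # lengths 2,2,2,3,3: 2.2 0.16
```

Both codes have the same average length of 2.2 bits / symbol, but placing the combined symbol high reduces the variance of the codeword length from 1.36 to 0.16.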


Extended Huffman Code

• Code multiple symbols jointly
  ◦ Composite symbol: (X1, X2, …, Xk)
• Code symbols of different meanings jointly
  ◦ JPEG: run-level coding
  ◦ H.264 CAVLC: context-adaptive variable length coding
    - # of non-zero coefficients and # of trailing ones
  ◦ Revisited later
• Alphabet size increases exponentially: N^k


Example

• P(Xj = 0) = P(Xj = 1) = 1/2
  ◦ Entropy H(Xj) = 1 bit / symbol
• Joint probability: P(X2i, X2i+1)
  ◦ P(0, 0) = 3/8, P(0, 1) = 1/8
  ◦ P(1, 0) = 1/8, P(1, 1) = 3/8

Joint Prob P(X2i, X2i+1)

  X2i \ X2i+1 |  0  |  1
  ------------+-----+-----
       0      | 3/8 | 1/8
       1      | 1/8 | 3/8

• Second order entropy:

$$H(X_{2i}, X_{2i+1}) = 1.8113 \text{ bits / 2 symbols, or } 0.9056 \text{ bits / symbol}$$

• Huffman code for Xj: 0, 1
  ◦ Average code length: 1 bit / symbol
• Huffman code for (X2i, X2i+1): 00: 1, 11: 00, 01: 010, 10: 011
  ◦ Average code length: 1.875 bits / 2 symbols, or 0.9375 bit / symbol
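The figures in this example can be verified numerically. The script below is only a check of the arithmetic; the dictionary and variable names are chosen here for illustration.

```python
import math
from fractions import Fraction

# Joint distribution of the pair (X2i, X2i+1), keyed by the two bits.
joint = {"00": Fraction(3, 8), "01": Fraction(1, 8),
         "10": Fraction(1, 8), "11": Fraction(3, 8)}
# Huffman code for the pairs, as listed in the example.
pair_code = {"00": "1", "11": "00", "01": "010", "10": "011"}

# Second order (joint) entropy in bits per pair of symbols.
joint_entropy = -sum(float(p) * math.log2(float(p)) for p in joint.values())
# Average codeword length in bits per pair of symbols.
avg_pair = sum(joint[s] * len(pair_code[s]) for s in joint)

print(f"{joint_entropy:.4f} bits / 2 symbols")    # 1.8113 bits / 2 symbols
print(f"{joint_entropy / 2:.4f} bits / symbol")   # 0.9056 bits / symbol
print(float(avg_pair), float(avg_pair / 2))       # 1.875 0.9375
```

Coding the symbols one at a time cannot do better than 1 bit / symbol, while coding them in pairs exploits the dependence between neighbours and gets within about 0.032 bit / symbol of the second order entropy.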


Summary
• Goal of entropy coding:
  ◦ Reduce the average codeword length (the entropy is the lower bound)
• Prefix-free code: uniquely decodable code
• Kraft-McMillan Inequality:
  ◦ Characteristic of prefix-free code
• Huffman Code:
  ◦ Optimal prefix-free code
  ◦ Minimum variance code
• Next:
  ◦ Canonical Huffman
  ◦ Encoding and decoding