ENSC 424 - Multimedia Communications Engineering
Topic 4: Huffman Coding 2
Jie Liang, Engineering Science, Simon Fraser University, [email protected]
9/18/2005




Outline
- Canonical Huffman code
- Huffman encoding
- Huffman decoding
- Limitations of Huffman code
- Context-adaptive Huffman coding


Canonical Huffman Code

- Huffman codes for a given data set are not unique
- The Huffman algorithm is needed only to compute the optimal codeword lengths
- The canonical Huffman code is well structured: given the codeword lengths, a canonical Huffman code can be found directly

http://portal.acm.org/citation.cfm?id=77566


Canonical Huffman Code

- Example: codeword lengths 2, 2, 3, 3, 3, 4, 4
- Verify that they satisfy the Kraft-McMillan inequality

A non-canonical example: 01, 11, 000, 001, 100, 1010, 1011

The Canonical Tree
- Rules:
  - Assign 0 to the left branch and 1 to the right branch
  - Build the tree from left to right in increasing order of depth
  - Each leaf is placed at the first available position
- Resulting canonical codewords: 00, 01, 100, 101, 110, 1110, 1111

Kraft-McMillan inequality: Σ_{i=1}^{N} 2^(-l_i) ≤ 1
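The check and the construction above can be sketched in Python (a minimal sketch; the loop implements the left-to-right, short-to-long rule, and variable names are mine):

```python
# Check the Kraft-McMillan inequality and build the canonical code
# for the slide's lengths 2, 2, 3, 3, 3, 4, 4 (already sorted).
lengths = [2, 2, 3, 3, 3, 4, 4]

kraft = sum(2 ** -l for l in lengths)   # must be <= 1 for a prefix code

codes, code = [], 0
for i, l in enumerate(lengths):
    if i > 0:
        # Next code value, shifted left if the length increases.
        code = (code + 1) << (l - lengths[i - 1])
    codes.append(format(code, '0{}b'.format(l)))
```

For these lengths the Kraft sum is exactly 1, and the loop reproduces the codewords 00, 01, 100, 101, 110, 1110, 1111 from the tree above.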


Canonical Huffman
- Properties:
  - The first code is a series of 0s
  - Codes of the same length are consecutive: 100, 101, 110
  - If we pad zeros on the right so that all codewords have the same length, shorter codes have lower values than longer codes:
    0000 < 0100 < 1000 < 1010 < 1100 < 1110 < 1111


- Formula from level n to level n+1: C(n+1, 1) = 2 (C(n, last) + 1), i.e., append a 0 to the next available level-n code (C(n, last) is the last code of length n, C(n+1, 1) the first code of length n+1)
- If the lengths jump from n directly to n+2 (e.g., 1, 3, 3, 3, 4, 4): C(n+2, 1) = 4 (C(n, last) + 1)

[Tree for lengths 1, 3, 3, 3, 4, 4: codewords 0, 100, 101, 110, 1110, 1111]
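A quick numeric check of the jump formula, using the example lengths 1, 3, 3, 3, 4, 4 (variable names are mine):

```python
# Check C(n+2, 1) = 4 (C(n, last) + 1) for lengths 1, 3, 3, 3, 4, 4:
# the last length-1 code is '0', so the first length-3 code should be '100'.
c_n_last = 0                      # value of the last code of length n = 1
c_n2_first = 4 * (c_n_last + 1)   # first code of length n + 2 = 3
first_code = format(c_n2_first, '03b')
```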


Advantages of Canonical Huffman

1. Reducing the memory requirement
- A non-canonical tree needs:
  - All codewords
  - The lengths of all codewords
- This needs a lot of space for a large table


- A canonical tree only needs:
  - Min: the shortest codeword length
  - Max: the longest codeword length
  - Distribution: the number of codewords at each level
- For the example tree: Min = 2, Max = 4, number at each level: 2, 3, 2
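A sketch of how a decoder can rebuild the whole codebook from just (Min, Max, distribution), using the example values above (variable names are mine):

```python
# Rebuild the canonical codebook from Min = 2 and the per-level counts 2, 3, 2.
# Max is implied: min_len + len(counts) - 1 = 4.
min_len, counts = 2, [2, 3, 2]    # counts[i] codewords of length min_len + i

codes, code = [], 0
for i, n in enumerate(counts):
    length = min_len + i
    for _ in range(n):
        codes.append(format(code, '0{}b'.format(length)))
        code += 1
    code <<= 1                     # first code value of the next level
```

Three small numbers are enough to regenerate all seven codewords, which is the memory saving the slide describes.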


Advantages of Canonical Huffman

2. Simplifying the decoding
See Hirschberg & Lelewer's paper for details: http://portal.acm.org/citation.cfm?id=77566 (not required in this course)

- Final note about canonical Huffman: different conventions of the canonical form exist. The tree can also be built from right to left (exchanging 0 and 1); this convention is used, e.g., in "Practical Huffman coding" at http://www.compressconsult.com/huffman


Outline
- Review of last lecture
- Huffman code
- Canonical Huffman code
- Huffman encoding
- Huffman decoding
- Limitations of Huffman code
- Context-adaptive Huffman coding


Computer Implementation

Code: a: 01, b: 1, c: 000, d: 0010, e: 0011
- Codewords are stored in 8-, 16-, or 32-bit integers:
    unsigned char Codewords[5] = {1, 1, 0, 2, 3};
- Also need to store the length of each code:
    unsigned char Codelength[5] = {2, 1, 3, 4, 4};
- The computer does not know the number of leading zeros: 1 and 01 are different codes, and 0 and 000 are different codes.
- How should the Huffman table be stored? The decoder also needs to store this table.
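Why the lengths matter can be seen in a small sketch: the stored integers alone are ambiguous, but the (value, length) pairs recover the exact codewords (Python version of the slide's two arrays):

```python
# The integer values alone (1, 1, 0, 2, 3) cannot distinguish '1' from '01'
# or '0' from '000'; pairing each value with its bit length recovers the
# exact codeword, including leading zeros.
values = [1, 1, 0, 2, 3]       # a, b, c, d, e as integers
lengths = [2, 1, 3, 4, 4]      # number of bits in each codeword

codewords = [format(v, '0{}b'.format(l)) for v, l in zip(values, lengths)]
```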


Encoding

- Source alphabet: A = {a, b, c, d, e}
- Probability distribution: {0.2, 0.4, 0.2, 0.1, 0.1}
- Code: a: 01, b: 1, c: 000, d: 0010, e: 0011
- Sequence to encode: abaddecade

For each input symbol:
    find its codeword
    output all bits of the codeword
End

- Needs bit-level operations: byte pointer, bit pointer

Compressed bitstream: 01101001 00010001 10000100 100011
- 30 bits total: 3 bits/symbol
- ASCII: 10 bytes (80 bits)
- Compression ratio: 80/30 ≈ 2.67:1
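The encoding loop above can be sketched in Python (a minimal version using the slide's code table; a real encoder would additionally pack the bits into bytes with a byte pointer and a bit pointer):

```python
# Sketch of the Huffman encoding loop from the slide.
code = {'a': '01', 'b': '1', 'c': '000', 'd': '0010', 'e': '0011'}

def encode(sequence):
    # For each input symbol, find its codeword and output all of its bits.
    return ''.join(code[s] for s in sequence)

bitstream = encode('abaddecade')   # 30 bits for 10 symbols: 3 bits/symbol
```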


Outline
- Review of last lecture
- Huffman code
- Canonical Huffman code
- Huffman encoding
- Huffman decoding
  - Direct approach
  - Table look-up method
  - Multi-level table look-up method
- Limitations of Huffman code
- Context-adaptive Huffman coding


Huffman Decoding
- Direct approach:
  - Read one bit, compare with all codewords... slow
- Binary tree approach:
  - Embed the Huffman table into a binary tree data structure
  - Read one bit: if it is 0, go to the left child; if it is 1, go to the right child; decode a symbol when a leaf is reached
  - Still a bit-by-bit approach
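The binary tree approach can be sketched in Python, assuming the same example code a: 01, b: 1, c: 000, d: 0010, e: 0011 (the nested-dict tree is one possible representation):

```python
# Bit-by-bit Huffman decoding with the table embedded in a binary tree.
code = {'a': '01', 'b': '1', 'c': '000', 'd': '0010', 'e': '0011'}

# Build the tree as nested dicts: internal nodes have '0'/'1' children,
# leaves hold the decoded symbol. The code is prefix-free, so no clashes.
root = {}
for sym, word in code.items():
    node = root
    for bit in word[:-1]:
        node = node.setdefault(bit, {})
    node[word[-1]] = sym

def decode(bits):
    out, node = [], root
    for bit in bits:
        node = node[bit]            # 0 -> left child, 1 -> right child
        if isinstance(node, str):   # leaf reached: emit symbol, restart at root
            out.append(node)
            node = root
    return ''.join(out)
```

Decoding the 30-bit stream from the encoding slide recovers "abaddecade", one bit at a time.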


Table Look-up
- N: number of codewords
- L: maximum codeword length
- Expand the code tree to a full tree of depth L:
  - Each level-L node belongs to the subtree of exactly one codeword
  - Equivalent to dividing the range [0, 2^L) into N intervals, each corresponding to one codeword

Example (code a: 00, b: 010, c: 011, d: 1, so L = 3): interval boundaries 000, 010, 011, 100.

Read L bits and find which interval the value falls in:

    Bar[4] = {2, 3, 4, 8};   /* upper bounds 010, 011, 100, 1000 in binary */
    x = ReadBits(3);
    for (k = 0; k <= 3; k++) {
        if (x < Bar[k]) { /* decode codeword k */ break; }
    }

- Still needs conditional operations: slow



Table Look-up Method

Code: a: 00, b: 010, c: 011, d: 1. With L = 3, the eight level-3 values map to the symbols a, a, b, c, d, d, d, d (interval boundaries 000, 010, 011, 100).

char HuffDec[8][2] = {
    {'a', 2}, {'a', 2},
    {'b', 3}, {'c', 3},
    {'d', 1}, {'d', 1}, {'d', 1}, {'d', 1}
};

x = ReadBits(3);
k = 0;  /* number of symbols decoded */
while (not EOF) {
    symbol[k++] = HuffDec[x][0];   /* decoded symbol */
    length = HuffDec[x][1];        /* number of bits consumed */
    x = x << length;               /* shift out the consumed bits */
    newbits = ReadBits(length);
    x = x | newbits;               /* shift in the same number of new bits */
    x = x & 7;                     /* keep only the lowest 3 bits */
}
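The loop above can be modeled in Python (a minimal sketch with the same table; `decode` takes the bitstream as a string of '0'/'1' characters, and the helper names are mine):

```python
# One-shot table look-up decoding for the code a: 00, b: 010, c: 011, d: 1.
# Each 3-bit value indexes a (symbol, codeword length) pair, mirroring HuffDec.
L = 3
huffdec = [('a', 2), ('a', 2), ('b', 3), ('c', 3),
           ('d', 1), ('d', 1), ('d', 1), ('d', 1)]

def decode(bits, nsymbols):
    padded = bits + '0' * L           # pad so the final look-up has L bits
    pos, out = 0, []
    for _ in range(nsymbols):
        x = int(padded[pos:pos + L], 2)   # read the next L bits as an integer
        sym, length = huffdec[x]
        out.append(sym)
        pos += length                      # advance only by the decoded length
    return ''.join(out)
```

For example, decode('0100011', 4) recovers "badd": each look-up reads L bits but consumes only as many as the decoded codeword actually used.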


Multi-level Table Look-Up
- Look-up table size: 2^L
  - L = 8: 256 entries
  - L = 16: 65536 entries!
- Multi-level table look-up: divide and conquer
  - One table for frequently appearing codewords
  - Other tables for the longer codewords
- Most codes can be decoded by looking at only a few bits; long codes rarely appear
- Example with L = 8: a one-level table needs 256 entries; a two-level table needs only 16 + 16 = 32 entries!


More Complicated Cases
- More than one hole in the first table: need one second-level table for each hole.

[Figure: a first-level table (Table 1) with two holes, each pointing to its own second-level table (Tables 2 and 3)]


Outline
- Huffman code
- Canonical Huffman code
- Huffman decoding
- Limitations of Huffman code
- Context-adaptive Huffman coding


Limitations of Huffman Code
- Needs a probability distribution
  - Usually estimated from a training set
  - But the actual data can be quite different
- Hard to adapt to changing statistics
  - Must design new codes on the fly
  - Context-adaptive methods still need predefined tables
- Minimum codeword length is 1 bit
  - Serious penalty for high-probability symbols
  - Example: binary source with P(0) = 0.9
    - Entropy: -0.9 log2(0.9) - 0.1 log2(0.1) = 0.469 bits
    - Huffman code: 0, 1; average code length: 1 bit
    - More than 100% redundancy!
  - Joint coding of several symbols helps, but is not practical for a large alphabet.
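The numeric example can be verified directly (variable names are mine):

```python
# Entropy and redundancy of a binary source with P(0) = 0.9
# under the 1-bit-per-symbol Huffman code {0, 1}.
from math import log2

p0 = 0.9
entropy = -p0 * log2(p0) - (1 - p0) * log2(1 - p0)   # about 0.469 bits/symbol

avg_len = 1.0                                  # Huffman code 0/1: 1 bit/symbol
redundancy = (avg_len - entropy) / entropy     # relative overhead, exceeds 100%
```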


Outline
- Huffman code
- Canonical Huffman code
- Huffman decoding
- Limitations of Huffman code
- Context-adaptive Huffman coding


Context-adaptive Huffman Code
- Switch the Huffman table based on previously encoded information (the context); the decoder follows the same rules.
- Used in H.264 (revisited later)
- Example: binary source, encoding two symbols at a time, with joint probabilities P(X_2i, X_2i+1):
  - P(0, 0) = 3/8, P(0, 1) = 1/8, P(1, 0) = 1/8, P(1, 1) = 3/8
  - Code from the previous example: 1, 00, 010, 011
- 0s and 1s tend to appear in clusters: use the last two symbols (the context) to predict the next two, and design one set of Huffman codewords for each possible context value.

Input stream: x0 x1 x2 x3 x4 x5 x6 x7 ...

[Table: one set of four codewords per context value (0,0), (0,1), (1,0), (1,1)]
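The scheme can be sketched in Python. The per-context tables below are illustrative assumptions (each assigns the shortest codeword to the pair most likely to follow that context), not the exact tables from the slide:

```python
# Context-adaptive Huffman coding over symbol pairs: the table used for each
# pair is selected by the previous pair. Tables here are assumed/illustrative;
# each one is a valid prefix code over the four pairs.
tables = {
    (0, 0): {(0, 0): '1', (1, 1): '00', (0, 1): '010', (1, 0): '011'},
    (1, 1): {(1, 1): '1', (0, 0): '00', (0, 1): '010', (1, 0): '011'},
    (0, 1): {(1, 1): '1', (0, 0): '00', (0, 1): '010', (1, 0): '011'},
    (1, 0): {(0, 0): '1', (1, 1): '00', (0, 1): '010', (1, 0): '011'},
}

def encode(bits, initial_context=(0, 0)):
    ctx, out = initial_context, []
    for i in range(0, len(bits), 2):
        pair = (bits[i], bits[i + 1])
        out.append(tables[ctx][pair])   # table chosen by the previous pair
        ctx = pair                      # the decoder tracks the same context
    return ''.join(out)
```

Because the decoder sees the same previously decoded pairs, it can select the same table at every step without any side information.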


Summary
- Huffman code generation: sort, merge, assign codes
- Canonical Huffman code: left to right, short to long, take the first valid leaf
- Huffman decoding: direct, tree, table look-up, multi-level table look-up
- Limitations of Huffman coding: adaptation is hard, not efficient for skewed sources
- Next: Golomb-Rice coding, arithmetic coding