Huffman Coding - Universiti Putra Malaysia
csnotes.upm.edu.my/.../$file/huffman_coding.pdf
TRANSCRIPT
Overview
In this chapter, we describe a very popular coding algorithm called the Huffman coding algorithm:
Present a procedure for building Huffman codes when the probability model for the source is known
Present a procedure for building codes when the source statistics are unknown
Describe a new technique for code design that is in some sense similar to the Huffman coding approach
Some applications
Huffman Coding (using binary tree)
Algorithm in 5 steps:
1. Find the gray-level probabilities for the image by finding the histogram
2. Order the input probabilities (histogram magnitudes) from smallest to largest
3. Combine the smallest two by addition
4. GO TO step 2, until only two probabilities are left
5. By working backward along the tree, generate code by alternately assigning 0 and 1
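As a sketch, the five steps above can be implemented with a priority queue. The `heapq`-based helper below is an illustrative implementation, not the lecture's own code; the exact 0/1 labels it produces depend on tie-breaking, but the code lengths are those of a Huffman code.

```python
import heapq
from itertools import count

def huffman_codes(probs):
    """Build a Huffman code from {symbol: probability}.

    Repeatedly merges the two least-probable entries (steps 2-4 above),
    prepending a 0 to one branch and a 1 to the other (step 5).
    """
    tick = count()  # tie-breaker so heapq never compares the dicts
    heap = [(p, next(tick), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)  # smallest probability
        p2, _, codes2 = heapq.heappop(heap)  # second smallest
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (p1 + p2, next(tick), merged))
    return heap[0][2]

codes = huffman_codes({"g0": 0.2, "g1": 0.3, "g2": 0.1, "g3": 0.4})
print(codes)  # the most probable symbol (g3) gets the shortest code
```

For the four gray levels of the example image, the resulting code lengths are 1, 2, 3, and 3 bits, matching the worked example that follows.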
Coding Procedures for an N-symbol source
Source reduction
List all probabilities in descending order
Merge the two symbols with the smallest probabilities into a new compound symbol
Repeat the above two steps N-2 times
Codeword assignment
Start from the smallest reduced source and work back to the original source
Each merging point corresponds to a node in the binary codeword tree
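The source-reduction loop can be traced directly. The sketch below is our own illustration (the function name and the nested compound-symbol naming, e.g. "(N(EW))" rather than the slides' "(NEW)", are conventions introduced here): it prints each reduction stage, stopping after N-2 merges when only two entries remain.

```python
def source_reduction(probs):
    """Trace Huffman source reduction: list probabilities in descending
    order, merge the two smallest into a compound symbol, and repeat
    until only two (possibly compound) symbols remain (N-2 merges)."""
    stage = sorted(probs.items(), key=lambda kv: -kv[1])
    stages = [stage]
    while len(stage) > 2:
        *rest, (s1, p1), (s2, p2) = stage  # last two = smallest pair
        merged = ("(" + s1 + s2 + ")", p1 + p2)
        stage = sorted(rest + [merged], key=lambda kv: -kv[1])
        stages.append(stage)
    return stages

for stage in source_reduction({"S": 0.5, "N": 0.25, "E": 0.125, "W": 0.125}):
    print(stage)
```

Each printed stage corresponds to one column of the source-reduction tables in the examples below.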
Huffman Coding (using binary tree)
Example 1
We have an image with 2 bits/pixel, giving 4 possible gray levels. The image is 10 rows by 10 columns. In step 1 we find the histogram for the image.
Example 1
a. Step 1: Histogram
Gray level 0 has 20 pixels
Gray level 1 has 30 pixels
Gray level 2 has 10 pixels
Gray level 3 has 40 pixels
These counts are converted into probabilities by normalizing to the total number of pixels.
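For the 10x10 image, step 1's normalization amounts to:

```python
# Pixel counts per gray level for the 10x10 example image
histogram = {0: 20, 1: 30, 2: 10, 3: 40}

total = sum(histogram.values())                      # 100 pixels
probs = {g: n / total for g, n in histogram.items()}
print(probs)  # {0: 0.2, 1: 0.3, 2: 0.1, 3: 0.4}
```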
Example 1
d. Step 4: Reorder and add until only two values remain.
Step 4 repeats steps 2 and 3: reorder (if necessary) and add the two smallest probabilities.
Example 1
In step 5, the actual code assignment is made.
Start on the right-hand side of the tree and assign 0's & 1's:
0 is assigned to the 0.6 branch & 1 to the 0.4 branch
Example 1
The assigned 0 & 1 are brought back along the tree, & wherever a branch occurs the code is put on both branches.
Example 1
Finally, the codes are brought back one more level, & where the branch splits another assignment of 0 & 1 occurs (at the 0.1 & 0.2 branch).
Example 1
Now we have a Huffman code for this image:
Two gray levels are represented with 3 bits, one with 2 bits, & one gray level has 1 bit assigned.
The gray level represented by 1 bit, g3, is the most likely to occur (40% of the time) & thus carries the least information in the information-theoretic sense.
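Using the probabilities and code lengths from this example (the bit lengths are fixed by the construction, even though the exact 0/1 patterns depend on the choices made at each branch), the average code length comes out below the fixed 2 bits/pixel:

```python
# Gray-level probabilities and Huffman code lengths from Example 1.
probs   = {"g0": 0.2, "g1": 0.3, "g2": 0.1, "g3": 0.4}
lengths = {"g0": 3,   "g1": 2,   "g2": 3,   "g3": 1}

avg = sum(probs[g] * lengths[g] for g in probs)
print(round(avg, 2))  # 1.9 bits/pixel, vs 2.0 for the fixed-length code
```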
Exercise
Using Example 1, find a Huffman code using the minimum variance procedure.
EE465: Introduction to Digital Image Processing
Example 2
Step 1: Source reduction

symbol x   p(x)
S          0.5     0.5    0.5
N          0.25    0.25   0.5
E          0.125   0.25
W          0.125

compound symbols: E and W merge into (EW) (0.25), then N and (EW) merge into (NEW) (0.5)
Example 2
Step 2: Codeword assignment

symbol x   p(x)    codeword
S          0.5     0
N          0.25    10
E          0.125   110
W          0.125   111

Working back from the last reduction: S and (NEW) are labeled 0 and 1; within (NEW), N and (EW) become 10 and 11; within (EW), E and W become 110 and 111.
Example 2
The codeword assignment is not unique. In fact, at each merging point (node), we can arbitrarily assign "0" and "1" to the two branches (the average code length is the same). For example:

symbol x   codeword   or   codeword
S          0               1
N          10              01
E          110             001
W          111             000
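A quick numerical check of this claim, using the two assignments above:

```python
# Two valid codeword assignments for the same source (0/1 labels
# swapped at the nodes); the code lengths, and hence the average
# length, are identical.
probs  = {"S": 0.5, "N": 0.25, "E": 0.125, "W": 0.125}
code_a = {"S": "0", "N": "10", "E": "110", "W": "111"}
code_b = {"S": "1", "N": "01", "E": "001", "W": "000"}

avg_a = sum(probs[s] * len(code_a[s]) for s in probs)
avg_b = sum(probs[s] * len(code_b[s]) for s in probs)
print(avg_a, avg_b)  # 1.75 1.75
```

Because the probabilities here are all powers of 1/2, this average length also equals the source entropy, so the code is optimal.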
Example 2
Step 1: Source reduction

symbol x   p(x)
e          0.4    0.4    0.4    0.6
a          0.2    0.2    0.4
i          0.2    0.2
o          0.1
u          0.1

compound symbols: o and u merge into (ou) (0.2), then i and (ou) merge into (iou) (0.4), then a and (iou) merge into (aiou) (0.6)
Example 2
Step 2: Codeword assignment

symbol x   p(x)   codeword
e          0.4    1
a          0.2    01
i          0.2    000
o          0.1    0010
u          0.1    0011
Example 2
binary codeword tree representation:

            r
          0/ \1
     (aiou)   e
       0/ \1
  (iou)    a
   0/ \1
  i    (ou)
      0/ \1
     o    u

Reading the labels from root to leaf gives e = 1, a = 01, i = 000, o = 0010, u = 0011.
Example 2

symbol x   p(x)   codeword   length
e          0.4    1          1
a          0.2    01         2
i          0.2    000        3
o          0.1    0010       4
u          0.1    0011       4

H(X) = -sum_{i=1}^{5} p_i log2 p_i = 2.122 bps

l_bar = sum_{i=1}^{5} l_i p_i = 1(0.4) + 2(0.2) + 3(0.2) + 4(0.1) + 4(0.1) = 2.2 bps

r = l_bar - H(X) = 0.078 bps

If we use fixed-length codes, we have to spend three bits per sample, which gives a code redundancy of 3 - 2.122 = 0.878 bps.
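The three quantities above can be verified numerically:

```python
import math

# Probabilities and code lengths from the table above
probs   = {"e": 0.4, "a": 0.2, "i": 0.2, "o": 0.1, "u": 0.1}
lengths = {"e": 1,   "a": 2,   "i": 3,   "o": 4,   "u": 4}

H = -sum(p * math.log2(p) for p in probs.values())   # entropy H(X)
lbar = sum(probs[s] * lengths[s] for s in probs)     # average length
print(round(H, 3), round(lbar, 3), round(lbar - H, 3))  # 2.122 2.2 0.078
```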
Adaptive (Dynamic) Huffman Coding: building the tree for TENNESSEE symbol by symbol

T
Stage 1 (First occurrence of t)
    r
   / \
  0   t(1)
Order: 0,t(1)
* r represents the root
* 0 represents the null node
* t(1) denotes the occurrence of t with a frequency of 1
TEN
Stage 3 (First occurrence of n)
      r
     / \
    2   t(1)
   / \
  1   e(1)
 / \
0   n(1)
Order: 0,n(1),1,e(1),2,t(1) : Misfit
TENN
Stage 4 (Repetition of n)
    r
   / \
t(1)  3
     / \
    2   e(1)
   / \
  0   n(2)
Order: 0,n(2),2,e(1),t(1),3 : Misfit
Reorder: TENN
     r
    / \
n(2)   2
      / \
     1   e(1)
    / \
   0   t(1)
Order: 0,t(1),1,e(1),n(2),2
t(1),n(2) are swapped
TENNES
Stage 6 (First occurrence of s)
     r
    / \
n(2)   4
      / \
     2   e(2)
    / \
   1   t(1)
  / \
 0   s(1)
Order: 0,s(1),1,t(1),2,e(2),n(2),4
TENNESS
Stage 7 (Repetition of s)
     r
    / \
n(2)   5
      / \
     3   e(2)
    / \
   2   t(1)
  / \
 0   s(2)
Order: 0,s(2),2,t(1),3,e(2),n(2),5 : Misfit
Reorder: TENNESS
     r
    / \
n(2)   5
      / \
     3   e(2)
    / \
   1   s(2)
  / \
 0   t(1)
Order: 0,t(1),1,s(2),3,e(2),n(2),5
s(2) and t(1) are swapped
TENNESSE
Stage 8 (Second repetition of e)
     r
    / \
n(2)   6
      / \
     3   e(3)
    / \
   1   s(2)
  / \
 0   t(1)
Order: 0,t(1),1,s(2),3,e(3),n(2),6 : Misfit
Reorder: TENNESSE
     r
    / \
e(3)   5
      / \
     3   n(2)
    / \
   1   s(2)
  / \
 0   t(1)
Order: 0,t(1),1,s(2),3,n(2),e(3),5
n(2) and e(3) are swapped
TENNESSEE
Stage 9 (Third repetition of e)
      r
    0/ \1
e(4)    5
      0/ \1
     3    n(2)
   0/ \1
  1    s(2)
 0/ \1
0    t(1)
Order: 0,t(1),1,s(2),3,n(2),e(4),5
Average Code Length
Average code length = sum_i (length_i x frequency_i) / sum_i frequency_i
= [ 1(4) + 2(2) + 3(2) + 4(1) ] / (4 + 2 + 2 + 1)
= 18 / 9 = 2 bits/symbol
ENTROPY
Entropy = -sum_{i=1,n} p_i log2 p_i
= -(0.44 log2 0.44 + 2 x (0.22 log2 0.22) + 0.11 log2 0.11)
= 1.8367
(with the exact probabilities p = 4/9, 2/9, 2/9, 1/9 for e, n, s, t)
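The average length and entropy for TENNESSEE can be checked numerically, with the code lengths read off the final tree of stage 9:

```python
import math
from collections import Counter

freq = Counter("TENNESSEE")     # E:4, N:2, S:2, T:1
total = sum(freq.values())      # 9 symbols

# Code lengths read off the final adaptive tree (stage 9)
lengths = {"E": 1, "N": 2, "S": 3, "T": 4}

avg = sum(freq[s] * lengths[s] for s in freq) / total
H = -sum((n / total) * math.log2(n / total) for n in freq.values())
print(avg, round(H, 3))  # 2.0 1.837
```

The exact entropy is 1.8366... bps, which matches the slide's 1.8367 up to rounding.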
Ordinary Huffman Coding
TENNESSEE
       9
     0/ \1
    5    e(4)
  0/ \1
s(2)   3
     0/ \1
  t(1)   n(2)

ENCODING
E : 1
S : 00
T : 010
N : 011

Average code length = (1x4 + 2x2 + 3x1 + 3x2) / 9 = 17/9 = 1.89
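As a sketch (not the lecture's code), a two-pass static Huffman code built with `heapq` over the symbol counts of TENNESSEE reproduces this average. Tie-breaking may swap which of n/s gets the 2-bit code, but the average length 17/9 is unchanged.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_lengths(freqs):
    """Code lengths of a static (two-pass) Huffman code for symbol counts."""
    tick = count()  # tie-breaker for equal weights
    heap = [(n, next(tick), {s: 0}) for s, n in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, d1 = heapq.heappop(heap)
        n2, _, d2 = heapq.heappop(heap)
        # every symbol in the merged subtree moves one level deeper
        merged = {s: l + 1 for s, l in {**d1, **d2}.items()}
        heapq.heappush(heap, (n1 + n2, next(tick), merged))
    return heap[0][2]

freq = Counter("TENNESSEE")                  # E:4, N:2, S:2, T:1
lengths = huffman_lengths(freq)
avg = sum(freq[s] * lengths[s] for s in freq) / sum(freq.values())
print(lengths, avg)  # average is 17/9, about 1.89 bits/symbol
```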
SUMMARY
The average code length of ordinary Huffman coding seems to be better than that of the dynamic version in this exercise.
But in practice the performance of dynamic coding is better. The problem with static coding is that the tree has to be constructed at the transmitter and sent to the receiver. The tree may also change, because the frequency distribution of English letters varies between plain text, technical papers, pieces of code, etc.
Since the tree in dynamic coding is constructed at the receiver as well, it need not be sent. Considering this, dynamic coding is better.
Also, the average code length improves as the transmitted text gets longer.
Summary of Huffman Coding Algorithm
Achieves minimal redundancy subject to the constraint that the source symbols are coded one at a time
Sorting symbols in descending order of probability is the key step in source reduction
The codeword assignment is not unique: exchanging the labels "0" and "1" at any node of the binary codeword tree produces another solution that works equally well
Only works for a source with a finite number of symbols (otherwise, it does not know where to start)