Huffman Codes — Juan A. Rodriguez, CS 326, 5/13/2003
TRANSCRIPT
Huffman Codes
Juan A. Rodriguez, CS 326
5/13/2003
Presentation Content

- Introduction
- Encoding
- Huffman’s algorithm
- Huffman’s code
- Dynamic Huffman encoding
- Quiz
Introduction Suppose we have to encode a text that consists of n characters.

A Huffman code is a coding scheme that yields a shorter bit string by assigning shorter code words to more frequent characters and longer code words to less frequent characters.

The same idea was used in the mid-19th century by Samuel Morse: frequent letters such as e (.) and a (. _) are assigned short sequences of dots and dashes, while infrequent letters such as q (_ _ . _) and z (_ _ . .) have longer ones.
Encoding Fixed-length encoding assigns to each character a bit string of the same length. That is what the standard seven-bit ASCII code does.

| Letter | Codeword |
|--------|----------|
| A | 000 |
| B | 001 |
| C | 010 |
| D | 011 |
| E | 100 |
| F | 101 |
Encoding Variable-length encoding assigns code words of different lengths to different characters. Huffman codes are an example of variable-length encoding.

| Letter | Codeword |
|--------|----------|
| A | 0 |
| B | 101 |
| C | 100 |
| D | 111 |
| E | 1101 |
| F | 1100 |
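To make the difference concrete, here is a small sketch (the sample text is hypothetical, chosen so that A dominates) comparing the two tables above:

```python
# Fixed-length (3 bits each) vs. variable-length code words from the slides.
fixed = {'A': '000', 'B': '001', 'C': '010', 'D': '011', 'E': '100', 'F': '101'}
variable = {'A': '0', 'B': '101', 'C': '100', 'D': '111', 'E': '1101', 'F': '1100'}

def encode(text, code):
    # Encoding is just concatenation of the characters' code words.
    return ''.join(code[c] for c in text)

text = 'ABACABAD'  # hypothetical sample where the frequent A dominates
print(len(encode(text, fixed)))     # 24 bits: 8 characters * 3 bits
print(len(encode(text, variable)))  # 16 bits: each A costs only 1 bit
```

The variable-length code wins exactly because the most frequent character got the shortest code word.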
Encoding Prefix codes: no code word is a prefix of another code word.

To decode, simply scan the bit string until the first group of bits that is a code word for some character, output that character, and repeat this operation until the end of the bit string is reached.

This property simplifies encoding and decoding.
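The scanning procedure above can be sketched as follows (a minimal Python illustration using the variable-length table from the previous slide):

```python
variable = {'A': '0', 'B': '101', 'C': '100', 'D': '111', 'E': '1101', 'F': '1100'}
decode_map = {v: k for k, v in variable.items()}

def decode(bits):
    """Scan until the accumulated bits form a code word, emit, repeat."""
    out, buf = [], ''
    for b in bits:
        buf += b
        if buf in decode_map:  # prefix property: a match is unambiguous
            out.append(decode_map[buf])
            buf = ''
    return ''.join(out)

print(decode('10101111101'))  # 101|0|111|1101 → BADE
```

Because no code word is a prefix of another, the first match during the scan is always the right one, so no backtracking is needed.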
Huffman’s Algorithm Suppose you have a 1000-character data file with the following properties:

| Character | D | I | _ | U | E | J | L | R | O | S |
|-----------|---|---|---|---|---|---|---|---|---|---|
| Frequency | 0.10 | 0.20 | 0.01 | 0.13 | 0.18 | 0.10 | 0.06 | 0.03 | 0.15 | 0.04 |
Huffman’s Algorithm Step 1: Initialize n one-node trees and label them with the characters of the alphabet.

[Diagram: ten one-node trees labeled D, I, _, U, E, J, L, R, O, S]
Huffman’s Algorithm Step 2: Record the frequency of each character in its tree’s root to indicate the tree’s weight.

[Diagram: one-node trees with weights D .10, I .20, _ .01, U .13, E .18, J .10, L .06, R .03, O .15, S .04]
Huffman’s Algorithm Step 3: Find the two trees with the smallest weights, make them the left and right subtrees of a new tree, and record the sum of their weights in the root of the new tree as its weight.

[Diagram: the ten one-node trees; the two lightest, _ (.01) and R (.03), are selected]
Huffman’s Algorithm Step 3 continued

[Diagram: _ (.01) and R (.03) merged into a tree of weight .04]
Huffman’s Algorithm Step 3 continued

[Diagram: the .04 tree and S (.04) merged into a tree of weight .08]
Huffman’s Algorithm Step 3 continued

[Diagram: L (.06) and the .08 tree merged into a tree of weight .14]
Huffman’s Algorithm Step 3 continued

[Diagram: D (.10) and J (.10) merged into a tree of weight .20]
Huffman’s Algorithm Step 3 continued

[Diagram: U (.13) and the .14 tree merged into a tree of weight .27]
Huffman’s Algorithm Step 3 continued

[Diagram: O (.15) and E (.18) merged into a tree of weight .33]
Huffman’s Algorithm Step 3 continued

[Diagram: I (.20) and the .20 tree merged into a tree of weight .40]
Huffman’s Algorithm Step 3 continued

[Diagram: the .27 tree and the .33 tree merged into a tree of weight .60]
Huffman’s Algorithm Step 3 continued

[Diagram: the .40 tree and the .60 tree merged into the final tree of weight 1.0]
Huffman’s Algorithm Step 4: We adopt the convention that going left down the binary tree appends a 0 to the code word, and going right appends a 1.
Huffman’s Algorithm Step 4 continued

[Diagram: the completed tree with every left edge labeled 0 and every right edge labeled 1; a character’s code word is read off along the path from the root to its leaf]
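Steps 1–4 can be sketched in Python. This is an illustrative sketch only: with equal weights, the tie-breaking (and hence the exact code words) may differ from the slides, but any tree the algorithm produces is optimal and yields the same expected code length.

```python
import heapq

def huffman_codes(freqs):
    """Build a Huffman code for {symbol: weight} by repeatedly
    merging the two lightest trees (steps 1-4 of the slides)."""
    # Steps 1-2: one weighted entry per character.
    # Each heap entry: (weight, tie-breaker, tuple of leaf symbols).
    heap = [(w, i, (sym,)) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    codes = {sym: '' for sym in freqs}
    # Step 3: merge the two lightest trees until one remains.
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # lightest tree
        w2, _, right = heapq.heappop(heap)  # second-lightest tree
        # Step 4: left edges prepend a 0, right edges prepend a 1.
        for sym in left:
            codes[sym] = '0' + codes[sym]
        for sym in right:
            codes[sym] = '1' + codes[sym]
        heapq.heappush(heap, (w1 + w2, counter, left + right))
        counter += 1
    return codes

freqs = {'D': .10, 'I': .20, '_': .01, 'U': .13, 'E': .18,
         'J': .10, 'L': .06, 'R': .03, 'O': .15, 'S': .04}
codes = huffman_codes(freqs)
print(round(sum(freqs[s] * len(codes[s]) for s in freqs), 2))  # 3.06
```

Representing each subtree as a tuple of its leaf symbols lets the sketch extend every affected code word at merge time instead of walking a tree afterward.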
Huffman’s Algorithm The algorithm is greedy: it makes choices that are locally optimal in the hope of reaching a globally optimal solution. In Huffman’s case, this greedy strategy provably yields an optimal code.
Notice that this is a full binary tree: every non-leaf node has two children. This is true of all optimal codes.
Huffman’s Algorithm The operation that we need to perform repeatedly is the extraction of the two subtrees with the smallest frequencies. This can be implemented using a priority queue.
Example of a Huffman’s algorithm implementation in ML.

Building the initial queue takes O(n log n) time, since each enqueue operation takes O(log n). We then perform n-1 merges, each of which takes O(log n). Thus this implementation of Huffman’s algorithm takes O(n log n).
```sml
(* new_heap, insert, and extract_min come from a priority-queue
   module parameterized by the comparison function. *)
datatype HTree = Leaf of char * int | Branch of HTree * int * HTree

fun huffmanTree (alpha : (char * int) list) : HTree =
  let
    val alphasize = length alpha
    fun freq (node : HTree) : int =
      case node of
        Leaf (_, i) => i
      | Branch (_, i, _) => i
    val q = new_heap (fn (x, y) => Int.compare (freq x, freq y)) alphasize
    (* Perform i merges, then return the single remaining tree. *)
    fun merge (i : int) : HTree =
      if i = 0 then extract_min q
      else
        let
          val x = extract_min q
          val y = extract_min q
        in
          insert q (Branch (x, freq x + freq y, y));
          merge (i - 1)
        end
  in
    app (fn (c : char, i : int) => insert q (Leaf (c, i))) alpha;
    merge (alphasize - 1)
  end
```
Huffman’s Code The output of Huffman’s algorithm is the Huffman code:

| Character | Frequency | Codeword |
|-----------|-----------|----------|
| D | 0.10 | 010 |
| I | 0.20 | 00 |
| _ | 0.01 | 101100 |
| U | 0.13 | 100 |
| E | 0.18 | 111 |
| J | 0.10 | 011 |
| L | 0.06 | 1010 |
| R | 0.03 | 101101 |
| O | 0.15 | 110 |
| S | 0.04 | 10111 |
Huffman’s Code Encoding of LORI: 1010 110 101101 00.

Given the probabilities and code-word lengths, the expected number of bits per character in this code is 3.06.

Had we used a fixed-length encoding for the same alphabet, we would need at least 4 bits per character.
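These figures can be checked with a short Python sketch over the code table above (4 bits per character is the fixed-length baseline for a ten-character alphabet):

```python
freq = {'D': .10, 'I': .20, '_': .01, 'U': .13, 'E': .18,
        'J': .10, 'L': .06, 'R': .03, 'O': .15, 'S': .04}
code = {'D': '010', 'I': '00', '_': '101100', 'U': '100', 'E': '111',
        'J': '011', 'L': '1010', 'R': '101101', 'O': '110', 'S': '10111'}

# Encoding is concatenation of code words.
print(''.join(code[c] for c in 'LORI'))  # 101011010110100

# Expected bits per character = sum of frequency * code-word length.
bits_per_char = sum(freq[c] * len(code[c]) for c in freq)
print(round(bits_per_char, 2))  # 3.06
```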
Huffman’s Code This code achieves a compression ratio, a standard measure of a compression algorithm’s effectiveness, of about 23.5%: Huffman’s encoding of this text uses about 23.5% less memory than its fixed-length encoding.

Extensive experiments with Huffman codes have shown that the compression ratio for this scheme typically falls between 20% and 80%.
Dynamic Huffman Encoding Huffman’s encoding yields an optimal (minimal-length) encoding provided the probabilities of character occurrences are known in advance.

Drawback: it requires a preliminary scan of the given text to count the frequencies of the characters occurring in it. We use the algorithm to compute an optimal prefix tree, then scan the text a SECOND time, writing out the code word of each character.
Dynamic Huffman Encoding Dynamic (adaptive) Huffman coding builds the tree incrementally, in such a way that the coding is always optimal for the sequence of characters already seen.
Huffman’s Code Quiz Given the following bit string, what does it decode into?

011111101111001011110110000101111011001010110101101010

| Character | Frequency | Codeword |
|-----------|-----------|----------|
| D | 0.10 | 010 |
| I | 0.20 | 00 |
| _ | 0.01 | 101100 |
| U | 0.13 | 100 |
| E | 0.18 | 111 |
| J | 0.10 | 011 |
| L | 0.06 | 1010 |
| R | 0.03 | 101101 |
| O | 0.15 | 110 |
| S | 0.04 | 10111 |
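One way to check an answer is to run the bit string through a small decoder sketch in Python, using the same greedy scan that the prefix property allows:

```python
code = {'D': '010', 'I': '00', '_': '101100', 'U': '100', 'E': '111',
        'J': '011', 'L': '1010', 'R': '101101', 'O': '110', 'S': '10111'}
decode_map = {v: k for k, v in code.items()}

def decode(bits):
    """Accumulate bits until they form a code word, emit, repeat."""
    out, buf = [], ''
    for b in bits:
        buf += b
        if buf in decode_map:  # prefix property: first match is the character
            out.append(decode_map[buf])
            buf = ''
    return ''.join(out)

print(decode('011111101111001011110110000101111011001010110101101010'))
```

Running it prints the quiz’s hidden message.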