CSC 2300 Data Structures & Algorithms
April 27, 2007
Chap. 10. Algorithm Design Techniques
Today
File Compression: Huffman Code
ASCII
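What does ASCII stand for? (American Standard Code for Information Interchange.) The ASCII character set consists of about 100 “printable” characters (95, to be exact). How many bits are needed to represent these characters? Seven, since 2^6 = 64 < 100 ≤ 128 = 2^7. The set also includes some “nonprintable” control characters, for 128 characters in all, which still fit in 7 bits. An 8th bit is added as a parity bit.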
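The bit count is the smallest b with 2^b at least the number of characters, that is, ⌈log₂ n⌉. A minimal sketch of that arithmetic; the helper name `bits_needed` is hypothetical, not from the slides.

```python
def bits_needed(n: int) -> int:
    """Smallest b such that 2**b >= n (at least 1 bit)."""
    if n < 1:
        raise ValueError("need at least one character")
    # The largest index to encode is n - 1; its binary length is the answer.
    return max(1, (n - 1).bit_length())


for n in (7, 95, 100, 128, 129):
    print(n, bits_needed(n))
```

Using `int.bit_length` avoids floating-point `log2` rounding at exact powers of two such as 128.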
Example
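Suppose a file contains only the characters a, e, i, s, t, blank space, and newline. There are seven characters, so three bits are sufficient (2^3 = 8 ≥ 7).
With the code a = 000, e = 001, i = 010, s = 011, t = 100, space = 101, newline = 110, the text "i see a seat" followed by a newline encodes as
010101011001001101000101011001000100110 (39 bits: 13 characters × 3 bits). How can we do better?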
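A fixed-length encoder just concatenates one code word per character. The sketch below assumes the 3-bit table a = 000 through newline = 110, which is recovered from the 39-bit string itself; the names `FIXED_CODE` and `encode` are hypothetical.

```python
# 3-bit code recovered from the 39-bit example string (an assumption:
# the slide's own code table is not reproduced in this text).
FIXED_CODE = {
    "a": "000", "e": "001", "i": "010", "s": "011",
    "t": "100", " ": "101", "\n": "110",
}


def encode(text: str, code: dict) -> str:
    """Concatenate the code word of every character in text."""
    return "".join(code[ch] for ch in text)


bits = encode("i see a seat\n", FIXED_CODE)
print(bits, len(bits))
```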
Binary Tree
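Binary tree: each code word is the path from the root to a leaf, reading 0 as a left branch and 1 as a right branch.
The data reside only at the leaves. Can you improve this representation?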
Example
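Newline is the only child of its parent (code word 111 is unused), so it can move up one level: newline becomes 11. Now "i see a seat" followed by a newline encodes as
01010101100100110100010101100100010011 (38 bits), a reduction of 1 bit. We want a more significant improvement. How?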
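Only one code word changes, so only the single newline in the text gets shorter. A minimal check, assuming the same recovered 3-bit table with newline shortened to 11; `encode` and `is_prefix_free` are hypothetical helper names.

```python
CODE = {
    "a": "000", "e": "001", "i": "010", "s": "011",
    "t": "100", " ": "101", "\n": "11",  # newline moved up one level
}


def encode(text, code):
    """Concatenate the code word of every character in text."""
    return "".join(code[ch] for ch in text)


def is_prefix_free(code):
    """True if no code word is a prefix of (or equal to) another."""
    words = list(code.values())
    return not any(
        i != j and words[j].startswith(words[i])
        for i in range(len(words))
        for j in range(len(words))
    )


bits = encode("i see a seat\n", CODE)
print(bits, len(bits), is_prefix_free(CODE))
```

Comparing indices rather than the words themselves means two identical code words are also reported, since they could not be decoded unambiguously either.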
The Two Trees
What can you say about the structure of the better tree? It is a full tree: every node either is a leaf or has two children. An optimal code will always have this property. Why? A node with only one child can always be moved up one level, which shortens every code word below it.
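One way to test fullness without drawing the tree swaps in a technique the slides do not use: for a prefix code, the tree is full exactly when the Kraft sum Σ 2^(−length) over all code words equals 1; a one-child node leaves part of that sum missing. A sketch, assuming the two code tables recovered from the earlier bit strings; the helper names are hypothetical.

```python
from fractions import Fraction

# Code tables recovered from the 39- and 38-bit example strings (assumption).
THREE_BIT = {"a": "000", "e": "001", "i": "010", "s": "011",
             "t": "100", " ": "101", "\n": "110"}
NEWLINE_MOVED_UP = {**THREE_BIT, "\n": "11"}


def kraft_sum(code):
    """Exact sum of 2**-len(w) over all code words w."""
    return sum(Fraction(1, 2 ** len(w)) for w in code.values())


def tree_is_full(code):
    """For a prefix code: True iff every internal node has two children."""
    return kraft_sum(code) == 1


print(kraft_sum(THREE_BIT), tree_is_full(THREE_BIT))
print(kraft_sum(NEWLINE_MOVED_UP), tree_is_full(NEWLINE_MOVED_UP))
```

`Fraction` keeps the sum exact, so the equality test is not affected by floating-point rounding.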
Prefix Code
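If the characters are placed only at the leaves, any encoded sequence of bits can be decoded unambiguously.
Prefix code: no character code is a prefix of another character code.
Example (a = 000, e = 001, i = 010, s = 011, t = 100, space = 101, newline = 11): 0100111100010110001000111. What is it? It decodes as i, s, newline, a, space, t, i, e, newline:
is
a tie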
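Because no code word is a prefix of another, a decoder can read bits until the buffer matches a code word, emit that character, and start over. A minimal sketch using the code listed in the example; `decode` is a hypothetical name.

```python
CODE = {
    "a": "000", "e": "001", "i": "010", "s": "011",
    "t": "100", " ": "101", "\n": "11",
}


def decode(bits, code):
    """Greedy decoding; correct because the code is prefix-free."""
    lookup = {word: ch for ch, word in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in lookup:          # first match is the only possible match
            out.append(lookup[buf])
            buf = ""
    if buf:
        raise ValueError(f"trailing bits {buf!r} do not form a code word")
    return "".join(out)


print(repr(decode("0100111100010110001000111", CODE)))
```

Walking the tree bit by bit (left on 0, right on 1, emit at a leaf) is equivalent; the dictionary lookup just keeps the sketch short.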
Optimal Prefix Code
Binary tree:
How do we find an optimal code?
Our Example
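With the code in the table, "i see a seat" encodes as 1011000000101110011100000010010001 (34 bits; unlike the earlier 39- and 38-bit encodings, this one omits the trailing newline). The code in the table is not optimal for our example. Why not? Exercise: find the optimal code for our example.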
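The cost of a code on a given text is Σ frequency × code-word length. The sketch below assumes code words recovered by parsing the 34-bit string (a = 001, e = 01, i = 10, s = 00000, t = 0001, space = 11); the table itself, including its newline entry, is not reproduced here. It also hints at the "Why not?": s occurs twice yet has a longer code word than t, which occurs once.

```python
from collections import Counter

# Code words recovered from the 34-bit string (assumption; no newline entry).
TABLE_CODE = {"a": "001", "e": "01", "i": "10",
              "s": "00000", "t": "0001", " ": "11"}


def encoded_length(text, code):
    """Total bits = sum over characters of frequency * code-word length."""
    return sum(n * len(code[ch]) for ch, n in Counter(text).items())


text = "i see a seat"
print(Counter(text))
print(encoded_length(text, TABLE_CODE))

# Swap the code words of s and t: same words, so still a prefix code.
swapped = {**TABLE_CODE, "s": TABLE_CODE["t"], "t": TABLE_CODE["s"]}
print(encoded_length(text, swapped))
```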
Huffman’s Algorithm
Assume that there are C characters. Maintain a forest of trees, where the weight of a tree is the sum of the frequencies of its leaves. C – 1 times, select the two trees T1 and T2 of smallest weight, breaking ties arbitrarily, and form a new tree with subtrees T1 and T2.
At the beginning, there are C single-node trees. At the end, there is one tree, which is an optimal Huffman coding tree.
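A minimal sketch of the algorithm using Python's `heapq` as the priority queue. Assumptions: a tree is either a character (leaf) or a pair (left, right), ties are broken by insertion order via a counter, and 0/1 label left/right branches; `huffman_code` is a hypothetical name. The usage runs it on the frequencies of "i see a seat" plus newline.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(freq):
    """Build a Huffman code from {character: frequency}; return {character: bits}."""
    if not freq:
        return {}
    seq = count()  # unique tie-breaker so heapq never compares two trees
    forest = [(weight, next(seq), ch) for ch, weight in freq.items()]
    heapq.heapify(forest)
    if len(forest) == 1:
        # Only one character: give it a one-bit code word.
        return {forest[0][2]: "0"}
    # C - 1 times: remove the two lightest trees and merge them.
    for _ in range(len(forest) - 1):
        w1, _s1, t1 = heapq.heappop(forest)
        w2, _s2, t2 = heapq.heappop(forest)
        heapq.heappush(forest, (w1 + w2, next(seq), (t1, t2)))
    root = forest[0][2]

    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):   # internal node: 0 = left, 1 = right
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                         # leaf: the path so far is its code word
            codes[tree] = prefix

    walk(root, "")
    return codes


freq = Counter("i see a seat\n")
codes = huffman_code(freq)
total_bits = sum(freq[ch] * len(codes[ch]) for ch in freq)
print(codes)
print(total_bits)
```

Different tie-breaking can produce different code words, but every Huffman tree for the same frequencies gives the same total length.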
Example
(Figures: the forest at the initial stage and after the first, second, third, fourth, fifth, and final merges; C – 1 = 6 merges for the C = 7 characters.)
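The merge sequence can be traced by weights alone. Since the slides' frequencies appear only in the figures, this sketch assumes the frequencies of "i see a seat" plus newline (i 1, space 3, s 2, e 3, a 2, t 1, newline 1); `merge_weights` is a hypothetical name. The sum of the new trees' weights equals the total encoded length, because each merge adds one bit to the code of every character below the new root.

```python
import heapq


def merge_weights(weights):
    """Return the weight of each new tree, in the order the merges happen."""
    heap = list(weights)
    heapq.heapify(heap)
    merged = []
    while len(heap) > 1:
        w = heapq.heappop(heap) + heapq.heappop(heap)  # two lightest trees
        heapq.heappush(heap, w)
        merged.append(w)
    return merged


freq = {"i": 1, " ": 3, "s": 2, "e": 3, "a": 2, "t": 1, "\n": 1}
steps = merge_weights(freq.values())
print(steps)
print(len(steps), sum(steps))
```

Ties can change which trees merge, but not the sequence of merged weights, since each step always combines the two smallest weights.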
Implementation
If we maintain the trees in a priority queue, ordered by weight, what is the running time?
O(C log C): each of the C – 1 merges performs two deleteMin operations and one insert, each costing O(log C). We say that Huffman’s method is a two-pass algorithm. What are the two passes? The first pass collects the frequency data, and the second pass performs the encoding.
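A sketch of the two passes, assuming the whole text is in memory; `build_code`, `compress`, and `decompress` are hypothetical names. Each heap entry carries the partial code table of its tree, and merging prefixes 0 to the left table and 1 to the right. The decompressor needs the code table too, so a real compressed file would have to store it alongside the bits.

```python
import heapq
from collections import Counter
from itertools import count


def build_code(freq):
    """Huffman's algorithm; each heap entry holds its tree's partial code table."""
    seq = count()
    heap = [(w, next(seq), {ch: ""}) for ch, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        table = {ch: "0" + c for ch, c in left.items()}
        table.update({ch: "1" + c for ch, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(seq), table))
    table = heap[0][2] if heap else {}
    # A one-character file still needs a nonempty code word.
    return {ch: c or "0" for ch, c in table.items()}


def compress(text):
    freq = Counter(text)                      # pass 1: collect frequencies
    code = build_code(freq)
    bits = "".join(code[ch] for ch in text)   # pass 2: encode
    return code, bits


def decompress(code, bits):
    lookup = {c: ch for ch, c in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in lookup:
            out.append(lookup[buf])
            buf = ""
    return "".join(out)


text = "i see a seat\n"
code, bits = compress(text)
print(len(bits), decompress(code, bits) == text)
```

Copying partial tables keeps the sketch short but can cost more than the O(C log C) bound for skewed trees; building explicit tree nodes and walking them once avoids that.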