Text Compression
Huffman Coding
James Adkison
02/07/2008
Assumptions / Givens
• A bit is represented by a ‘1’ or ‘0’
• A byte is any combination of 8 bits
• All ASCII characters are stored in 1 byte, except the ‘\n’ character, which is stored as two bytes: ‘\r’ followed by ‘\n’ (the Windows line-ending convention)
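Under these assumptions, a text file's size in bytes is its character count plus one extra byte per newline. A minimal Python sketch; the function name `stored_size_in_bytes` is hypothetical, and it assumes the text is pure ASCII with ‘\n’ line breaks in memory:

```python
def stored_size_in_bytes(text: str) -> int:
    """Size of `text` on disk under the slide's assumptions.

    Every ASCII character takes 1 byte, except the newline character,
    which is written as the two bytes CR and LF.
    """
    ascii_bytes = text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII text
    return len(ascii_bytes) + text.count("\n")  # one extra byte per newline


print(stored_size_in_bytes("ab\ncd"))  # prints 6 (5 characters + 1 extra byte)
```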
Notation
• Square brackets ‘[’ ‘]’ mark an inclusive range boundary
• Parentheses ‘(’ ‘)’ mark an exclusive range boundary
• Example: [0, 6) includes 0 and excludes 6, so the range is 0 to 5, or [0, 5]
• Traversing a Huffman Tree to the left produces a ‘0’ bit and traversing it to the right produces a ‘1’ bit
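Python's built-in `range(start, stop)` uses the same half-open [start, stop) convention, so the range example can be checked directly. The last lines express the 0-left / 1-right traversal rule as a small lookup; the `EDGE_BIT` name and the sample path are illustrative only:

```python
# range(start, stop) includes start and excludes stop: the range [start, stop).
half_open = list(range(0, 6))    # [0, 6)
closed = list(range(0, 5 + 1))   # [0, 5] written as the half-open range [0, 6)
assert half_open == closed == [0, 1, 2, 3, 4, 5]

# Traversing a Huffman tree: a left edge contributes '0', a right edge '1'.
EDGE_BIT = {"left": "0", "right": "1"}
path = ["left", "right", "right"]
print("".join(EDGE_BIT[step] for step in path))  # prints 011
```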
Definitions
• Bit string: any combination of two or more bits
• Text = ASCII text = Uncompressed text = Decoded text
• Encoded text = Huffman encoding = Compressed text
Definitions Continued…
• Leaf Node: Has 1 parent and [0, 1) children, i.e. no children
• Non-leaf Node: Has 1 parent and [1, 2] children
• Root Node: Has 0 parents and [0, 2] children
Binary Tree
[Figure: a binary tree of numbered nodes with its Root Node, Non-leaf Nodes, and Leaf Nodes labeled]
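The three node definitions reduce to a check on a node's parent and number of children. A minimal Python sketch; the `TreeNode` class and `kind` function are hypothetical names, and the numeric labels are illustrative rather than the exact tree in the figure:

```python
from typing import List, Optional


class TreeNode:
    """A binary tree node that records its parent and at most two children."""

    def __init__(self, label: int) -> None:
        self.label = label
        self.parent: Optional["TreeNode"] = None
        self.children: List["TreeNode"] = []

    def add_child(self, label: int) -> "TreeNode":
        if len(self.children) == 2:
            raise ValueError("a binary tree node has at most 2 children")
        child = TreeNode(label)
        child.parent = self
        self.children.append(child)
        return child


def kind(node: TreeNode) -> str:
    """Classify a node using the definitions above."""
    if node.parent is None:
        return "root"      # 0 parents, [0, 2] children
    if not node.children:
        return "leaf"      # 1 parent, [0, 1) children, i.e. none
    return "non-leaf"      # 1 parent, [1, 2] children


root = TreeNode(1)
left = root.add_child(2)
right = root.add_child(3)
grandchild = left.add_child(4)
print([kind(n) for n in (root, left, right, grandchild)])
# prints ['root', 'non-leaf', 'leaf', 'leaf']
```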
Huffman Tree
[Figure: Huffman Tree with its Root Node, Non-leaf Nodes, and Leaf Nodes labeled. Each left edge is marked 0 and each right edge 1; reading the edges from the root down gives the leaf codes ‘000’, ‘010’, ‘011’, ‘101’, and ‘11’.]
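Reading a character's code off the tree is a walk from the root in which each left edge appends ‘0’ and each right edge appends ‘1’. A Python sketch, assuming the tree is written as nested `(left, right)` tuples with `None` for a missing child; the leaf names ‘A’ to ‘E’ are placeholders, since the figure's leaf labels are not recoverable:

```python
def collect_codes(node, prefix=""):
    """Map each leaf to its code: a left edge appends '0', a right edge '1'."""
    if node is None:               # missing child: nothing to collect
        return {}
    if isinstance(node, str):      # a leaf holds a character
        return {node: prefix}
    left, right = node             # a non-leaf node is a (left, right) pair
    codes = collect_codes(left, prefix + "0")
    codes.update(collect_codes(right, prefix + "1"))
    return codes


# Shape matching the figure's codes; 'A'..'E' are hypothetical leaf labels.
figure_tree = (
    (("A", None), ("B", "C")),     # 0-subtree
    ((None, "D"), "E"),            # 1-subtree
)
print(collect_codes(figure_tree))
# prints {'A': '000', 'B': '010', 'C': '011', 'D': '101', 'E': '11'}
```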
Huffman Tree
Input File:
yqwteryqreytqywteweryqtwreqwewtryqwetytqrwqrtwtwqyqtwreqqywywytrwtqywqyqyewrqwyqwyrewqytqwtyrwyeqyrweytrtryytwrwqererrwtwqtretyytwtreryqqwewqywterqwyyqyqtweyqwreywqryqwreyqytqweytqweyqwreyqweyqwreyqwteryqwreyqwteryqwteryqwreyqwtreyqwtreyqwreyqwtreyqtwtryrereyqyqtwyweyrerrtqrwrwetqtrqywretqwwytqeyqwe
[Figure: Huffman Tree in which ‘y’ and ‘w’ are the children of the root, ‘t’ and ‘e’ are the children of ‘y’, and ‘r’ and ‘q’ are the children of ‘w’; left edges are marked 0 and right edges 1]
‘y’ : 0
‘w’ : 1
‘t’ : 00
‘e’ : 01
‘r’ : 10
‘q’ : 11
Huffman Encoding:
Uncompressed: 300 bytes
Compressed: 61 bytes
Compression: 79.7 percent
Huffman Tree
Decode: 1110110000
Bits read : possible decodings
1: w
11: ww
11: q
111: www
111: wq
111: qw
• Because the code for ‘w’ (1) is a prefix of the code for ‘q’ (11), the first three bits can already be read three different ways, so this bit string has no single decoding
Bad Huffman Tree
Encode: q  w  e  r  t  y
Code:   11 1  01 10 00 0
• The original text was ‘qwerty’, but 1110110000 can also be read, for example, one bit at a time as ‘wwwywwyyyy’
• The tree is bad because ‘y’ and ‘w’ sit on non-leaf nodes, so their codes (0 and 1) are prefixes of other codes; in a proper Huffman Tree every character is on a leaf node, so no code is a prefix of another
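The ambiguity can be checked mechanically. This Python sketch (the names `BAD_CODES`, `encode`, `all_decodings`, and `is_prefix_free` are hypothetical) encodes ‘qwerty’ with the bad table, lists every way to split a bit string into codes, and tests whether any code is a prefix of another. The brute-force search is exponential, so it is only meant for short strings like these:

```python
from typing import Dict, List

BAD_CODES = {"y": "0", "w": "1", "t": "00", "e": "01", "r": "10", "q": "11"}


def encode(text: str, codes: Dict[str, str]) -> str:
    return "".join(codes[ch] for ch in text)


def all_decodings(bits: str, codes: Dict[str, str]) -> List[str]:
    """Every way to split `bits` into codewords (brute force, short inputs only)."""
    if not bits:
        return [""]
    results = []
    for ch, code in codes.items():
        if bits.startswith(code):  # this codeword could be the next character
            for rest in all_decodings(bits[len(code):], codes):
                results.append(ch + rest)
    return results


def is_prefix_free(codes: Dict[str, str]) -> bool:
    """True if no code is a prefix of (or equal to) a different character's code."""
    words = list(codes.values())
    return not any(
        i != j and b.startswith(a)
        for i, a in enumerate(words)
        for j, b in enumerate(words)
    )


print(encode("qwerty", BAD_CODES))         # prints 1110110000
print(sorted(all_decodings("111", BAD_CODES)))  # prints ['qw', 'wq', 'www']
print(is_prefix_free(BAD_CODES))           # prints False
```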
Huffman Tree Construction
Huffman Tree Construction: Process Text File & Build Array of Nodes
Input File: the same 300-character file as above, read one character at a time
‘y’,1
‘y’,1 ‘q’,1
‘y’,1 ‘q’,1 ‘w’,1
‘y’,1 ‘q’,1 ‘w’,1 ‘t’,1
‘y’,1 ‘q’,1 ‘w’,1 ‘t’,1 ‘e’,1
‘y’,1 ‘q’,1 ‘w’,1 ‘t’,1 ‘e’,1 ‘r’,1
‘y’,2 ‘q’,1 ‘w’,1 ‘t’,1 ‘e’,1 ‘r’,1
After the whole file has been read: ‘y’,57 ‘q’,55 ‘w’,58 ‘t’,40 ‘e’,43 ‘r’,47
• Each distinct character appears in the array only once, along with the # of times it occurs
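A Python sketch of this pass; the name `build_count_array` is hypothetical. A dict keeps one entry per distinct character, and since Python 3.7 dicts preserve insertion order, so the result lists characters in the order they first appear in the text:

```python
from typing import Dict, List, Tuple


def build_count_array(text: str) -> List[Tuple[str, int]]:
    counts: Dict[str, int] = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1  # first sighting creates the entry with count 1
    return list(counts.items())


# The first seven characters of the input file.
print(build_count_array("yqwtery"))
# prints [('y', 2), ('q', 1), ('w', 1), ('t', 1), ('e', 1), ('r', 1)]
```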
Huffman Tree Construction: Sort the array
‘t’,40 ‘e’,43 ‘r’,47 ‘q’,55 ‘y’,57 ‘w’,58
• The nodes are ordered from fewest to most occurrences
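The sort step is a single call in Python. `sorted` with a key on the count orders the nodes from least to most frequent, and because it is stable, nodes with equal counts would keep their original order:

```python
# Counts from the input file, in the order the characters were first seen.
count_array = [("y", 57), ("q", 55), ("w", 58), ("t", 40), ("e", 43), ("r", 47)]

# Ascending by number of occurrences.
sorted_array = sorted(count_array, key=lambda node: node[1])
print(sorted_array)
# prints [('t', 40), ('e', 43), ('r', 47), ('q', 55), ('y', 57), ('w', 58)]
```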
Huffman Tree Construction: Constructing the tree
• Remove the two nodes with the smallest counts, make them the children of a new non-leaf node whose count is their sum, and put the new node back into the sorted array; repeat until a single node, the root, remains
‘t’,40 + ‘e’,43 -> 83
‘r’,47 + ‘q’,55 -> 102
‘y’,57 + ‘w’,58 -> 115
83 + 102 -> 185
115 + 185 -> 300 (root)
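The merging loop can be traced with weights alone. This Python sketch (the name `merge_trace` is hypothetical) keeps the array sorted with `bisect.insort` and records the weight of each new non-leaf node:

```python
import bisect
from typing import List


def merge_trace(weights: List[int]) -> List[int]:
    """Repeatedly merge the two smallest weights; return each new node's weight."""
    array = sorted(weights)
    created = []
    while len(array) > 1:
        smallest, second = array.pop(0), array.pop(0)  # the two lowest counts
        merged = smallest + second
        bisect.insort(array, merged)                   # reinsert, keeping the array sorted
        created.append(merged)
    return created


print(merge_trace([40, 43, 47, 55, 57, 58]))  # prints [83, 102, 115, 185, 300]
```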
Huffman Tree
[Figure: the finished Huffman Tree. The root (300) has children 115 (0) and 185 (1); 115 has the leaves ‘y’,57 (0) and ‘w’,58 (1); 185 has children 83 (0) and 102 (1); 83 has the leaves ‘t’,40 (0) and ‘e’,43 (1); 102 has the leaves ‘r’,47 (0) and ‘q’,55 (1).]
‘y’ : 00
‘w’ : 01
‘t’ : 100
‘e’ : 101
‘r’ : 110
‘q’ : 111
Huffman Encoding:
Uncompressed: 300 bytes
Compressed: 99 bytes
Compression: 67 percent
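Putting the construction steps together, a compact Python sketch (all names are hypothetical). It swaps in `heapq` as a priority queue instead of re-sorting the array, which is an equivalent way to find the two smallest nodes; it treats the first node removed as the left (0) child and uses a counter to break ties. With the counts from this file it reproduces the code table above:

```python
import heapq
from itertools import count
from typing import Dict, Optional


class Node:
    """A Huffman tree node; only leaf nodes carry a character."""

    def __init__(self, weight: int, char: Optional[str] = None,
                 left: Optional["Node"] = None, right: Optional["Node"] = None) -> None:
        self.weight = weight
        self.char = char
        self.left = left
        self.right = right


def build_huffman_tree(freqs: Dict[str, int]) -> Node:
    """Assumes `freqs` holds at least one character."""
    tiebreak = count()  # keeps heap entries comparable when two weights are equal
    heap = [(w, next(tiebreak), Node(w, c)) for c, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, first = heapq.heappop(heap)   # smallest weight -> left child
        w2, _, second = heapq.heappop(heap)  # next smallest -> right child
        parent = Node(w1 + w2, left=first, right=second)
        heapq.heappush(heap, (parent.weight, next(tiebreak), parent))
    return heap[0][2]


def huffman_codes(root: Node) -> Dict[str, str]:
    codes: Dict[str, str] = {}

    def walk(node: Node, prefix: str) -> None:
        if node.char is not None:             # leaf: record its code
            codes[node.char] = prefix or "0"  # a one-character file still needs 1 bit
            return
        walk(node.left, prefix + "0")         # left edge -> '0'
        walk(node.right, prefix + "1")        # right edge -> '1'

    walk(root, "")
    return codes


freqs = {"y": 57, "q": 55, "w": 58, "t": 40, "e": 43, "r": 47}
print(huffman_codes(build_huffman_tree(freqs)))
# prints {'y': '00', 'w': '01', 't': '100', 'e': '101', 'r': '110', 'q': '111'}
```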
Huffman Tree: Compression Computation
• $N$ : # of distinct characters
• $n_i$ : # of times the character at the $i$th index occurs
• $b_i$ : # of bits used to encode the character at the $i$th index
• 8 : # of bits stored in a byte
• Compressed size in bytes = $\left(\sum_{i=0}^{N-1} n_i b_i\right) / 8$
Compression Computation Example
• $\left(\sum_{i=0}^{N-1} n_i b_i\right) / 8$ with the counts ‘y’,57 ‘w’,58 ‘t’,40 ‘r’,47 ‘e’,43 ‘q’,55 and the codes ‘y’ : 00, ‘w’ : 01, ‘t’ : 100, ‘e’ : 101, ‘r’ : 110, ‘q’ : 111
• Produces: ((57 * 2) + (55 * 3) + (58 * 2) + (40 * 3) + (43 * 3) + (47 * 3)) / 8 = 785 / 8 = 98.125 -> 99 bytes (rounded up, since a file holds only whole bytes)
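The same arithmetic in Python, as a sketch (the function name `compressed_size` is hypothetical). It rounds the bit total up to whole bytes with `math.ceil`, matching the 98.125 -> 99 step, and reports the percentage saved relative to 1 byte per uncompressed character:

```python
import math
from typing import Dict, Tuple


def compressed_size(counts: Dict[str, int], codes: Dict[str, str]) -> Tuple[int, float]:
    """Return (compressed bytes, percent saved), assuming 1 byte per uncompressed character."""
    total_bits = sum(n * len(codes[ch]) for ch, n in counts.items())  # sum of n_i * b_i
    compressed = math.ceil(total_bits / 8)                             # whole bytes only
    uncompressed = sum(counts.values())
    saved = round(100 * (1 - compressed / uncompressed), 1)
    return compressed, saved


counts = {"y": 57, "w": 58, "t": 40, "r": 47, "e": 43, "q": 55}
good = {"y": "00", "w": "01", "t": "100", "e": "101", "r": "110", "q": "111"}
bad = {"y": "0", "w": "1", "t": "00", "e": "01", "r": "10", "q": "11"}
print(compressed_size(counts, good))  # prints (99, 67.0)
print(compressed_size(counts, bad))   # prints (61, 79.7)
```

The bad table yields the smaller file only because its codes are not decodable, which is why the byte count alone is not enough to judge a tree.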
Huffman Tree
[Figure: the finished Huffman Tree from above, with the codes ‘y’ : 00, ‘w’ : 01, ‘t’ : 100, ‘e’ : 101, ‘r’ : 110, ‘q’ : 111]
Decode: 1110110111010000
111 -> q
01 -> w
101 -> e
110 -> r
100 -> t
00 -> y
Answer: qwerty
• Each character starts at the root: follow the left edge for a 0 and the right edge for a 1 until a leaf is reached. Because every character is on a leaf, no code is a prefix of another, so there is exactly one answer
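A Python sketch of tree-walking decoding. The nested-tuple tree and the names `FINAL_TREE` and `decode` are assumptions for illustration: each bit moves to the left (0) or right (1) child, and reaching a leaf emits its character and restarts at the root:

```python
# The finished tree as nested (left, right) tuples; strings are leaf nodes.
FINAL_TREE = (
    ("y", "w"),                  # 0-subtree: 'y' = 00, 'w' = 01
    (("t", "e"), ("r", "q")),    # 1-subtree: 't' = 100, 'e' = 101, 'r' = 110, 'q' = 111
)


def decode(bits: str, tree) -> str:
    out = []
    node = tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]  # 0 -> left child, 1 -> right child
        if isinstance(node, str):                   # reached a leaf
            out.append(node)
            node = tree                             # next character starts at the root
    if node is not tree:
        raise ValueError("bit string ends in the middle of a code")
    return "".join(out)


print(decode("1110110111010000", FINAL_TREE))  # prints qwerty
```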
Huffman Encoding: Real World Example
• Huffman (C++)
• Text I/O Directory
Huffman Homework
1. Construct a Huffman tree for the following input file. You only need to show the final tree. These are the only characters in the file and the # of times they occur: (a, 50) (b, 60) (c, 70) (d, 80)
2. How many bytes will the compressed text file occupy?