File Compression Techniques
Alex Robertson
Outline
History
Lossless vs Lossy
Basics
Huffman Coding
Getting Advanced
Lossy Explained
Limitations
Future
History, where this all started
The Problem!
1940s: Shannon-Fano coding
Properties
Different codes have different numbers of bits.
Codes for symbols with low probabilities have more bits, and codes for symbols with high probabilities have fewer bits.
Though the codes are of different bit lengths, they can be uniquely decoded.
Lossless vs Lossy
Lossless: DEFLATE. Every little detail of the data is important.
Lossy: JPEG, MP3. Some data can be lost without being noticed.
Understanding the Basics
Properties
Different codes have different numbers of bits.
Codes for symbols with low probabilities have more bits, and codes for symbols with high probabilities have fewer bits.
Though the codes are of different bit lengths, they can be uniquely decoded.
Encode “SATA”
S = 10, A = 0, T = 11
Prefix Rule
S = 01, A = 0, T = 00
SATA, SAAAA, and STT all encode to 010000, so this code cannot be uniquely decoded.
No code can be the prefix of another code.
If 0 is a code, no other code may begin with 0 (0* can't be a code).
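The two SATA code tables above can be checked mechanically. The following is a minimal sketch, not part of the original slides; the helper names `encode` and `is_prefix_free` are hypothetical and chosen only for illustration.

```python
def encode(text, codes):
    """Concatenate the code word of every symbol in text."""
    return "".join(codes[ch] for ch in text)


def is_prefix_free(codes):
    """Return True if no code word is a prefix of a different code word."""
    words = list(codes.values())
    for i, a in enumerate(words):
        for j, b in enumerate(words):
            if i != j and b.startswith(a):
                return False
    return True


# Code tables taken from the slides.
good = {"S": "10", "A": "0", "T": "11"}
bad = {"S": "01", "A": "0", "T": "00"}

print(encode("SATA", good), is_prefix_free(good))
# With the second table, three different messages collapse to one bit string.
print({word: encode(word, bad) for word in ("SATA", "SAAAA", "STT")})
print(is_prefix_free(bad))
```

The second table fails the check because A = 0 is a prefix of both S = 01 and T = 00.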
Make a Tree
Create a Tree
A = 010, B = 11, C = 00, D = 10, R = 011
Decode
01011011010000101001011011010
A = 010, B = 11, C = 00, D = 10, R = 011
Violates the property:
Codes for symbols with low probabilities have more bits, and codes for symbols with high probabilities have fewer bits.
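Decoding a prefix code can be sketched as reading bits until the buffer matches a code word. This is an illustrative sketch using the code table from the slide; the function name `decode` is an assumption, not something the deck defines.

```python
from collections import Counter


def decode(bits, codes):
    """Decode a bit string with a prefix-free code table (symbol -> bits)."""
    lookup = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:  # prefix-free, so the first match is the symbol
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)


codes = {"A": "010", "B": "11", "C": "00", "D": "10", "R": "011"}
message = decode("01011011010000101001011011010", codes)
print(message)
# Compare how often each symbol occurs with the length of its code word.
for sym, n in Counter(message).most_common():
    print(sym, n, len(codes[sym]))
```

The frequency listing shows the problem the slide points at: the most frequent symbol, A, has one of the longest code words.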
Huffman Coding
Create a Tree Encode “ABRACADABRA”
Determine Frequencies
1. Locate the two least frequent "nodes".
2. Create a parent node from those two nodes and give it a weight equal to the sum of the two child nodes' frequencies.
3. Give one of the child nodes the 0 bit and the other the 1 bit.
4. Repeat the above steps until only one node is left.
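The steps above can be sketched with a priority queue. This is one possible implementation under stated assumptions, not the presenter's own code: `huffman_codes` is a hypothetical name, each heap entry carries a partial code table instead of an explicit tree, and ties between equal weights are broken by insertion order, so the exact bit patterns may differ from the slides while the total length stays the same.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text):
    """Build a Huffman code table (symbol -> bit string) for text."""
    freq = Counter(text)
    tie = count()  # tie-breaker so heapq never has to compare the dicts
    heap = [(w, next(tie), {sym: ""}) for sym, w in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a single distinct symbol still needs one bit
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        # Step 1: take the two least frequent nodes.
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        # Step 3: one child gets the 0 bit, the other the 1 bit.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        # Step 2: the parent weight is the sum of the child weights.
        heapq.heappush(heap, (w0 + w1, next(tie), merged))
    return heap[0][2]


text = "ABRACADABRA"
table = huffman_codes(text)
encoded = "".join(table[ch] for ch in text)
print(table)
print(len(encoded), "bits")
```

For ABRACADABRA any valid Huffman tree gives A a 1-bit code and a 23-bit total, which is the figure quoted in this deck.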
Does it work?
Re-encode
01011011010000101001011011010 = 29 bits
It Works!
01011001110011110101100 = 23 bits
ABRACADABRA = 11 characters * 7 bits each = 77 bits
but…
It Works… With Issues.
The header has to include the probability table.
Not the best choice in certain cases.
Example: 'A' repeated 100 times.
Huffman only reduces this to 100 bits (not counting the header), because every symbol needs at least one bit.
Moving Forward
Arithmetic Method
No specific code per symbol.
The whole message becomes a single, continuously changing floating-point output number.
Example
"BILL GATES"
Character   Probability   Range
SPACE       1/10          0.0 <= r < 0.1
A           1/10          0.1 <= r < 0.2
B           1/10          0.2 <= r < 0.3
E           1/10          0.3 <= r < 0.4
G           1/10          0.4 <= r < 0.5
I           1/10          0.5 <= r < 0.6
L           2/10          0.6 <= r < 0.8
S           1/10          0.8 <= r < 0.9
T           1/10          0.9 <= r < 1.0
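The table can be turned into a small arithmetic coder. The sketch below is illustrative only and makes several assumptions: each range is read as half-open (low <= r < high), exact `Fraction` arithmetic is used instead of floating point to avoid rounding, the decoder is told the message length, and the names `arith_encode` and `arith_decode` are hypothetical.

```python
from fractions import Fraction

# Symbol probabilities in tenths, in the same order as the table.
PROBS = [(" ", 1), ("A", 1), ("B", 1), ("E", 1), ("G", 1),
         ("I", 1), ("L", 2), ("S", 1), ("T", 1)]

# Build symbol -> (low, high) with exact fractions.
RANGES = {}
_start = Fraction(0)
for _sym, _tenths in PROBS:
    _width = Fraction(_tenths, 10)
    RANGES[_sym] = (_start, _start + _width)
    _start += _width


def arith_encode(message):
    """Narrow [0, 1) once per symbol; any number in the result identifies the message."""
    low, high = Fraction(0), Fraction(1)
    for ch in message:
        width = high - low
        sym_low, sym_high = RANGES[ch]
        high = low + width * sym_high
        low = low + width * sym_low
    return low, high


def arith_decode(value, length):
    """Recover length symbols from a number inside the final interval."""
    out = []
    for _ in range(length):
        for ch, (sym_low, sym_high) in RANGES.items():
            if sym_low <= value < sym_high:
                out.append(ch)
                # Rescale so the next symbol's range lines up with [0, 1).
                value = (value - sym_low) / (sym_high - sym_low)
                break
    return "".join(out)


low, high = arith_encode("BILL GATES")
print(float(low), float(high))
print(arith_decode(low, len("BILL GATES")))
```

Practical arithmetic coders work with fixed-width integers and emit bits incrementally; the exact-fraction version here only shows how the interval narrows.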
Dictionary Based
Implemented in the late 70s.
Uses previously seen words as a dictionary.
the quick brown fox jumped over the lazy dog
I bought a Mississippi Banana in Mississippi.
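The idea of reusing previously seen text can be sketched as back-references into the output so far. This is a simplified, LZ77-style illustration chosen as an assumption (the slides do not name a specific algorithm); the token format, the `min_match` threshold, and the function names are all hypothetical.

```python
def lz_compress(text, min_match=3):
    """Greedy parse into literals and (offset, length) back-references."""
    tokens = []
    i = 0
    while i < len(text):
        best_len, best_off = 0, 0
        for start in range(i):  # brute-force search of everything seen so far
            length = 0
            while (i + length < len(text)
                   and text[start + length] == text[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - start
        if best_len >= min_match:
            tokens.append((best_off, best_len))
            i += best_len
        else:
            tokens.append(text[i])
            i += 1
    return tokens


def lz_decompress(tokens):
    """Rebuild the text by copying literals and replaying back-references."""
    out = []
    for tok in tokens:
        if isinstance(tok, tuple):
            offset, length = tok
            for _ in range(length):  # one char at a time, so overlaps work
                out.append(out[-offset])
        else:
            out.append(tok)
    return "".join(out)


for sentence in ("the quick brown fox jumped over the lazy dog",
                 "I bought a Mississippi Banana in Mississippi."):
    tokens = lz_compress(sentence)
    print(tokens)
    print(lz_decompress(tokens) == sentence)
```

In both sentences the repeated word ("the", "Mississippi") shows up as a single back-reference tuple instead of a run of literals.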
Lossy Compression
Lossy Formula Lossless Formula
My Sound!
Mathematical Limitations
Claude E. Shannon
http://www.data-compression.com/theory.html
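One way to make the limit concrete is to compute the Shannon entropy of a message. The slide only names Shannon, so this sketch assumes the limit in question is the entropy of a memoryless source, which bounds the average bits per symbol of any symbol-by-symbol lossless code; the function name is hypothetical.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(text):
    """Shannon entropy H = -sum(p * log2(p)) over the symbol frequencies."""
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


text = "ABRACADABRA"
h = entropy_bits_per_symbol(text)
print(round(h, 3), "bits per symbol")
print(round(h * len(text), 2), "bits for the whole message")
```

For ABRACADABRA the bound lands between 22 and 23 bits, so the 23-bit Huffman encoding is as close as a whole-bit-per-symbol code can get.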
Example
DEFLATE http://en.wikipedia.org/wiki/DEFLATE
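DEFLATE can be tried from Python's standard `zlib` module, which wraps DEFLATE data in a small zlib header and checksum. This is a usage sketch, not part of the slides; the sample input is an arbitrary assumption picked because it is repetitive.

```python
import zlib

# Repetitive sample input (hypothetical); real files will compress differently.
data = b"the quick brown fox jumped over the lazy dog " * 50

packed = zlib.compress(data, 9)      # level 9 = best compression
restored = zlib.decompress(packed)

print(len(data), "bytes before")
print(len(packed), "bytes after")
print(restored == data)              # lossless: every byte comes back
```

DEFLATE combines the two ideas covered earlier in the deck: dictionary-style back-references followed by Huffman coding.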
Future
Hardware is getting better.
The theories are the same.
Thank You
Questions