data compression michael j. watts
TRANSCRIPT
![Page 1: Data Compression Michael J. Watts](https://reader036.vdocument.in/reader036/viewer/2022082816/56649d215503460f949f6cc9/html5/thumbnails/1.jpg)
Data Compression
Michael J. Watts
http://mike.watts.net.nz
![Page 2: Data Compression Michael J. Watts](https://reader036.vdocument.in/reader036/viewer/2022082816/56649d215503460f949f6cc9/html5/thumbnails/2.jpg)
Lecture Outline
What is compression? Huffman encoding Lempel-Ziv encoding Fraunhofer compression JPEG compression
![Page 3: Data Compression Michael J. Watts](https://reader036.vdocument.in/reader036/viewer/2022082816/56649d215503460f949f6cc9/html5/thumbnails/3.jpg)
What is Compression?
Reduction of the length of symbols used to represent data Symbols can be anything
Based on redundancy Information theory
Lossless vs lossy Degree of compression depends on amount of
redundancy Random data compression?
![Page 4: Data Compression Michael J. Watts](https://reader036.vdocument.in/reader036/viewer/2022082816/56649d215503460f949f6cc9/html5/thumbnails/4.jpg)
What is Compression?
Represent symbols with codewords One symbol == one codeword Length of codeword < length of symbol
Many algorithms exist Huffman Lempel-Ziv Fraunhofer
![Page 5: Data Compression Michael J. Watts](https://reader036.vdocument.in/reader036/viewer/2022082816/56649d215503460f949f6cc9/html5/thumbnails/5.jpg)
Huffman Encoding
Based on probability (frequency) of symbols Assign codewords as bits Builds a binary tree based on probability
Frequent / high probability symbols at the top Infrequent / low probability symbols lower down Branches labelled with 0 and 1
Frequent symbols get short bit strings Infrequent symbols get longer bit strings
![Page 6: Data Compression Michael J. Watts](https://reader036.vdocument.in/reader036/viewer/2022082816/56649d215503460f949f6cc9/html5/thumbnails/6.jpg)
Huffman Encoding Algorithm
Arrange symbols in order of decreasing probability Combine two symbols with lowest probability
Assign a new name, add their probabilities
Rebuild the list Combine the next two lowest symbols Repeat until there is one symbol, with probability of one Build a binary tree Form codewords by tracing down the tree to the symbol
![Page 7: Data Compression Michael J. Watts](https://reader036.vdocument.in/reader036/viewer/2022082816/56649d215503460f949f6cc9/html5/thumbnails/7.jpg)
Huffman Encoding
Decompression is the opposite of compression Follow bit sequence to a terminal node
Needs to see the entire data set Must know symbol probabilities
Also combined with other techniques MP3 encoding
![Page 8: Data Compression Michael J. Watts](https://reader036.vdocument.in/reader036/viewer/2022082816/56649d215503460f949f6cc9/html5/thumbnails/8.jpg)
Lempel-Ziv
Dictionary based encoder Lossless Replaces phrases with tokens
Tokens smaller than the phrases Converts input stream to compressed output
stream Doesn't need to see entire data set
Dictionary is built as it moves through the stream
![Page 9: Data Compression Michael J. Watts](https://reader036.vdocument.in/reader036/viewer/2022082816/56649d215503460f949f6cc9/html5/thumbnails/9.jpg)
Lempel-Ziv
Separates input stream into tokens Each token represents the shortest phrase that has
not been seen Tokens are numbered Tokens contain other tokens Compression is inefficient at the start Better later
Larger dictionary
![Page 10: Data Compression Michael J. Watts](https://reader036.vdocument.in/reader036/viewer/2022082816/56649d215503460f949f6cc9/html5/thumbnails/10.jpg)
Fraunhofer / MP3 Encoding
Lossy algorithm Compression of audio data Perceptual encoding
Psycho-acoustic model Throws out what can't be heard
Frequency bands Masking effects
Output is further compressed Uses Huffman encoding
![Page 11: Data Compression Michael J. Watts](https://reader036.vdocument.in/reader036/viewer/2022082816/56649d215503460f949f6cc9/html5/thumbnails/11.jpg)
MP3 Encoding
Only usable on audio Part of the MPEG standard
DVDs Rather widely used by itself
![Page 12: Data Compression Michael J. Watts](https://reader036.vdocument.in/reader036/viewer/2022082816/56649d215503460f949f6cc9/html5/thumbnails/12.jpg)
JPEG Compression
Joint Photographic Experts Group Used to compress images Compresses groups of pixels Uses a Discrete Cosine Transformation
DCT Quantises the results of the DCT
Lossy! Further compression is applied
![Page 13: Data Compression Michael J. Watts](https://reader036.vdocument.in/reader036/viewer/2022082816/56649d215503460f949f6cc9/html5/thumbnails/13.jpg)
Summary
Compression is an application of information theory
Encodes long strings as shorter strings All schemes depend on redundant / repetitive data Common gotchas
A compression algorithm cannot compress its own output
Cannot easily compress completely random data No / little redundancy