CSc 461/561 Multimedia Systems Part B: 1. Lossless Compression

Upload: cleary

Post on 19-Mar-2016

TRANSCRIPT

Page 1: CSc 461/561 Multimedia Systems  Part B: 1. Lossless Compression

CSc 461/561

CSc 461/561 Multimedia Systems Part B: 1. Lossless Compression

Page 2: CSc 461/561 Multimedia Systems  Part B: 1. Lossless Compression

Summary

(1) Information
(2) Types of compression
(3) Lossless compression algorithms
    (a) Shannon-Fano algorithm
    (b) Huffman coding
    (c) Run-length coding
    (d) LZW compression
    (e) Arithmetic coding
(4) Example: Lossless image compression

Page 3: CSc 461/561 Multimedia Systems  Part B: 1. Lossless Compression

1. Information (1)

• Information is determined by three parts:
  – The source
  – The receiver
  – The delivery channel
• We need a way to measure information
• Entropy: a measure of uncertainty; the minimum average number of bits needed per symbol
  – alphabet set {s1, s2, …, sn}
  – probability {p1, p2, …, pn}
  – entropy: H = - p1 log2 p1 - p2 log2 p2 - … - pn log2 pn
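The entropy formula above can be checked numerically. The following is a minimal sketch; the function name `entropy` and the two sample distributions are illustrative assumptions, not part of the slides.

```python
import math

def entropy(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    # Terms with p == 0 are skipped: p * log2(p) tends to 0 as p tends to 0.
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

# Hypothetical distributions chosen so the arithmetic is easy to follow.
uniform4 = entropy([0.25, 0.25, 0.25, 0.25])   # four equally likely symbols
skewed = entropy([0.5, 0.25, 0.125, 0.125])    # one symbol dominates

print(uniform4)  # 2.0 bits per symbol
print(skewed)    # 1.75 bits per symbol
```

The skewed source needs fewer bits per symbol on average than the uniform one, which is the property the lossless coders in section 3 try to exploit.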

Page 4: CSc 461/561 Multimedia Systems  Part B: 1. Lossless Compression

1. Entropy examples (2)

• Alphabet set {0, 1}
• Probability: {p, 1-p}
• Entropy: H = - p log2 p - (1-p) log2 (1-p)
  – when p=0, H=0
  – when p=1, H=0
  – when p=1/2, Hmax=1
• 1 bit is enough!

[Figure: entropy (vertical axis, 0 to 1) plotted against p (horizontal axis, 0 to 1)]
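The three special cases of the binary entropy can be reproduced with a few lines of code. This is a sketch only; the name `binary_entropy` and the sampled values of p are assumptions made for illustration.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a two-symbol source with probabilities p and 1 - p."""
    # The endpoints are handled separately because log2(0) is undefined.
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Sample a few points of the curve: zero at both ends, maximum of 1 bit at p = 1/2.
for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(f"p={p:.1f}  H={binary_entropy(p):.3f}")
```

The curve is symmetric around p = 1/2, so a source that emits mostly 0s is exactly as compressible as one that emits mostly 1s.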

Page 5: CSc 461/561 Multimedia Systems  Part B: 1. Lossless Compression

2. Types of compression (1)

• Lossless compression: no information loss
• Lossy compression: otherwise (some information is lost)

Page 6: CSc 461/561 Multimedia Systems  Part B: 1. Lossless Compression

2. Compression Ratio (2)

• Compression ratio
  – B0: # of bits to represent the data before compression
  – B1: # of bits to represent the data after compression
  – compression ratio = B0/B1
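As a small worked instance of the ratio B0/B1, the sketch below uses made-up sizes; the function name `compression_ratio` and the numbers are hypothetical.

```python
def compression_ratio(bits_before, bits_after):
    """Return B0 / B1: values above 1 mean the data got smaller."""
    if bits_after <= 0:
        raise ValueError("compressed size must be positive")
    return bits_before / bits_after

# Hypothetical file: 1,000,000 bits before compression, 250,000 bits after.
ratio = compression_ratio(1_000_000, 250_000)
print(ratio)  # 4.0
```

A ratio below 1 would mean the "compressed" output is larger than the input, which can happen when a coder is applied to data it does not suit.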

Page 7: CSc 461/561 Multimedia Systems  Part B: 1. Lossless Compression

3.1 Shannon-Fano algorithm (1)

• Fewer bits for symbols that appear more often
• “divide-and-conquer”
  – also known as the “top-down” approach
  – split the alphabet set into subsets of (roughly) equal probabilities; do it recursively
  – similar to building a binary tree
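The top-down splitting can be sketched as a short recursive routine. This is one possible reading of "roughly equal" (choose the cut that minimizes the difference between the two halves); the function name `shannon_fano`, the tie-breaking rule, and the HELLO symbol counts are assumptions, not taken from the slides.

```python
def shannon_fano(weights):
    """weights: dict symbol -> probability or count. Returns dict symbol -> bit string."""
    # Sort symbols from most to least frequent (ties keep their input order).
    symbols = sorted(weights, key=weights.get, reverse=True)
    codes = {s: "" for s in symbols}

    def split(group):
        if len(group) < 2:
            return  # a single symbol needs no further bits
        total = sum(weights[s] for s in group)
        running, best_cut, best_diff = 0, 1, float("inf")
        # Try every cut position and keep the first one with the most balanced halves.
        for i in range(1, len(group)):
            running += weights[group[i - 1]]
            diff = abs(total - 2 * running)
            if diff < best_diff:
                best_cut, best_diff = i, diff
        left, right = group[:best_cut], group[best_cut:]
        for s in left:
            codes[s] += "0"
        for s in right:
            codes[s] += "1"
        split(left)
        split(right)

    split(symbols)
    return codes

# Hypothetical input: symbol counts for the word HELLO.
codes = shannon_fano({"L": 2, "H": 1, "E": 1, "O": 1})
print(codes)  # {'L': '0', 'H': '10', 'E': '110', 'O': '111'}
```

Encoding HELLO with this code takes 10 bits. A different tie-breaking rule at the first cut would produce a different, equally valid code.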

Page 8: CSc 461/561 Multimedia Systems  Part B: 1. Lossless Compression

3.1 Shannon-Fano: examples (2)

Page 9: CSc 461/561 Multimedia Systems  Part B: 1. Lossless Compression

3.1 Shannon-Fano: results (3)

• Prefix-free code
  – no codeword is a prefix of another codeword
  – easy to decode
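To see why a prefix-free code is easy to decode, the sketch below checks the property and then decodes a bit string one bit at a time with no lookahead. The function names and the three-symbol code are hypothetical examples.

```python
def is_prefix_free(codewords):
    """True if no codeword is a prefix of another codeword."""
    # After sorting, any prefix sits directly before a word that starts with it.
    words = sorted(codewords)
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

def decode(bits, code):
    """Decode a bit string using a prefix-free code (dict symbol -> bit string)."""
    inverse = {word: symbol for symbol, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        # With a prefix-free code the first match is the only possible match.
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("input ends in the middle of a codeword")
    return "".join(out)

# Hypothetical code, for illustration only.
code = {"A": "0", "B": "10", "C": "11"}
print(is_prefix_free(code.values()))  # True
print(decode("010110", code))         # ABCA
```

A code such as {0, 01} fails the check: after reading 0 the decoder cannot tell whether a codeword has ended.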

Page 10: CSc 461/561 Multimedia Systems  Part B: 1. Lossless Compression

3.1 Shannon-Fano: more results (4)

• Encoding is not unique
  – “roughly equal” splits can be made in more than one way

[Figure: two alternative encodings, labeled Encoding 1 and Encoding 2]

Page 11: CSc 461/561 Multimedia Systems  Part B: 1. Lossless Compression

3.2 Huffman coding (1)

• “Bottom-up” approach
  – also builds a binary tree
    • and needs to know the alphabet probabilities!
  – start with the two symbols of least probability
    • s1: p1
    • s2: p2
    • merge them into one node “s1 or s2” with probability p1+p2
  – do it recursively
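The bottom-up merging can be sketched with a priority queue. This is a minimal illustration, not the course's reference implementation; the function name `huffman_code`, the tie-breaking counter, and the four-symbol distribution are assumptions.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """probs: dict symbol -> probability. Returns dict symbol -> bit string."""
    tiebreak = count()  # unique counter so the heap never has to compare dicts
    # Each heap entry holds the partial code table of one subtree.
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Take the two least probable subtrees ...
        p1, _, low = heapq.heappop(heap)
        p2, _, high = heapq.heappop(heap)
        # ... and merge them, prefixing one side with 0 and the other with 1.
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

# Hypothetical source distribution.
probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(code[s]) for s in probs)
print(code)     # codeword lengths 1, 2, 3, 3
print(avg_len)  # 1.75
```

Which branch receives 0 and which receives 1 is arbitrary, so the exact bit patterns depend on tie-breaking while the codeword lengths for this distribution do not.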

Page 12: CSc 461/561 Multimedia Systems  Part B: 1. Lossless Compression

3.2 Huffman coding: examples (2)

• Encoding not unique; prefix-free code
• Optimality: H(S) <= L < H(S)+1, where L is the average codeword length

[Figure: Huffman construction for a2 (0.4), a1 (0.2), a3 (0.2), a4 (0.1), a5 (0.1). The list is repeatedly sorted and its two least probable entries combined (0.1+0.1=0.2, 0.2+0.2=0.4, 0.4+0.2=0.6, 0.6+0.4=1); codes are then assigned back from the root, giving the codewords 1, 000, 01, 0010, 0011]
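The optimality bound can be checked against the numbers in this example. The sketch below pairs the probabilities 0.4, 0.2, 0.2, 0.1, 0.1 with the codewords 1, 000, 01, 0010, 0011 in the order listed; that pairing is an assumption (the two 0.2 symbols are interchangeable for this calculation), and the variable names are illustrative.

```python
import math

probs = [0.4, 0.2, 0.2, 0.1, 0.1]
codewords = ["1", "000", "01", "0010", "0011"]  # assumed pairing, see lead-in

# Average codeword length L in bits per symbol.
avg_len = sum(p * len(w) for p, w in zip(probs, codewords))

# Source entropy H(S) in bits per symbol.
H = -sum(p * math.log2(p) for p in probs)

# Kraft sum: equals 1 for a complete prefix-free binary code.
kraft = sum(2.0 ** -len(w) for w in codewords)

print(f"L = {avg_len:.3f}, H(S) = {H:.3f}, Kraft sum = {kraft}")
```

The average length comes out at about 2.2 bits per symbol against an entropy of about 2.12 bits, so H(S) <= L < H(S)+1 holds for this code.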

Page 13: CSc 461/561 Multimedia Systems  Part B: 1. Lossless Compression

3.3 Run-length coding

• Run: a string of the same symbol
• Example
  – input: AAABBCCCCCCCCCAA
  – output: A3B2C9A2
  – compression ratio = 16/8 = 2
• Good for some inputs (with long runs)
  – bad for others: ABCABC
  – what about treating ABC as a single symbol of the alphabet?
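The run-length example above can be reproduced with a short encoder and decoder. This sketch assumes the symbol-then-count text format shown in the example and that symbols are never digits; the function names are illustrative.

```python
import re
from itertools import groupby

def rle_encode(text):
    """Replace each run of identical symbols with the symbol followed by the run length."""
    return "".join(f"{symbol}{len(list(run))}" for symbol, run in groupby(text))

def rle_decode(encoded):
    """Inverse of rle_encode; assumes symbols are non-digit characters."""
    return "".join(symbol * int(n) for symbol, n in re.findall(r"(\D)(\d+)", encoded))

good = rle_encode("AAABBCCCCCCCCCAA")
bad = rle_encode("ABCABC")
print(good)  # A3B2C9A2
print(bad)   # A1B1C1A1B1C1
```

The second input doubles in size (12 characters from 6), which is the "bad for others" case: without long runs, the counts are pure overhead.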

Page 14: CSc 461/561 Multimedia Systems  Part B: 1. Lossless Compression

3.4 LZW compression (1)

• Lempel-Ziv-Welch (LZ78, W84)
  – dictionary-based compression
  – no a priori knowledge of the alphabet probabilities
  – builds the dictionary on the fly
  – widely used: e.g., Unix compress
• LZW coding
  – if a word does not appear in the dictionary, add it
  – refer to the dictionary when the word appears again

Page 15: CSc 461/561 Multimedia Systems  Part B: 1. Lossless Compression

3.4 LZW examples (2)

• Input
  – ABABBABCABABBA
• Output
  – 1 2 4 5 2 3 4 6 1
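The output above can be reproduced with a small encoder. The sketch assumes an initial dictionary of A=1, B=2, C=3 (the slide does not state it, but it is consistent with the output shown) and that new entries are numbered from 4 upward; the function name `lzw_encode` is illustrative.

```python
def lzw_encode(text, alphabet):
    """LZW-encode text; every symbol of text must occur in alphabet."""
    # Assumed initial dictionary: single symbols numbered from 1.
    dictionary = {symbol: i for i, symbol in enumerate(alphabet, start=1)}
    next_code = len(dictionary) + 1
    s = ""          # longest match found so far
    output = []
    for c in text:
        if s + c in dictionary:
            s += c  # keep extending the current match
        else:
            output.append(dictionary[s])   # emit the code of the longest match
            dictionary[s + c] = next_code  # learn the new word on the fly
            next_code += 1
            s = c
    if s:
        output.append(dictionary[s])       # flush the final match
    return output

codes = lzw_encode("ABABBABCABABBA", "ABC")
print(codes)  # [1, 2, 4, 5, 2, 3, 4, 6, 1]
```

Fourteen input symbols become nine codes. The decoder can rebuild the same dictionary from the code stream alone, so the dictionary itself is never transmitted.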

Page 16: CSc 461/561 Multimedia Systems  Part B: 1. Lossless Compression

3.5 Arithmetic Coding (1)

• Arithmetic coding starts from a model of the data: basically a prediction of what patterns will be found in the symbols of the message. The more accurate this prediction is, the closer to optimal the output will be.
• Arithmetic coding treats the whole message as one unit.
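The idea of coding the whole message as one unit can be illustrated by narrowing a single interval symbol by symbol. The sketch below is a teaching toy under stated assumptions: a fixed, hypothetical model (A: 1/2, B: 1/4, C: 1/4), exact fractions instead of the finite-precision integers used by practical coders, and a message length passed to the decoder instead of a terminator symbol.

```python
from fractions import Fraction

def build_ranges(probs):
    """Assign each symbol a sub-interval of [0, 1) whose width is its probability."""
    ranges, low = {}, Fraction(0)
    for symbol, p in probs.items():
        ranges[symbol] = (low, low + p)
        low += p
    return ranges

def arithmetic_encode(message, probs):
    """Return the final interval [low, high) that represents the whole message."""
    ranges = build_ranges(probs)
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        width = high - low
        s_low, s_high = ranges[symbol]
        # Zoom into the part of the current interval that belongs to this symbol.
        low, high = low + width * s_low, low + width * s_high
    return low, high

def arithmetic_decode(value, length, probs):
    """Recover `length` symbols from any value inside the final interval."""
    ranges = build_ranges(probs)
    out = []
    for _ in range(length):
        for symbol, (s_low, s_high) in ranges.items():
            if s_low <= value < s_high:
                out.append(symbol)
                # Rescale so the chosen sub-interval becomes [0, 1) again.
                value = (value - s_low) / (s_high - s_low)
                break
    return "".join(out)

# Hypothetical model and message.
model = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 4)}
low, high = arithmetic_encode("ABAC", model)
print(low, high)                        # 19/64 5/16
print(arithmetic_decode(low, 4, model))  # ABAC
```

The final interval has width 1/64, the product of the four symbol probabilities, so about log2(64) = 6 bits are enough to name a point inside it. A more accurate model leaves a wider interval and therefore a shorter code.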

Page 17: CSc 461/561 Multimedia Systems  Part B: 1. Lossless Compression

3.5 Arithmetic Coding (2)

Page 18: CSc 461/561 Multimedia Systems  Part B: 1. Lossless Compression

3.5 Arithmetic Coding (3)

Page 19: CSc 461/561 Multimedia Systems  Part B: 1. Lossless Compression

3.5 Arithmetic Coding (4)

Page 20: CSc 461/561 Multimedia Systems  Part B: 1. Lossless Compression

3.5 Arithmetic Coding (5)

Page 21: CSc 461/561 Multimedia Systems  Part B: 1. Lossless Compression

4. Lossless Image Compression (1)

Page 22: CSc 461/561 Multimedia Systems  Part B: 1. Lossless Compression

4. Lossless Image Compression (2)

Page 23: CSc 461/561 Multimedia Systems  Part B: 1. Lossless Compression

4. Lossless JPEG

[Figure: Neighboring Pixels for Predictors in Lossless JPEG]

[Table: Predictors for Lossless JPEG]
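The figure and table for this slide did not survive in the transcript. As a stand-in, the sketch below uses the seven predictors of the lossless JPEG mode as they are commonly presented, with A the pixel to the left, B the pixel above, and C the pixel above-left; the function name, the neighbor naming, and the sample pixel values are assumptions for illustration.

```python
def predict(a, b, c, mode):
    """Lossless JPEG style prediction from neighbors a (left), b (above), c (above-left)."""
    predictors = {
        1: a,
        2: b,
        3: c,
        4: a + b - c,
        5: a + ((b - c) >> 1),  # >> 1 halves with an arithmetic shift
        6: b + ((a - c) >> 1),
        7: (a + b) >> 1,
    }
    return predictors[mode]

# Hypothetical 2x2 neighborhood:  c=100  b=102
#                                 a=101  x=103  (x is the pixel being coded)
a, b, c, x = 101, 102, 100, 103
residuals = {mode: x - predict(a, b, c, mode) for mode in range(1, 8)}
print(residuals)  # {1: 2, 2: 1, 3: 3, 4: 0, 5: 1, 6: 1, 7: 2}
```

Only the residual (actual value minus prediction) is passed to the entropy coder. On smooth image regions the residuals cluster near zero, which lowers the entropy and makes Huffman or arithmetic coding effective.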