
LECTURE VI
LOSSLESS COMPRESSION ALGORITHMS
DR. OUIEM BCHIR

STORAGE SPACE

• Uncompressed graphics, audio, and video data require substantial storage capacity.

• Storing uncompressed video is not possible with today's technology (CD & DVD).

• Transmitting uncompressed video over digital networks requires very high bandwidth.

• To be cost-effective and feasible, multimedia systems must use compressed video and audio streams.

INTRODUCTION

• Compression: the process of coding that effectively reduces the total number of bits needed to represent certain information.

[Figure: General Data Compression Scheme]

INTRODUCTION

• If the compression and decompression processes induce no information loss, the compression scheme is lossless; otherwise it is lossy.

• Compression ratio = B0 / B1, where B0 is the number of bits before compression and B1 is the number of bits after compression.
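As a quick numeric illustration of the definition above (the sizes are made-up values, not from the lecture):

    # Compression ratio = B0 / B1 (bits before compression / bits after compression).
    B0 = 8_000_000   # hypothetical uncompressed size, in bits
    B1 = 2_000_000   # hypothetical compressed size, in bits
    print(f"compression ratio = {B0 / B1:.2f}")   # -> 4.00, i.e. a 4:1 reduction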

COMPRESSION STEPS

[Figure: compression steps]

TYPES OF COMPRESSION

Symmetric Compression
• The same time is needed for the encoding and decoding phases.

Asymmetric Compression
• The compression process is performed once and enough time is available, hence compression can take longer.
• Decompression is performed frequently and must be done fast.

STATISTICAL ENCODING (FREQUENCY DEPENDENT)

Fixed-length coding
• Uses an equal number of bits to represent each symbol: a message drawn from N distinct symbols requires L >= log2(N) bits per symbol.
• Good encoding for symbols with equal probability of occurrence; not efficient if the probabilities of the symbols are unequal.

Variable-length encoding
• Frequently occurring characters are represented with shorter strings than seldom occurring characters.
• Statistical encoding is dependent on the frequency of occurrence of a character or of a sequence of data bytes.
• You are given a sequence of symbols S1, S2, S3, … and the probability of occurrence of each symbol P(Si) = pi.

BASICS OF INFORMATION THEORY

• The entropy η of an information source with alphabet S = {s1, s2, …, sn} is

      η = Σi pi · log2(1/pi),   i = 1, …, n

• pi: probability that symbol si will occur in S.

• log2(1/pi): amount of information contained in si, which corresponds to the number of bits needed to encode si.

EXAMPLE

• Uniform distribution over 256 symbols (e.g., an 8-bit gray-level image): pi = 1/256, hence the entropy of the image is

      η = Σi (1/256) · log2(256) = log2(256) = 8 bits per symbol
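A small Python check of the entropy formula and the uniform-distribution example above; the skewed distribution at the end is an added illustrative case, not from the slides:

    import math

    def entropy(probs):
        """Entropy eta = sum_i p_i * log2(1/p_i), in bits per symbol."""
        return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

    # Uniform distribution over 256 values (the image example): eta = log2(256) = 8 bits.
    print(entropy([1 / 256] * 256))              # -> 8.0

    # A skewed distribution needs fewer bits per symbol on average.
    print(entropy([0.5, 0.25, 0.125, 0.125]))    # -> 1.75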

ENTROPY AND CODE LENGTH

• The entropy η is the weighted sum of the terms log2(1/pi); it represents the average amount of information contained per symbol in the source S.

• The entropy η specifies the lower bound for the average number of bits needed to code each symbol in S, i.e., η ≤ l, where l is the average length (measured in bits) of the codewords produced by the encoder.

RUN-LENGTH CODING

• Memoryless source: an information source that is independently distributed, i.e., the value of the current symbol does not depend on the values of the previously appeared symbols.

• Run-Length Coding (RLC) is not for memoryless sources: it exploits memory present in the information source.

• Rationale for RLC: if the information source has the property that symbols tend to form continuous groups, then such a symbol and the length of the group can be coded.

RUN-LENGTH CODING (RLC)

• Content-dependent coding: RLC replaces a sequence of identical consecutive bytes with the byte and its number of occurrences.
  • The number of occurrences is indicated by a special flag (here "!").

• RLC algorithm:
  • If the same byte occurs at least 4 times in a row, count the number of occurrences.
  • Write the compressed data in the format "<the counted byte>!<number of occurrences>".

• Example (a code sketch follows below):
  • Uncompressed sequence: ABCCCCCCCCCDEFFFFGGG
  • Compressed sequence: ABC!9DEF!4GGG (from 20 to 13 bytes)
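A minimal sketch of the RLC scheme just described, assuming the single-character flag "!" and runs short enough for a one-digit count; it is an illustration, not the exact encoder from the slides:

    def rle_encode(data: str, flag: str = "!") -> str:
        """Replace each run of >= 4 identical bytes with '<byte><flag><count>'."""
        out, i = [], 0
        while i < len(data):
            j = i
            while j < len(data) and data[j] == data[i]:
                j += 1                      # extend the run of identical bytes
            run = j - i
            if run >= 4:
                out.append(f"{data[i]}{flag}{run}")   # long run: byte, flag, count
            else:
                out.append(data[i] * run)             # short run: copy as-is
            i = j
        return "".join(out)

    print(rle_encode("ABCCCCCCCCCDEFFFFGGG"))   # -> ABC!9DEF!4GGG (20 -> 13 bytes)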

VARIABLE-LENGTH CODING (VLC)

• Shannon-Fano algorithm: a top-down approach.
  1. Sort the symbols according to the frequency count of their occurrences.
  2. Recursively divide the symbols into two parts, each with approximately the same number of counts, until all parts contain only one symbol.

• Example: coding of "HELLO" (see the sketch below).

[Figures: Shannon-Fano coding trees for "HELLO"; another coding tree for "HELLO" by Shannon-Fano]
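A compact sketch of the top-down Shannon-Fano procedure applied to "HELLO" (symbol counts H:1, E:1, L:2, O:1). The split point is chosen so the two halves have roughly equal total counts; ties can be broken differently, so this produces one of several valid code trees:

    from collections import Counter

    def shannon_fano(symbols_with_counts):
        """Return {symbol: code}; input is a list of (symbol, count), sorted by count."""
        if len(symbols_with_counts) == 1:
            return {symbols_with_counts[0][0]: ""}
        total = sum(c for _, c in symbols_with_counts)
        # find the split index where the left part is closest to half the total count
        acc, split, best = 0, 1, float("inf")
        for i, (_, c) in enumerate(symbols_with_counts[:-1], start=1):
            acc += c
            if abs(2 * acc - total) < best:
                best, split = abs(2 * acc - total), i
        left = shannon_fano(symbols_with_counts[:split])    # prefix 0
        right = shannon_fano(symbols_with_counts[split:])   # prefix 1
        return {s: "0" + code for s, code in left.items()} | \
               {s: "1" + code for s, code in right.items()}

    counts = sorted(Counter("HELLO").items(), key=lambda kv: -kv[1])
    print(shannon_fano(counts))   # e.g. {'L': '0', 'H': '10', 'E': '110', 'O': '111'}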

HUFFMAN CODING ALGORITHM

• Characters are stored with their probabilities.

• The number of bits of the coded characters differs: the shortest code is assigned to the most frequently occurring character.

• To determine the Huffman code, we construct a binary tree:
  • Leaves are the characters to be encoded.
  • Nodes contain the occurrence probabilities of the characters belonging to their subtree.
  • 0 and 1 are assigned to the branches of the tree arbitrarily; therefore different Huffman codes are possible for the same data.
  • A Huffman table is generated.

• Huffman tables must be transmitted with the compressed data.
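A minimal sketch of the bottom-up tree construction described above. It is an assumed implementation using symbol counts from a sample string and Python's heapq (repeatedly merging the two least frequent subtrees), not the slide's own example; because the 0/1 branch assignment is arbitrary, other equivalent tables are possible:

    import heapq
    from collections import Counter

    def huffman_codes(text: str) -> dict:
        """Build one valid Huffman code table for the symbols of `text`."""
        # heap entries: (count, tie_breaker, {symbol: code_so_far})
        heap = [(c, i, {s: ""}) for i, (s, c) in enumerate(Counter(text).items())]
        heapq.heapify(heap)
        tie = len(heap)
        while len(heap) > 1:
            c1, _, t1 = heapq.heappop(heap)   # two least frequent subtrees
            c2, _, t2 = heapq.heappop(heap)
            merged = {s: "0" + code for s, code in t1.items()}      # left branch: 0
            merged.update({s: "1" + code for s, code in t2.items()})  # right branch: 1
            heapq.heappush(heap, (c1 + c2, tie, merged))
            tie += 1
        return heap[0][2]

    print(huffman_codes("HELLO"))
    # e.g. {'H': '00', 'E': '01', 'O': '10', 'L': '11'} -- one of the equivalent optimal tables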

EXAMPLE OF HUFFMAN CODING

[Figure: Huffman coding example]

PROPERTIES OF HUFFMAN CODING

• Unique prefix property: no Huffman code is a prefix of any other Huffman code; this precludes any ambiguity in decoding.

• Optimality: minimum-redundancy code, proved optimal for a given data model (i.e., a given, accurate probability distribution):
  • The two least frequent symbols will have Huffman codes of the same length, differing only in the last bit.
  • Symbols that occur more frequently will have shorter Huffman codes than symbols that occur less frequently.
  • The average code length for an information source S is strictly less than η + 1.

ARITHMETIC CODING

• Each symbol is coded by considering the prior data.
  • The encoded sequence must be read from the beginning; no random access is possible.

• Each symbol is represented by a portion (sub-interval) of the real numbers between 0 and 1.

• When the message becomes longer, the length of the interval shortens and the number of bits needed to represent the interval increases.

ARITHMETIC VS. HUFFMAN

• Arithmetic encoding does not encode each symbol separately; Huffman encoding does.

• Arithmetic encoding transmits only the length of the encoded string; Huffman encoding transmits the Huffman table.

• The compression ratios of both are similar.

ARITHMETIC CODING: ENCODER

• Example: encode the symbol sequence "CAEE$".

[Figures: encoder algorithm and step-by-step encoding of "CAEE$"]

• The final step in arithmetic encoding calls for the generation of a number that falls within the range [low, high).

• The above algorithm ensures that the shortest binary codeword is found (a simplified sketch follows below).
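A simplified floating-point sketch of the interval-narrowing idea behind the encoder. The probability model for the "CAEE$" alphabet is assumed for illustration (the slide's probability table is not reproduced in this transcript), and practical arithmetic coders use integer arithmetic with incremental bit output rather than Python floats:

    # Assumed (illustrative) probability model; "$" is the end-of-message terminator.
    PROBS = {"A": 0.2, "C": 0.2, "E": 0.5, "$": 0.1}

    def ranges(probs):
        """Assign each symbol a sub-range [low, high) of [0, 1) proportional to its probability."""
        out, cum = {}, 0.0
        for s, p in probs.items():
            out[s] = (cum, cum + p)
            cum += p
        return out

    def arithmetic_encode(message, probs):
        low, high = 0.0, 1.0
        rng = ranges(probs)
        for s in message:
            span = high - low
            s_low, s_high = rng[s]
            # narrow the current interval to the symbol's sub-range
            low, high = low + span * s_low, low + span * s_high
        return low, high   # any number in [low, high) identifies the whole message

    low, high = arithmetic_encode("CAEE$", PROBS)
    print(low, high)   # the longer the message, the narrower this interval becomes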

ARITHMETIC CODING: DECODER

• Decoding the symbol sequence "CAEE$" (a matching sketch follows below).
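A matching decoder sketch, continuing the encoder example above (it reuses ranges, PROBS, and the final low/high interval from that block, with the same assumed probability model). It repeatedly locates the sub-range containing the code value, emits that symbol, rescales, and stops at the "$" terminator:

    def arithmetic_decode(value, probs):
        """Recover the symbols from a code value in [0, 1), stopping at the terminator '$'."""
        rng = ranges(probs)        # same sub-ranges as the encoder
        out = []
        while True:
            for s, (s_low, s_high) in rng.items():
                if s_low <= value < s_high:
                    out.append(s)
                    if s == "$":
                        return "".join(out)
                    # rescale the value relative to the chosen sub-range
                    value = (value - s_low) / (s_high - s_low)
                    break

    code = (low + high) / 2                  # any value inside the final encoder interval
    print(arithmetic_decode(code, PROBS))    # -> CAEE$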


12

VARIABLE-LENGTH

CODING (VLC)

1048698 Shannon-Fano Algorithm a top-down approach

1 Sort the symbols according to the frequency count of

their occurrences

2 Recursively divide the symbols into two parts each

with approximately the same number of counts until all

parts contain only one symbol

1048698 Example coding of ldquoHELLOrdquo

13

14

15

Another coding tree for ldquoHELLOrdquo by Shannon-Fano

16

17

HUFFMAN CODING

ALGORITHM

Characters are stored with their probabilities

1048698 Number of bits of the coded characters differs Shortest

code is assigned to most frequently occurring character

bull 1048698 To determine Huffman code we construct a binary tree

bull 1048698 Leaves are characters to be encoded

bull 1048698 Nodes contain occurrence probabilities of the characters

bull belonging to the subtree

bull 1048698 0 and 1 are assigned to the branches of the tree arbitrarily -

bull therefore different Huffman codes are possible for the same

bull data

bull 1048698 Huffman table is generated

1048698 Huffman tables must be transmitted with compressed

data

18

EXAMPLE OF

HUFFMAN CODING

19

PROPERTIES OF

HUFFMAN CODING

1048698 Unique prefix property No Huffman code is a

prefix of any other Huffman code ndash precludes any

ambiguity in decoding

1048698 Optimality Minimum redundancy code ndash proved

optimal for a given data model (ie a given

accurate probability distribution)

1048698 The two least frequent symbols will have the same length

for their Huffman codes differing only at the last bit

1048698 Symbols that occur more frequent will have shorter

Huffman codes than symbols that occur less frequent

1048698 The average code length for an information source S is

strictly less than

20

ARITHMETIC CODING

Each symbol is coded by considering prior

data

1048698 encoded sequence must be read from beginning

no random access possible

1048698 Each symbol is a portion of a real number

between 0 and 1

1048698 When the message becomes longer the length of

the interval shortens and the number of bits

needed to represent the interval increases

21

ARITHMETIC VS

HUFFMAN

1048698 Arithmetic encoding does not encode each symbol separately

Huffman encoding does

1048698 Arithmetic encoding transmits only length of encoded string

Huffman encoding transmits the Huffman table

1048698 Compression ratios of both are similar

22

23

ARITHMETIC CODING

ENCODER

24

Example

Encode Symbols ldquoCAEE$rdquo

25

26

27

1048698 The final step in Arithmetic encoding calls for the generation

of a number that falls within the rang [low high)

1048698 The above algorithm will ensure that the shortest binary

codeword is found

28

29

ARITHMETIC CODING

DECODER

30

1048698 Decoding symbols ldquoCAEE$rdquo

31

Page 13: Lecture VI: JPEG IMAGE COMPRESSION · LECTURE VI: LOSSLESS COMPRESSION ALGORITHMS DR. OUIEM BCHIR 1. STORAGE SPACE Uncompressed graphics, audio, and video data require substantial

VARIABLE-LENGTH

CODING (VLC)

1048698 Shannon-Fano Algorithm a top-down approach

1 Sort the symbols according to the frequency count of

their occurrences

2 Recursively divide the symbols into two parts each

with approximately the same number of counts until all

parts contain only one symbol

1048698 Example coding of ldquoHELLOrdquo

13

14

15

Another coding tree for ldquoHELLOrdquo by Shannon-Fano

16

17

HUFFMAN CODING

ALGORITHM

Characters are stored with their probabilities

1048698 Number of bits of the coded characters differs Shortest

code is assigned to most frequently occurring character

bull 1048698 To determine Huffman code we construct a binary tree

bull 1048698 Leaves are characters to be encoded

bull 1048698 Nodes contain occurrence probabilities of the characters

bull belonging to the subtree

bull 1048698 0 and 1 are assigned to the branches of the tree arbitrarily -

bull therefore different Huffman codes are possible for the same

bull data

bull 1048698 Huffman table is generated

1048698 Huffman tables must be transmitted with compressed

data

18

EXAMPLE OF

HUFFMAN CODING

19

PROPERTIES OF

HUFFMAN CODING

1048698 Unique prefix property No Huffman code is a

prefix of any other Huffman code ndash precludes any

ambiguity in decoding

1048698 Optimality Minimum redundancy code ndash proved

optimal for a given data model (ie a given

accurate probability distribution)

1048698 The two least frequent symbols will have the same length

for their Huffman codes differing only at the last bit

1048698 Symbols that occur more frequent will have shorter

Huffman codes than symbols that occur less frequent

1048698 The average code length for an information source S is

strictly less than

20

ARITHMETIC CODING

Each symbol is coded by considering prior

data

1048698 encoded sequence must be read from beginning

no random access possible

1048698 Each symbol is a portion of a real number

between 0 and 1

1048698 When the message becomes longer the length of

the interval shortens and the number of bits

needed to represent the interval increases

21

ARITHMETIC VS

HUFFMAN

1048698 Arithmetic encoding does not encode each symbol separately

Huffman encoding does

1048698 Arithmetic encoding transmits only length of encoded string

Huffman encoding transmits the Huffman table

1048698 Compression ratios of both are similar

22

23

ARITHMETIC CODING

ENCODER

24

Example

Encode Symbols ldquoCAEE$rdquo

25

26

27

1048698 The final step in Arithmetic encoding calls for the generation

of a number that falls within the rang [low high)

1048698 The above algorithm will ensure that the shortest binary

codeword is found

28

29

ARITHMETIC CODING

DECODER

30

1048698 Decoding symbols ldquoCAEE$rdquo

31

Page 14: Lecture VI: JPEG IMAGE COMPRESSION · LECTURE VI: LOSSLESS COMPRESSION ALGORITHMS DR. OUIEM BCHIR 1. STORAGE SPACE Uncompressed graphics, audio, and video data require substantial

14

15

Another coding tree for ldquoHELLOrdquo by Shannon-Fano

16

17

HUFFMAN CODING

ALGORITHM

Characters are stored with their probabilities

1048698 Number of bits of the coded characters differs Shortest

code is assigned to most frequently occurring character

bull 1048698 To determine Huffman code we construct a binary tree

bull 1048698 Leaves are characters to be encoded

bull 1048698 Nodes contain occurrence probabilities of the characters

bull belonging to the subtree

bull 1048698 0 and 1 are assigned to the branches of the tree arbitrarily -

bull therefore different Huffman codes are possible for the same

bull data

bull 1048698 Huffman table is generated

1048698 Huffman tables must be transmitted with compressed

data

18

EXAMPLE OF

HUFFMAN CODING

19

PROPERTIES OF

HUFFMAN CODING

1048698 Unique prefix property No Huffman code is a

prefix of any other Huffman code ndash precludes any

ambiguity in decoding

1048698 Optimality Minimum redundancy code ndash proved

optimal for a given data model (ie a given

accurate probability distribution)

1048698 The two least frequent symbols will have the same length

for their Huffman codes differing only at the last bit

1048698 Symbols that occur more frequent will have shorter

Huffman codes than symbols that occur less frequent

1048698 The average code length for an information source S is

strictly less than

20

ARITHMETIC CODING

Each symbol is coded by considering prior

data

1048698 encoded sequence must be read from beginning

no random access possible

1048698 Each symbol is a portion of a real number

between 0 and 1

1048698 When the message becomes longer the length of

the interval shortens and the number of bits

needed to represent the interval increases

21

ARITHMETIC VS

HUFFMAN

1048698 Arithmetic encoding does not encode each symbol separately

Huffman encoding does

1048698 Arithmetic encoding transmits only length of encoded string

Huffman encoding transmits the Huffman table

1048698 Compression ratios of both are similar

22

23

ARITHMETIC CODING

ENCODER

24

Example

Encode Symbols ldquoCAEE$rdquo

25

26

27

1048698 The final step in Arithmetic encoding calls for the generation

of a number that falls within the rang [low high)

1048698 The above algorithm will ensure that the shortest binary

codeword is found

28

29

ARITHMETIC CODING

DECODER

30

1048698 Decoding symbols ldquoCAEE$rdquo

31

Page 15: Lecture VI: JPEG IMAGE COMPRESSION · LECTURE VI: LOSSLESS COMPRESSION ALGORITHMS DR. OUIEM BCHIR 1. STORAGE SPACE Uncompressed graphics, audio, and video data require substantial

15

Another coding tree for ldquoHELLOrdquo by Shannon-Fano

16

17

HUFFMAN CODING

ALGORITHM

Characters are stored with their probabilities

1048698 Number of bits of the coded characters differs Shortest

code is assigned to most frequently occurring character

bull 1048698 To determine Huffman code we construct a binary tree

bull 1048698 Leaves are characters to be encoded

bull 1048698 Nodes contain occurrence probabilities of the characters

bull belonging to the subtree

bull 1048698 0 and 1 are assigned to the branches of the tree arbitrarily -

bull therefore different Huffman codes are possible for the same

bull data

bull 1048698 Huffman table is generated

1048698 Huffman tables must be transmitted with compressed

data

18

EXAMPLE OF

HUFFMAN CODING

19

PROPERTIES OF

HUFFMAN CODING

1048698 Unique prefix property No Huffman code is a

prefix of any other Huffman code ndash precludes any

ambiguity in decoding

1048698 Optimality Minimum redundancy code ndash proved

optimal for a given data model (ie a given

accurate probability distribution)

1048698 The two least frequent symbols will have the same length

for their Huffman codes differing only at the last bit

1048698 Symbols that occur more frequent will have shorter

Huffman codes than symbols that occur less frequent

1048698 The average code length for an information source S is

strictly less than

20

ARITHMETIC CODING

Each symbol is coded by considering prior

data

1048698 encoded sequence must be read from beginning

no random access possible

1048698 Each symbol is a portion of a real number

between 0 and 1

1048698 When the message becomes longer the length of

the interval shortens and the number of bits

needed to represent the interval increases

21

ARITHMETIC VS

HUFFMAN

1048698 Arithmetic encoding does not encode each symbol separately

Huffman encoding does

1048698 Arithmetic encoding transmits only length of encoded string

Huffman encoding transmits the Huffman table

1048698 Compression ratios of both are similar

22

23

ARITHMETIC CODING

ENCODER

24

Example

Encode Symbols ldquoCAEE$rdquo

25

26

27

1048698 The final step in Arithmetic encoding calls for the generation

of a number that falls within the rang [low high)

1048698 The above algorithm will ensure that the shortest binary

codeword is found

28

29

ARITHMETIC CODING

DECODER

30

1048698 Decoding symbols ldquoCAEE$rdquo

31

Page 16: Lecture VI: JPEG IMAGE COMPRESSION · LECTURE VI: LOSSLESS COMPRESSION ALGORITHMS DR. OUIEM BCHIR 1. STORAGE SPACE Uncompressed graphics, audio, and video data require substantial

Another coding tree for ldquoHELLOrdquo by Shannon-Fano

16

17

HUFFMAN CODING

ALGORITHM

Characters are stored with their probabilities

1048698 Number of bits of the coded characters differs Shortest

code is assigned to most frequently occurring character

bull 1048698 To determine Huffman code we construct a binary tree

bull 1048698 Leaves are characters to be encoded

bull 1048698 Nodes contain occurrence probabilities of the characters

bull belonging to the subtree

bull 1048698 0 and 1 are assigned to the branches of the tree arbitrarily -

bull therefore different Huffman codes are possible for the same

bull data

bull 1048698 Huffman table is generated

1048698 Huffman tables must be transmitted with compressed

data

18

EXAMPLE OF

HUFFMAN CODING

19

PROPERTIES OF

HUFFMAN CODING

1048698 Unique prefix property No Huffman code is a

prefix of any other Huffman code ndash precludes any

ambiguity in decoding

1048698 Optimality Minimum redundancy code ndash proved

optimal for a given data model (ie a given

accurate probability distribution)

1048698 The two least frequent symbols will have the same length

for their Huffman codes differing only at the last bit

1048698 Symbols that occur more frequent will have shorter

Huffman codes than symbols that occur less frequent

1048698 The average code length for an information source S is

strictly less than

20

ARITHMETIC CODING

Each symbol is coded by considering prior

data

1048698 encoded sequence must be read from beginning

no random access possible

1048698 Each symbol is a portion of a real number

between 0 and 1

1048698 When the message becomes longer the length of

the interval shortens and the number of bits

needed to represent the interval increases

21

ARITHMETIC VS

HUFFMAN

1048698 Arithmetic encoding does not encode each symbol separately

Huffman encoding does

1048698 Arithmetic encoding transmits only length of encoded string

Huffman encoding transmits the Huffman table

1048698 Compression ratios of both are similar

22

23

ARITHMETIC CODING

ENCODER

24

Example

Encode Symbols ldquoCAEE$rdquo

25

26

27

1048698 The final step in Arithmetic encoding calls for the generation

of a number that falls within the rang [low high)

1048698 The above algorithm will ensure that the shortest binary

codeword is found

28

29

ARITHMETIC CODING

DECODER

30

1048698 Decoding symbols ldquoCAEE$rdquo

31

Page 17: Lecture VI: JPEG IMAGE COMPRESSION · LECTURE VI: LOSSLESS COMPRESSION ALGORITHMS DR. OUIEM BCHIR 1. STORAGE SPACE Uncompressed graphics, audio, and video data require substantial

17

HUFFMAN CODING

ALGORITHM

Characters are stored with their probabilities

1048698 Number of bits of the coded characters differs Shortest

code is assigned to most frequently occurring character

bull 1048698 To determine Huffman code we construct a binary tree

bull 1048698 Leaves are characters to be encoded

bull 1048698 Nodes contain occurrence probabilities of the characters

bull belonging to the subtree

bull 1048698 0 and 1 are assigned to the branches of the tree arbitrarily -

bull therefore different Huffman codes are possible for the same

bull data

bull 1048698 Huffman table is generated

1048698 Huffman tables must be transmitted with compressed

data

18

EXAMPLE OF

HUFFMAN CODING

19

PROPERTIES OF

HUFFMAN CODING

1048698 Unique prefix property No Huffman code is a

prefix of any other Huffman code ndash precludes any

ambiguity in decoding

1048698 Optimality Minimum redundancy code ndash proved

optimal for a given data model (ie a given

accurate probability distribution)

1048698 The two least frequent symbols will have the same length

for their Huffman codes differing only at the last bit

1048698 Symbols that occur more frequent will have shorter

Huffman codes than symbols that occur less frequent

1048698 The average code length for an information source S is

strictly less than

20

ARITHMETIC CODING

Each symbol is coded by considering prior

data

1048698 encoded sequence must be read from beginning

no random access possible

1048698 Each symbol is a portion of a real number

between 0 and 1

1048698 When the message becomes longer the length of

the interval shortens and the number of bits

needed to represent the interval increases

21

ARITHMETIC VS

HUFFMAN

1048698 Arithmetic encoding does not encode each symbol separately

Huffman encoding does

1048698 Arithmetic encoding transmits only length of encoded string

Huffman encoding transmits the Huffman table

1048698 Compression ratios of both are similar

22

23

ARITHMETIC CODING

ENCODER

24

Example

Encode Symbols ldquoCAEE$rdquo

25

26

27

1048698 The final step in Arithmetic encoding calls for the generation

of a number that falls within the rang [low high)

1048698 The above algorithm will ensure that the shortest binary

codeword is found

28

29

ARITHMETIC CODING

DECODER

30

1048698 Decoding symbols ldquoCAEE$rdquo

31

Page 18: Lecture VI: JPEG IMAGE COMPRESSION · LECTURE VI: LOSSLESS COMPRESSION ALGORITHMS DR. OUIEM BCHIR 1. STORAGE SPACE Uncompressed graphics, audio, and video data require substantial

HUFFMAN CODING

ALGORITHM

Characters are stored with their probabilities

1048698 Number of bits of the coded characters differs Shortest

code is assigned to most frequently occurring character

bull 1048698 To determine Huffman code we construct a binary tree

bull 1048698 Leaves are characters to be encoded

bull 1048698 Nodes contain occurrence probabilities of the characters

bull belonging to the subtree

bull 1048698 0 and 1 are assigned to the branches of the tree arbitrarily -

bull therefore different Huffman codes are possible for the same

bull data

bull 1048698 Huffman table is generated

1048698 Huffman tables must be transmitted with compressed

data

18

EXAMPLE OF

HUFFMAN CODING

19

PROPERTIES OF

HUFFMAN CODING

1048698 Unique prefix property No Huffman code is a

prefix of any other Huffman code ndash precludes any

ambiguity in decoding

1048698 Optimality Minimum redundancy code ndash proved

optimal for a given data model (ie a given

accurate probability distribution)

1048698 The two least frequent symbols will have the same length

for their Huffman codes differing only at the last bit

1048698 Symbols that occur more frequent will have shorter

Huffman codes than symbols that occur less frequent

1048698 The average code length for an information source S is

strictly less than

20

ARITHMETIC CODING

Each symbol is coded by considering prior

data

1048698 encoded sequence must be read from beginning

no random access possible

1048698 Each symbol is a portion of a real number

between 0 and 1

1048698 When the message becomes longer the length of

the interval shortens and the number of bits

needed to represent the interval increases

21

ARITHMETIC VS

HUFFMAN

1048698 Arithmetic encoding does not encode each symbol separately

Huffman encoding does

1048698 Arithmetic encoding transmits only length of encoded string

Huffman encoding transmits the Huffman table

1048698 Compression ratios of both are similar

22

23

ARITHMETIC CODING

ENCODER

24

Example

Encode Symbols ldquoCAEE$rdquo

25

26

27

1048698 The final step in Arithmetic encoding calls for the generation

of a number that falls within the rang [low high)

1048698 The above algorithm will ensure that the shortest binary

codeword is found

28

29

ARITHMETIC CODING

DECODER

30

1048698 Decoding symbols ldquoCAEE$rdquo

31

Page 19: Lecture VI: JPEG IMAGE COMPRESSION · LECTURE VI: LOSSLESS COMPRESSION ALGORITHMS DR. OUIEM BCHIR 1. STORAGE SPACE Uncompressed graphics, audio, and video data require substantial

EXAMPLE OF

HUFFMAN CODING

19

PROPERTIES OF

HUFFMAN CODING

1048698 Unique prefix property No Huffman code is a

prefix of any other Huffman code ndash precludes any

ambiguity in decoding

1048698 Optimality Minimum redundancy code ndash proved

optimal for a given data model (ie a given

accurate probability distribution)

1048698 The two least frequent symbols will have the same length

for their Huffman codes differing only at the last bit

1048698 Symbols that occur more frequent will have shorter

Huffman codes than symbols that occur less frequent

1048698 The average code length for an information source S is

strictly less than

20

ARITHMETIC CODING

Each symbol is coded by considering prior

data

1048698 encoded sequence must be read from beginning

no random access possible

1048698 Each symbol is a portion of a real number

between 0 and 1

1048698 When the message becomes longer the length of

the interval shortens and the number of bits

needed to represent the interval increases

21

ARITHMETIC VS

HUFFMAN

1048698 Arithmetic encoding does not encode each symbol separately

Huffman encoding does

1048698 Arithmetic encoding transmits only length of encoded string

Huffman encoding transmits the Huffman table

1048698 Compression ratios of both are similar

22

23

ARITHMETIC CODING

ENCODER

24

Example

Encode Symbols ldquoCAEE$rdquo

25

26

27

1048698 The final step in Arithmetic encoding calls for the generation

of a number that falls within the rang [low high)

1048698 The above algorithm will ensure that the shortest binary

codeword is found

28

29

ARITHMETIC CODING

DECODER

30

1048698 Decoding symbols ldquoCAEE$rdquo

31

Page 20: Lecture VI: JPEG IMAGE COMPRESSION · LECTURE VI: LOSSLESS COMPRESSION ALGORITHMS DR. OUIEM BCHIR 1. STORAGE SPACE Uncompressed graphics, audio, and video data require substantial

PROPERTIES OF

HUFFMAN CODING

1048698 Unique prefix property No Huffman code is a

prefix of any other Huffman code ndash precludes any

ambiguity in decoding

1048698 Optimality Minimum redundancy code ndash proved

optimal for a given data model (ie a given

accurate probability distribution)

1048698 The two least frequent symbols will have the same length

for their Huffman codes differing only at the last bit

1048698 Symbols that occur more frequent will have shorter

Huffman codes than symbols that occur less frequent

1048698 The average code length for an information source S is

strictly less than

20

ARITHMETIC CODING

Each symbol is coded by considering prior

data

1048698 encoded sequence must be read from beginning

no random access possible

1048698 Each symbol is a portion of a real number

between 0 and 1

1048698 When the message becomes longer the length of

the interval shortens and the number of bits

needed to represent the interval increases

21

ARITHMETIC VS

HUFFMAN

1048698 Arithmetic encoding does not encode each symbol separately

Huffman encoding does

1048698 Arithmetic encoding transmits only length of encoded string

Huffman encoding transmits the Huffman table

1048698 Compression ratios of both are similar

22

23

ARITHMETIC CODING

ENCODER

24

Example

Encode Symbols ldquoCAEE$rdquo

25

26

27

1048698 The final step in Arithmetic encoding calls for the generation

of a number that falls within the rang [low high)

1048698 The above algorithm will ensure that the shortest binary

codeword is found

28

29

ARITHMETIC CODING

DECODER

30

1048698 Decoding symbols ldquoCAEE$rdquo

31

Page 21: Lecture VI: JPEG IMAGE COMPRESSION · LECTURE VI: LOSSLESS COMPRESSION ALGORITHMS DR. OUIEM BCHIR 1. STORAGE SPACE Uncompressed graphics, audio, and video data require substantial

ARITHMETIC CODING

Each symbol is coded by considering prior

data

1048698 encoded sequence must be read from beginning

no random access possible

1048698 Each symbol is a portion of a real number

between 0 and 1

1048698 When the message becomes longer the length of

the interval shortens and the number of bits

needed to represent the interval increases

21

ARITHMETIC VS

HUFFMAN

1048698 Arithmetic encoding does not encode each symbol separately

Huffman encoding does

1048698 Arithmetic encoding transmits only length of encoded string

Huffman encoding transmits the Huffman table

1048698 Compression ratios of both are similar

22

23

ARITHMETIC CODING

ENCODER

24

Example

Encode Symbols ldquoCAEE$rdquo

25

26

27

1048698 The final step in Arithmetic encoding calls for the generation

of a number that falls within the rang [low high)

1048698 The above algorithm will ensure that the shortest binary

codeword is found

28

29

ARITHMETIC CODING

DECODER

30

1048698 Decoding symbols ldquoCAEE$rdquo

31

Page 22: Lecture VI: JPEG IMAGE COMPRESSION · LECTURE VI: LOSSLESS COMPRESSION ALGORITHMS DR. OUIEM BCHIR 1. STORAGE SPACE Uncompressed graphics, audio, and video data require substantial

ARITHMETIC VS

HUFFMAN

1048698 Arithmetic encoding does not encode each symbol separately

Huffman encoding does

1048698 Arithmetic encoding transmits only length of encoded string

Huffman encoding transmits the Huffman table

1048698 Compression ratios of both are similar

22

23

ARITHMETIC CODING

ENCODER

24

Example

Encode Symbols ldquoCAEE$rdquo

25

26

27

1048698 The final step in Arithmetic encoding calls for the generation

of a number that falls within the rang [low high)

1048698 The above algorithm will ensure that the shortest binary

codeword is found

28

29

ARITHMETIC CODING

DECODER

30

1048698 Decoding symbols ldquoCAEE$rdquo

31

Page 23: Lecture VI: JPEG IMAGE COMPRESSION · LECTURE VI: LOSSLESS COMPRESSION ALGORITHMS DR. OUIEM BCHIR 1. STORAGE SPACE Uncompressed graphics, audio, and video data require substantial

23

ARITHMETIC CODING

ENCODER

24

Example

Encode Symbols ldquoCAEE$rdquo

25

26

27

1048698 The final step in Arithmetic encoding calls for the generation

of a number that falls within the rang [low high)

1048698 The above algorithm will ensure that the shortest binary

codeword is found

28

29

ARITHMETIC CODING

DECODER

30

1048698 Decoding symbols ldquoCAEE$rdquo

31

Page 24: Lecture VI: JPEG IMAGE COMPRESSION · LECTURE VI: LOSSLESS COMPRESSION ALGORITHMS DR. OUIEM BCHIR 1. STORAGE SPACE Uncompressed graphics, audio, and video data require substantial

ARITHMETIC CODING

ENCODER

24

Example

Encode Symbols ldquoCAEE$rdquo

25

26

27

1048698 The final step in Arithmetic encoding calls for the generation

of a number that falls within the rang [low high)

1048698 The above algorithm will ensure that the shortest binary

codeword is found

28

29

ARITHMETIC CODING

DECODER

30

1048698 Decoding symbols ldquoCAEE$rdquo

31

Page 25: Lecture VI: JPEG IMAGE COMPRESSION · LECTURE VI: LOSSLESS COMPRESSION ALGORITHMS DR. OUIEM BCHIR 1. STORAGE SPACE Uncompressed graphics, audio, and video data require substantial

Example

Encode Symbols ldquoCAEE$rdquo

25

26

27

1048698 The final step in Arithmetic encoding calls for the generation

of a number that falls within the rang [low high)

1048698 The above algorithm will ensure that the shortest binary

codeword is found

28

29

ARITHMETIC CODING

DECODER

30

1048698 Decoding symbols ldquoCAEE$rdquo

31

Page 26: Lecture VI: JPEG IMAGE COMPRESSION · LECTURE VI: LOSSLESS COMPRESSION ALGORITHMS DR. OUIEM BCHIR 1. STORAGE SPACE Uncompressed graphics, audio, and video data require substantial

26

27

1048698 The final step in Arithmetic encoding calls for the generation

of a number that falls within the rang [low high)

1048698 The above algorithm will ensure that the shortest binary

codeword is found

28

29

ARITHMETIC CODING

DECODER

30

1048698 Decoding symbols ldquoCAEE$rdquo

31

Page 27: Lecture VI: JPEG IMAGE COMPRESSION · LECTURE VI: LOSSLESS COMPRESSION ALGORITHMS DR. OUIEM BCHIR 1. STORAGE SPACE Uncompressed graphics, audio, and video data require substantial

27

1048698 The final step in Arithmetic encoding calls for the generation

of a number that falls within the rang [low high)

1048698 The above algorithm will ensure that the shortest binary

codeword is found

28

29

ARITHMETIC CODING

DECODER

30

1048698 Decoding symbols ldquoCAEE$rdquo

31

Page 28: Lecture VI: JPEG IMAGE COMPRESSION · LECTURE VI: LOSSLESS COMPRESSION ALGORITHMS DR. OUIEM BCHIR 1. STORAGE SPACE Uncompressed graphics, audio, and video data require substantial

1048698 The final step in Arithmetic encoding calls for the generation

of a number that falls within the rang [low high)

1048698 The above algorithm will ensure that the shortest binary

codeword is found

28

29

ARITHMETIC CODING

DECODER

30

1048698 Decoding symbols ldquoCAEE$rdquo

31

Page 29: Lecture VI: JPEG IMAGE COMPRESSION · LECTURE VI: LOSSLESS COMPRESSION ALGORITHMS DR. OUIEM BCHIR 1. STORAGE SPACE Uncompressed graphics, audio, and video data require substantial

29

ARITHMETIC CODING

DECODER

30

1048698 Decoding symbols ldquoCAEE$rdquo

31

Page 30: Lecture VI: JPEG IMAGE COMPRESSION · LECTURE VI: LOSSLESS COMPRESSION ALGORITHMS DR. OUIEM BCHIR 1. STORAGE SPACE Uncompressed graphics, audio, and video data require substantial

ARITHMETIC CODING

DECODER

30

1048698 Decoding symbols ldquoCAEE$rdquo

31

Page 31: Lecture VI: JPEG IMAGE COMPRESSION · LECTURE VI: LOSSLESS COMPRESSION ALGORITHMS DR. OUIEM BCHIR 1. STORAGE SPACE Uncompressed graphics, audio, and video data require substantial

1048698 Decoding symbols ldquoCAEE$rdquo

31