data compression and huffman algorithm


Upload: lipika008

Post on 16-Nov-2014


Page 1: Data Compression and Huffman Algorithm

DATA COMPRESSION AND HUFFMAN ALGORITHM

Technical Seminar Paper Submitted by

Presented by

Vineet Agarwala

NATIONAL INSTITUTE OF SCIENCE & TECHNOLOGY

IT200118155

Technical Seminar Under the guidance of Anisur Rahman

Page 2: Data Compression and Huffman Algorithm

DATA COMPRESSION

Virtually all forms of data (text, numerical, image, video) contain redundant elements.

Data can be compressed by eliminating the redundant elements.

A code is substituted for each eliminated redundant element, where the code is shorter than the element it replaces.

When compressed data is retrieved from storage or received over a communications link, it is expanded back to its original form, based on the code.

Compression is used:

to save storage space

to reduce communications transmission requirements

The art or science of compactly representing information

Digital realm: using fewer bits to represent information

Data + Compression = information – redundancy

Page 3: Data Compression and Huffman Algorithm

REDUNDANCY

Most types of computer files are fairly redundant -- they have the same information listed over and over again. File-compression programs simply get rid of the redundancy.

“Ask not what your country can do for you -- ask what you can do for your country.”

Ignoring the difference between capital and lower-case letters, roughly half of the phrase is redundant. Nine words -- ask, not, what, your, country, can, do, for, you -- give us almost everything we need for the entire quote.
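The word count behind this claim can be checked with a few lines of Python. This is only an illustrative sketch: the variable names are made up for the example, and treating every run of letters as a "word" is an assumption.

```python
import re

quote = ("Ask not what your country can do for you -- "
         "ask what you can do for your country.")

# Ignore case and punctuation: keep only runs of letters.
words = re.findall(r"[a-z]+", quote.lower())

# dict.fromkeys keeps the first occurrence of each word, in order.
distinct = list(dict.fromkeys(words))

print(len(words), "words in total")
print(len(distinct), "distinct words:", ", ".join(distinct))
```

Under these assumptions the quote has 17 words but only 9 distinct ones, so 8 of the 17 words repeat something already seen.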

Page 4: Data Compression and Huffman Algorithm

Compression Techniques

Lossless

Data can be completely recovered after decompression

Recovered data is identical to original

Exploits redundancy in data

Lossy

Data cannot be completely recovered after decompression

Some information is lost for ever

Gives more compression than lossless

Discards “insignificant” data components

Page 5: Data Compression and Huffman Algorithm

IMAGE COMPRESSION

Image compression can be lossy or lossless.

Methods for lossless image compression are:

Run-length encoding

Entropy coding

Adaptive dictionary algorithms such as LZW

Methods for lossy compression are:

Reducing the color space to the most common colors in the image. The selected colors are specified in the color palette in the header of the compressed image. Each pixel just references the index of a color in the color palette. This method can be combined with dithering to blur the color borders.

Transform coding. This is the most commonly used method. A Fourier-related transform such as the DCT or the wavelet transform is applied, followed by quantization and entropy coding.

Fractal compression.
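The palette method can be sketched in Python as follows. This is a minimal illustration, not how any particular image format does it: the function name `palettize` is hypothetical, pixels are assumed to be RGB tuples, and "nearest color" is assumed to mean smallest squared distance in RGB. Dithering is left out.

```python
from collections import Counter

def palettize(pixels, palette_size):
    """Return (palette, indices): the most common colors, and for each
    pixel the index of the closest palette entry."""
    palette = [color for color, _ in Counter(pixels).most_common(palette_size)]

    def nearest(color):
        # Index of the palette entry with the smallest squared RGB distance.
        return min(range(len(palette)),
                   key=lambda i: sum((a - b) ** 2
                                     for a, b in zip(palette[i], color)))

    return palette, [nearest(p) for p in pixels]

# Toy image: five red pixels, three blue pixels, one slightly off-red pixel.
pixels = [(255, 0, 0)] * 5 + [(0, 0, 255)] * 3 + [(250, 5, 5)]
palette, indices = palettize(pixels, palette_size=2)
print(palette)   # the two most common colors
print(indices)   # one small index per pixel instead of a full RGB triple
```

The off-red pixel is stored as an index to plain red, which is where the loss comes from.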

Page 6: Data Compression and Huffman Algorithm

JPEG (TRANSFORM COMPRESSION)

JPEG is named after its origin, the Joint Photographic Experts Group.

This method involves reducing the number of bits per sample or discarding some of the samples entirely.

Page 7: Data Compression and Huffman Algorithm

MULTIMEDIA COMPRESSION

Multimedia compression is a general term referring to the compression of any type of multimedia, most notably graphics, audio, and video.

MPEG (Moving Picture Experts Group): the future of this technology is to encode the compression and decompression algorithms directly into integrated circuits.

The approach used by MPEG can be divided into two types of compression: within-the-frame and between-frame.

Page 8: Data Compression and Huffman Algorithm

DATA COMPRESSION ALGORITHMS

LOSSLESS COMPRESSION

Run Length Encoding

Huffman Coding

Delta

LZW

LOSSY COMPRESSION

CS&Q (coarser sampling and/or quantization)

JPEG

MPEG

Page 9: Data Compression and Huffman Algorithm

RUN-LENGTH ENCODING

Data files frequently contain the same character repeated many times in a row.

Example of run-length encoding: each run of zeros is replaced by two characters in the compressed file, a zero to indicate that compression is occurring, followed by the number of zeros in the run.
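The zero-run scheme described on this slide can be sketched in Python. The function names and the sample data are made up for illustration, and the sketch assumes values are plain integers with no upper limit on the run count (a byte-oriented format would need to cap runs at 255).

```python
def rle_zero_encode(data):
    """Replace each run of zeros with the pair (0, run length)."""
    out, i = [], 0
    while i < len(data):
        if data[i] == 0:
            run = 0
            while i < len(data) and data[i] == 0:
                run += 1
                i += 1
            out += [0, run]        # flag zero, then the number of zeros
        else:
            out.append(data[i])    # non-zero values pass through unchanged
            i += 1
    return out

def rle_zero_decode(encoded):
    """Inverse of rle_zero_encode: a zero is always followed by a count."""
    out, i = [], 0
    while i < len(encoded):
        if encoded[i] == 0:
            out += [0] * encoded[i + 1]
            i += 2
        else:
            out.append(encoded[i])
            i += 1
    return out

sample = [17, 8, 54, 0, 0, 0, 97, 5, 16, 0, 45, 23, 0, 0, 0, 0, 0, 3, 67, 0, 0, 8]
encoded = rle_zero_encode(sample)
print(encoded)
print(len(sample), "values ->", len(encoded), "values")
```

Note that a single isolated zero grows from one value to two, so the scheme only pays off when zeros tend to come in runs.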

Page 10: Data Compression and Huffman Algorithm

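HUFFMAN ENCODING

This method is named after D.A. Huffman, who developed the procedure in the 1950s.

In the example file, more than 96% of the content consists of only 31 of the 127 possible characters. Huffman encoding assigns shorter codes to the more frequent characters.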

Page 11: Data Compression and Huffman Algorithm

HUFFMAN ENCODING EXAMPLE

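Character frequencies

A: 20% (.20)

B: 9% (.09)

C: 15% (.15)

D: 11% (.11)

E: 40% (.40)

F: 5% (.05)

[Figure: first step of tree construction. The two least frequent characters, B (.09) and F (.05), are joined under a new node BF (.14), with the branches labeled 0 and 1. A (.20), C (.15), D (.11) and E (.4) are not yet merged.]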

Page 12: Data Compression and Huffman Algorithm

HUFFMAN ENCODING EXAMPLE (CONTD.)

Codes

A: 010

B: 0000

C: 011

D: 001

E: 1

F: 0001

[Figure: completed Huffman tree, each branch labeled 0 or 1]

ABCDEF 1.0
    0: ABCDF .6
        0: BFD .25
            0: BF .14
                0: B .09
                1: F .05
            1: D .11
        1: AC .35
            0: A .20
            1: C .15
    1: E .4
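The tree construction for this example can be reproduced with a short Python sketch built on the standard `heapq` module. The function name `huffman_codes` is hypothetical, and the frequencies are entered as whole percentages to avoid floating-point rounding. The sketch may label branches with 0 and 1 differently from the slide, so the bit patterns can differ while the code lengths stay the same.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a Huffman code for a {symbol: weight} mapping."""
    order = count()  # tie-breaker so the heap never compares tree nodes
    heap = [(weight, next(order), symbol) for symbol, weight in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Repeatedly merge the two lightest nodes into one.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(order), (left, right)))

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):       # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                             # leaf: a symbol
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

freqs = {"A": 20, "B": 9, "C": 15, "D": 11, "E": 40, "F": 5}  # percent
codes = huffman_codes(freqs)
lengths = {symbol: len(code) for symbol, code in codes.items()}
average_bits = sum(freqs[s] * lengths[s] for s in freqs) / 100
print(codes)
print("average bits per character:", average_bits)
```

With these frequencies the average works out to 2.34 bits per character, compared with 3 bits for a fixed-length code over six characters.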

Page 13: Data Compression and Huffman Algorithm

Run Length Encoding

CTAAAAAGGGTCGTTTTTTGCCCGGGGGCCTCCCCCCC

CT5A3GTCG6TG3C5GCCT7C } Run length encoded: 21 symbols
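The encoding on this slide appears to replace only runs of three or more symbols with a count followed by the symbol, leaving shorter runs as they are. A sketch under that assumption follows; the function names and the `min_run` parameter are made up for the example, and the symbols themselves are assumed never to be digits.

```python
import re
from itertools import groupby

def rle_encode(text, min_run=3):
    """Write runs of at least min_run symbols as <count><symbol>."""
    parts = []
    for symbol, group in groupby(text):
        n = len(list(group))
        parts.append(f"{n}{symbol}" if n >= min_run else symbol * n)
    return "".join(parts)

def rle_decode(encoded):
    """Expand <count><symbol> pairs; a bare symbol counts once."""
    return "".join(symbol * int(digits or 1)
                   for digits, symbol in re.findall(r"(\d*)(\D)", encoded))

dna = "CTAAAAAGGGTCGTTTTTTGCCCGGGGGCCTCCCCCCC"
packed = rle_encode(dna)
print(packed)
print(len(dna), "symbols ->", len(packed), "symbols")
```

The original sequence is 38 symbols long, so the 21-symbol encoding is a little over half its size.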

Page 14: Data Compression and Huffman Algorithm

Run Length Encoding (Contd.)

WWWBWWWWWBWWWBWWWWBWWWWWBWWWBWWWWWBWWBWWWWWWBBBWWWWWWWBWBWWWWWWWBWWBBWWWWWBWWWWBWWWWBWWWWB

WWWBWWWWWBWWWBWWWWB….

3WB5WB3WB4WB….

3151314…. Possible optimization: store only the run lengths, but…

#W3151314….. the optimization requires an escape character.
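For data with only two symbols, the counts-only idea can be sketched as follows. Instead of the slide's "#W" prefix, this sketch simply returns the starting symbol alongside the list of run lengths. That is an assumption made to keep the example short, and the function names are hypothetical.

```python
from itertools import groupby

def run_lengths(text):
    """Return (first symbol, lengths of the alternating runs)."""
    if not text:
        return "", []
    return text[0], [len(list(group)) for _, group in groupby(text)]

def expand(first, counts, symbols="WB"):
    """Rebuild the text: runs alternate between the two symbols."""
    other = symbols[1] if first == symbols[0] else symbols[0]
    return "".join((first if i % 2 == 0 else other) * n
                   for i, n in enumerate(counts))

row = "WWWBWWWWWBWWWBWWWWB"
first, counts = run_lengths(row)
print(first, counts)
```

Because the two symbols must alternate, only the first one has to be stored; every run after that is implied by its position.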

Page 15: Data Compression and Huffman Algorithm

Run Length Encoding (Contd.)

Is run length encoding practical for images?

No: Chances of three or more identical consecutive pixels are low for most real images, especially images with large color depth.

Yes: Some images do have long runs of identical consecutive pixels, especially images with low color depth. RLE is used by fax machines and by the BMP, TIFF and PCX file formats.

Page 16: Data Compression and Huffman Algorithm

LZW Compression

LZW compression is named after its developers, A. Lempel and J. Ziv, with later modifications by Terry A. Welch. It is the foremost technique for general purpose data compression due to its simplicity and versatility.

Page 17: Data Compression and Huffman Algorithm

LZW Compression (contd.)

LZW compression flowchart.

The variable, CHAR, is a single byte. The variable, STRING, is a variable length sequence of bytes. Data are read from the input file (box 1 & 2) as single bytes, and written to the compressed file (box 4) as 12 bit codes.
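The loop described above can be sketched in Python. The variable names `string` and `char` mirror STRING and CHAR from the slide; everything else is an assumption made for illustration: codes 0-255 stand for single bytes, new table entries start at 256, the table stops growing at 4096 entries (the 12-bit limit), and codes are returned as a list of integers rather than packed into 12-bit fields.

```python
def lzw_compress(data, max_codes=4096):
    """Compress bytes into a list of integer codes (each fits in 12 bits)."""
    if not data:
        return []
    table = {bytes([i]): i for i in range(256)}   # single-byte entries
    string = data[:1]
    out = []
    for byte in data[1:]:
        char = bytes([byte])
        if string + char in table:
            string += char                        # keep extending the match
        else:
            out.append(table[string])             # emit code for the match
            if len(table) < max_codes:
                table[string + char] = len(table) # learn the new sequence
            string = char
    out.append(table[string])
    return out

def lzw_decompress(codes, max_codes=4096):
    """Rebuild the bytes, reconstructing the same table on the fly."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):                  # code defined by this step
            entry = prev + prev[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        if len(table) < max_codes:
            table[len(table)] = prev + entry[:1]
        prev = entry
    return b"".join(out)

message = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(message)
print(codes)
print(len(message), "bytes ->", len(codes), "codes")
```

On this 24-byte message the sketch emits 16 codes. The decompressor needs no copy of the table, because it can rebuild the same entries from the codes it has already seen.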

Page 18: Data Compression and Huffman Algorithm

CONCLUSION

Is it possible to create a data compression algorithm that will always compress data?

Is there an optimal data compression algorithm?

Lossless: No, compression rates depend on the data.

Lossy: No, the quality of compression is subjective.

Is data compression really that important?