www.fakengineer.com Data compression

Upload: sierra-mcallister

Post on 26-Mar-2015


TRANSCRIPT


www.fakengineer.com

Data compression


INTRODUCTION

• If you download many programs and files off the Internet, you have probably encountered ZIP files before. This compression system is a very handy invention, especially for Web users, because it reduces the overall number of bits and bytes in a file so it can be transmitted faster over slower Internet connections, or take up less space on a disk.

• The technique behind these ZIP files is known as "data compression".


How does data compression work?

It is based on the following processes:

• Finding Redundancy

Let us take an example:

• In John F. Kennedy's 1961 inaugural address, he delivered this famous line:

" Ask not what your country can do for you --ask what you can do for your country. "

• The quote has 17 words, made up of 61 letters, 16 spaces, one dash and one period. If each letter, space or punctuation mark takes up one unit of memory, we get a total file size of 79 units.
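The character counts above can be checked with a short Python sketch. It assumes, as the slide does, that the dash counts as a single character (written here as the escape `\u2014`) with one space before it and none after; the variable names are illustrative only.

```python
# The quote, with the dash stored as one character, as the slide counts it.
quote = ("Ask not what your country can do for you "
         "\u2014ask what you can do for your country.")

letters = sum(ch.isalpha() for ch in quote)          # alphabetic characters
spaces = quote.count(" ")                            # blank characters
dashes = quote.count("\u2014")
periods = quote.count(".")
words = len(quote.replace("\u2014", " ").split())    # split on whitespace

print(words, letters, spaces, dashes, periods)
print("total units:", len(quote))
```

One unit per character gives the uncompressed size that the following slides try to reduce.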


How does data compression work?

• To get the file size down, we need to look for redundancies.

• In the above quote the words ask, what, your, country, can, do, for, you appear two times.

• That means nine words -- ask, not, what, your, country, can, do, for, you -- give us almost everything we need for the entire quote.

• To construct the second half of the phrase, we just point to the words in the first half and fill in the spaces and punctuation.


Looking it Up

• In this step we pick out the words that are repeated and put them into the numbered index.

Our sentence now reads:

• "1 not 2 3 4 5 6 7 8 --1 2 8 5 6 7 3 4"

Word       Index number
ask        1
what       2
your       3
country    4
can        5
do         6
for        7
you        8
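The indexing step can be sketched in Python as follows. This is a minimal illustration, assuming case is ignored and the final period is dropped; it is not how a ZIP program actually stores its dictionary.

```python
import re
from collections import Counter

quote = ("Ask not what your country can do for you -- "
         "ask what you can do for your country.")

# Lower-case the text and pull out the words.
words = re.findall(r"[a-z]+", quote.lower())
counts = Counter(words)

# Number every repeated word in order of first appearance.
index = {}
for word in words:
    if counts[word] > 1 and word not in index:
        index[word] = len(index) + 1

# Replace each indexed word by its number; keep "not" and the dash as they are.
tokens = re.findall(r"[a-z]+|--", quote.lower())
encoded = " ".join(str(index.get(tok, tok)) for tok in tokens)

print(index)
print(encoded)
```

Only "not" stays as literal text, because it is the one word that appears once.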


Searching for Patterns

• The phrase "can do for" is repeated, one time followed by "your" and one time followed by "you," giving us a repeated pattern of "can do for you."

• This lets us write 15 characters (including spaces), while "your country" only lets us write 13 characters (with spaces), so the program would overwrite the "your country" entry as just "r country," and then write a separate entry for "can do for you."


Searching for Patterns

• Using the patterns we picked out above, and writing "_" for each space, we come up with this larger dictionary:

Words with spaces    Index number
ask_                 1
what_                2
you                  3
r_country            4
_can_do_for_you      5

And the quote converts to this smaller sentence:

• "1not_2345_--_12354"

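To see that the pattern dictionary really reproduces the quote, here is a small Python sketch of the expansion step. It assumes one underscore stands for exactly one space and that each digit refers to one dictionary entry; the names `dictionary` and `compressed` are illustrative.

```python
# Pattern dictionary from the slide, with "_" already turned back into spaces.
dictionary = {
    "1": "ask ",
    "2": "what ",
    "3": "you",
    "4": "r country",
    "5": " can do for you",
}

compressed = "1not_2345_--_12354"

# Expand digits through the dictionary, turn "_" into a space,
# and copy every other character unchanged.
decoded = "".join(
    dictionary.get(ch, " " if ch == "_" else ch) for ch in compressed
)

print(decoded)
print(len(compressed), "characters expand to", len(decoded))
```

The dictionary itself also has to be stored, so the saving on one short sentence is small; the approach pays off on long files where the same patterns recur many times.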

Data compression methods

Data compression

• Lossless methods

• Lossy methods


Lossless compression

The following are some of the techniques used in lossless compression.

• Run-Length Encoding: When data contain strings of repeated symbols (such as bits or characters), the strings can be replaced by a special marker, followed by the repeated symbol, followed by the number of occurrences.


Run-Length Encoding

5726444444444432133333333333333333333127800000000000000000

Figure 2(a): Original data

5726#409321#3191278#015

Figure 2(b): Compressed data

The symbol # is the marker.

The symbol 4 is repeated 09 times.

The symbol 3 is repeated 19 times.

The symbol 0 is repeated 15 times.
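The marker scheme in Figure 2 can be sketched in Python as below. The input is rebuilt from the run lengths stated in the figure (9, 19 and 15). The two-digit count, the threshold `min_run` and the function names are assumptions made for this illustration, and the sketch also assumes the marker "#" never occurs in the data.

```python
import re

MARKER = "#"

def rle_encode(data, min_run=4):
    """Replace each run of min_run or more equal symbols by marker+symbol+count."""
    out = []
    for match in re.finditer(r"(.)\1*", data):   # one match per run
        run = match.group(0)
        if len(run) >= min_run:
            if len(run) > 99:
                raise ValueError("run too long for a two-digit count")
            out.append(f"{MARKER}{run[0]}{len(run):02d}")
        else:
            out.append(run)                      # short runs are copied as they are
    return "".join(out)

def rle_decode(encoded):
    """Expand every marker+symbol+two-digit-count group back into a run."""
    return re.sub(r"#(.)(\d\d)", lambda m: m.group(1) * int(m.group(2)), encoded)

original = "5726" + "4" * 9 + "321" + "3" * 19 + "1278" + "0" * 15
compressed = rle_encode(original)
print(compressed)
print(len(original), "symbols ->", len(compressed), "symbols")
```

Runs shorter than four symbols are left alone because the marker, symbol and two-digit count already take four symbols.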


Statistical Compression

Statistical compression gives short codes to frequent symbols and longer codes to infrequent ones. Three common encoding systems using this principle are Morse code, Huffman encoding and Lempel-Ziv-Welch encoding.

• Morse code: It uses variable-length combinations of marks (dashes) and spaces (dots) to encode data. One-symbol codes represent the most frequent characters, and longer codes represent less frequent characters.

Example: a dot (.) represents the character E, while dash, dash, dot, dash (--.-) represents the character Q.
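A few entries of International Morse code are enough to show the variable-length idea in Python. The table below is only a small excerpt, and `encode` is a hypothetical helper written for this sketch.

```python
# Excerpt of International Morse code: frequent letters get short codes.
MORSE = {
    "E": ".",
    "T": "-",
    "A": ".-",
    "Q": "--.-",
    "Z": "--..",
}

def encode(text):
    """Encode text letter by letter, separating the codes with a space."""
    return " ".join(MORSE[ch] for ch in text.upper())

for letter, code in MORSE.items():
    print(letter, code, len(code))

print(encode("tea"))
```

Note that the codes need a gap between letters: the dots and dashes alone do not say where one letter ends, which is the problem Huffman encoding solves with prefix-free codes.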


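Huffman Encoding

Giving shorter codes to more frequent characters is not enough on its own: unless no code is a prefix of another, the receiver cannot tell where one code ends and the next begins. Huffman encoding avoids this by building prefix-free codes.

1 bit:   E:0  T:1
2 bits:  A:00  I:01  M:10  N:11
3 bits:  C:000  D:001  G:010  K:011  O:100  R:101  S:110  U:111

Figure 3: Bit assignment based on frequency of characters

Code sent: 00101010011110

First interpretation:   0 01 010 100 1 111 0  =  E I G O T U E
Second interpretation:  00 10 101 0 01 1 110  =  A M R E I T S
Third interpretation:   001 010 100 111 10    =  D G O U M

Figure 4: Multiple interpretations of transmitted data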
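The ambiguity in Figure 4, and the way a Huffman code removes it, can be sketched in Python. The table `NAIVE` is the bit assignment of Figure 3. The frequencies passed to `huffman_code` are made up for the illustration, and both function names are hypothetical.

```python
import heapq
from itertools import count

NAIVE = {"E": "0", "T": "1", "A": "00", "I": "01", "M": "10", "N": "11",
         "C": "000", "D": "001", "G": "010", "K": "011",
         "O": "100", "R": "101", "S": "110", "U": "111"}

def decodings(bits, table):
    """Yield every way to split bits into codewords of table."""
    if not bits:
        yield ""
        return
    for symbol, code in table.items():
        if bits.startswith(code):
            for rest in decodings(bits[len(code):], table):
                yield symbol + rest

def huffman_code(freqs):
    """Build a prefix-free code from a {symbol: frequency} mapping."""
    tiebreak = count()          # keeps heap entries comparable when weights tie
    heap = [(f, next(tiebreak), {s: ""}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # two lightest subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

naive_readings = list(decodings("00101010011110", NAIVE))
print(len(naive_readings), "readings with the naive table")

# Hypothetical frequencies, chosen only for this example.
code = huffman_code({"E": 45, "T": 20, "A": 15, "I": 12, "M": 8})
message = "MEET"
bits = "".join(code[ch] for ch in message)
print(code)
print(bits, list(decodings(bits, code)))
```

With the naive table the three interpretations shown in Figure 4 are only a few of the possible readings, while the Huffman-coded bits have exactly one.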


Lempel-Ziv-Welch Encoding

• The LZW method of compressing data is an evolution of the method originally created by Abraham Lempel and Jacob Ziv.

• The compression, which takes place at the sender site, has the following components: a dictionary, a buffer, and an algorithm.


Lempel-Ziv-Welch

Code:    1  2  3
Symbol:  A  B  C

Figure 5: Original dictionary for a three-symbol text.

[Figure 6: Buffer at the compression site. Labels: symbols from the text, buffer, strings to dictionary, dictionary, codes sent.]


Compression algorithm

Figure 7 shows the flow chart for the compression algorithm.
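Since the flow chart itself is not reproduced here, the following Python sketch shows the usual textbook form of LZW compression, starting from the three-symbol dictionary of Figure 5. It may differ in detail from the flow chart in Figure 7; the function name, the parameter `alphabet` and the sample text are assumptions for the illustration.

```python
def lzw_compress(text, alphabet="ABC"):
    """Return the list of codes LZW sends for text."""
    # Initial dictionary: one code per symbol, numbered from 1 as in Figure 5.
    dictionary = {symbol: i for i, symbol in enumerate(alphabet, start=1)}
    buffer = ""
    codes = []
    for symbol in text:
        if buffer + symbol in dictionary:
            # Keep growing the buffer while the string is already known.
            buffer += symbol
        else:
            codes.append(dictionary[buffer])                    # send code of known part
            dictionary[buffer + symbol] = len(dictionary) + 1   # learn the new string
            buffer = symbol
    if buffer:
        codes.append(dictionary[buffer])                        # flush what is left
    return codes

sample = "BAABABBBAABBBBAA"
codes = lzw_compress(sample)
print(codes)
print(len(sample), "symbols sent as", len(codes), "codes")
```

Each time the buffer plus the next symbol is not in the dictionary, the code for the buffer is sent and the longer string is added as a new entry.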


Decompression algorithm

• The decompression process, which takes place at the receiver site, uses the same components as the compression process.

• Dictionary: A very interesting point is that the sender does not send the dictionary created by the compression process; instead, the dictionary is created at the receiver site and, surprisingly, it is an exact replica of the dictionary created at the sender site.

• Buffers: [Figure 8: Buffers at the decompression site. Labels: codes received, buffer, temporary buffer, string to dictionary, symbols to be printed.]


Decompression algorithm

Figure 9 shows the flowchart for the decompression algorithm.
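As the flowchart is not reproduced here, this Python sketch shows the usual textbook form of LZW decompression, which rebuilds the dictionary from the received codes alone. It may differ in detail from Figure 9; the function name, the parameter `alphabet` and the sample codes are assumptions for the illustration.

```python
def lzw_decompress(codes, alphabet="ABC"):
    """Rebuild the text from LZW codes, recreating the dictionary on the fly."""
    # Same initial dictionary as the sender, numbered from 1.
    dictionary = {i: symbol for i, symbol in enumerate(alphabet, start=1)}
    if not codes:
        return ""
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:
            # The code was created by the sender in the very last step,
            # so it must be the previous string plus its own first symbol.
            entry = previous + previous[0]
        output.append(entry)
        # Add the string the sender added at this point.
        dictionary[len(dictionary) + 1] = previous + entry[0]
        previous = entry
    return "".join(output)

received = [2, 1, 1, 4, 2, 8, 5, 8, 9, 1]
print(lzw_decompress(received))
```

The receiver adds each entry one step later than the sender did, which is why a code can arrive before it is in the receiver's dictionary; the `else` branch covers that case.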


Lossy compression

• If the decompressed data need not be an exact replica of the original information, only something very close to it, we can use a lossy data compression method.

• Several methods have been developed using lossy compression techniques. Joint Photographic Experts Group (JPEG) is used to compress pictures and graphics. Moving Picture Experts Group (MPEG) is used to compress video.
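The idea of "very close but not exact" can be illustrated with simple quantization in Python. This is only a toy sketch of the lossy principle, not the JPEG or MPEG algorithm; the sample values and the step size `STEP` are made up.

```python
samples = [12, 47, 201, 198, 33]   # hypothetical 8-bit sample values
STEP = 16                          # coarser step = smaller data, larger error

# Compress: keep only the nearest multiple of STEP (as a small integer).
levels = [(s + STEP // 2) // STEP for s in samples]

# Decompress: scale back up; the discarded detail cannot be recovered.
restored = [q * STEP for q in levels]

errors = [abs(a - b) for a, b in zip(samples, restored)]
print(levels)
print(restored)
print("largest error:", max(errors))
```

The restored values differ from the originals by at most half a step, which is the kind of small, bounded loss that lossy methods trade for a smaller file.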


Conclusion

• With technologies developing at a rapid rate, new data compression methods are arising. One of them is JBIG (Joint Bi-level Image Experts Group). It is made for image compression and is a lossless method. Data compression techniques based on artificial neural networks are also being developed.


THANK YOU !!