presentation_lempelziv ese751 (dr ti


Upload: suhana-sabudin

Post on 07-Apr-2018


8/6/2019 Presentation_lempelziv Ese751 (Dr ti

http://slidepdf.com/reader/full/presentationlempelziv-ese751-dr-ti 1/55

LEMPEL-ZIV COMPRESSION

GROUP MEMBERS:

• SUHANA BINTI SABUDIN

• HARYANTI BINTI NORHAZMAN

• NURULAZLINA BINTI RAMLI

• FARHAN HANI BINTI GHAZALI

ESE 751 - SPEECH, IMAGE AND CODING


INTRODUCTION

• A lossless data compression scheme, originally called Lempel-Ziv coding and, following Welch's modifications, also referred to as Lempel-Ziv-Welch (LZW) coding.

• Not a single algorithm, but a family of algorithms developed by Abraham Lempel and Jacob Ziv.

e.g. LZW (Lempel-Ziv-Welch): used in the compress command of the Unix operating system. TIFF (Tagged Image File Format) supports LZW coding.

• Adopted in a variety of imaging file formats, such as the graphics interchange format (GIF), tagged image file format (TIFF) and the portable document format (PDF).


BASIC PRINCIPLES OF ENCODING:

1) It assigns a fixed-length codeword to a variable-length sequence of symbols.

2) Unlike Huffman coding and arithmetic coding, this coding scheme does not require a priori knowledge of the probabilities of the source symbols.

3) The coding is based on a dictionary (or codebook) containing the source symbols to be encoded. The coding starts with an initial dictionary, which is enlarged as new symbol sequences arrive.

4) There is no need to transmit the dictionary from the encoder to the decoder. A Lempel-Ziv decoder builds an identical dictionary during the decoding process.


LEMPEL AND ZIV INTRODUCED DYNAMIC DICTIONARY ENCODERS KNOWN AS:

• LZ77: An adaptive dictionary-based compression algorithm developed in 1977. The algorithm uses a sliding-window dictionary in which each entry is a character. LZ77 codewords consist of an offset into the sliding window and the number of characters following the offset to include in the encoded string.


LEMPEL AND ZIV INTRODUCED DYNAMIC DICTIONARY ENCODERS KNOWN AS:

• LZ78: Due to the inefficiency of LZ77, Lempel and Ziv developed a different form of dictionary-based compression in 1978. The technique replaces phrases with a pointer to where they occurred earlier in the text.

• LZW: If the message to be encoded consists of only one character, LZW outputs the code for this character; otherwise it inserts two- or multi-character, overlapping*, distinct patterns of the message to be encoded into a dictionary.


1) LZ77 ALGORITHM

• The first paper by Lempel and Ziv, published in 1977, described lossless compression with an adaptive dictionary.

• LZ77 uses previously seen text to build a dictionary. Strings of symbols are added to the dictionary.

• Adaptive dictionary: entries are taken from the text itself and created on the fly.

• A search buffer, containing the encoded character sequence that precedes the current coding position, can be considered as the dictionary.

• The encoder matches the input sequence through a sliding window.


LZ77 CONT..

• The main data structure is a sliding window divided into two parts:

A look-ahead buffer, which holds characters read in from the input but not yet encoded.

A search buffer, which holds a large block of text that has already been encoded.

• Symbols within the look-ahead buffer are then compared with data in the search buffer.

• The algorithm tries to match the contents of the look-ahead buffer to a string in the search buffer.


LEMPEL AND ZIV INTRODUCED DYNAMIC DICTIONARY ENCODERS KNOWN AS:

• LZ77: An adaptive dictionary-based compression algorithm developed in 1977.

• LZ78: Due to the inefficiency of LZ77, Lempel and Ziv developed a different form of dictionary-based compression in 1978. The technique replaces phrases with a pointer to where they occurred earlier in the text.


LZ77 - ENCODING

• To encode the sequence in the look-ahead buffer, the search buffer is searched to find the longest match with a prefix of the look-ahead buffer.

• Once the longest match is found, it is coded into a fixed-length codeword consisting of three elements: (position; length; the character following the prefix in the look-ahead buffer).

• The match may overlap with the look-ahead buffer, but it cannot be the look-ahead buffer itself: it must start in the search buffer.

• The window is shifted left by length+1 symbols to begin the next search.
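The encoding rule above can be sketched in Python. This is a minimal illustration, not the slides' own implementation: the function name `lz77_encode`, the buffer sizes, the convention that the position is counted backwards from the current coding position, and the tie-breaking rule (the farthest match wins) are all assumptions made for the sketch.

```python
def lz77_encode(data, search_size=7, lookahead_size=6):
    """Encode `data` as (offset, length, next_char) triples.

    Assumptions (not from the slides): offset counts backwards from the
    current coding position, and (0, 0, c) means "no match, literal c".
    """
    i = 0
    triples = []
    while i < len(data):
        best_offset, best_length = 0, 0
        start = max(0, i - search_size)
        # Reserve one symbol of the look-ahead buffer for the explicit next character.
        max_length = min(lookahead_size - 1, len(data) - i - 1)
        for j in range(start, i):  # every candidate start inside the search buffer
            length = 0
            # The match may run past position i, i.e. overlap the look-ahead buffer.
            while length < max_length and data[j + length] == data[i + length]:
                length += 1
            if length > best_length:
                best_offset, best_length = i - j, length
        triples.append((best_offset, best_length, data[i + best_length]))
        i += best_length + 1  # slide the window by length + 1 symbols
    return triples


print(lz77_encode("abracadabrad"))
# [(0, 0, 'a'), (0, 0, 'b'), (0, 0, 'r'), (3, 1, 'c'), (5, 1, 'd'), (7, 4, 'd')]
```

With other buffer sizes or another offset convention the triples will differ, so this output should not be read as the encoded output shown on the original slides.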


LZ77 - SLIDING WINDOW

• The LZ77 algorithm employs a principle called the sliding window.

• It looks at the data through a search buffer; anything outside this window can neither be referenced nor encoded.

• As more data is encoded, the window slides along, removing the oldest encoded data from view and adding new unencoded data to it.

• This is where the weakness of the outlined algorithm shows. What happens if the input is very long and the references (and lengths) therefore become very large numbers?


EXAMPLE OF LZ77 (ENCODER PART)

Input string = abracadabrad

Steps:

1) Read an unencoded string (into the look-ahead buffer).

2) Search the search buffer for the longest match with the current look-ahead buffer. If a match is found, write the encoded output (a fixed-length codeword) according to the following concept:


Concept of the encoded output:

encode = < x , y , z >

where,

x = the position of the matched prefix found in the search buffer

y = the length of the matched prefix

z = the next character after the matched prefix in the look-ahead buffer


Reminder!!!

If a matching prefix is found in the search buffer, then at the next slide of the window the matched prefix and the character that follows it are moved into the search buffer.


3) Go through the look-ahead buffer until the encoding process is finished.

4) Finally, we get the string that has been encoded by this compression algorithm.

Unencoded string:

S = abracadabrad

Encoded output ==


LZ77: DECODING PROCESS

• The LZSS (a variant of LZ77) decoding process is less resource intensive than the LZSS encoding process. The encoding process requires that the dictionary be searched for matches to the string to be encoded.

• Decoding an offset and length combination only requires going to a dictionary offset and copying the specified number of symbols. No searching is required.


 

DECODING INPUT REQUIRES THE FOLLOWING STEPS:

Step 1. Initialize the dictionary to a known value.

Step 2. Read the encoded/not encoded flag.

Step 3. If the flag indicates an encoded string:

Step 3a. Read the encoded length and offset, then copy the specified number of symbols from the dictionary to the decoded output.


DECODING INPUT REQUIRES THE FOLLOWING STEPS:

Step 3b. Otherwise, read the next character and write it to the decoded output.

Step 4. Shift a copy of the symbols written to the decoded output into the dictionary.

Step 5. Repeat from Step 2 until the entire input has been decoded.
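The flag-driven decoding steps can be sketched as follows. This is an illustrative sketch only: the token layout `(1, offset, length)` / `(0, char)`, the function name `lzss_decode`, the backwards-counted offset, and the choice of an empty initial dictionary are assumptions, since the slides do not fix a concrete bitstream format.

```python
def lzss_decode(tokens):
    """Decode a list of flagged tokens.

    Assumed token layout (hypothetical, for illustration):
      (1, offset, length) -> copy `length` symbols starting `offset` symbols back
      (0, char)           -> literal character
    The decoded output itself serves as the dictionary (Step 4),
    and the dictionary starts out empty (Step 1).
    """
    out = []
    for token in tokens:                      # Step 2: read the flag
        if token[0] == 1:                     # Step 3a: encoded string
            _, offset, length = token
            start = len(out) - offset
            for k in range(length):           # copy one symbol at a time so overlaps work
                out.append(out[start + k])
        else:                                 # Step 3b: literal character
            out.append(token[1])
    return "".join(out)


tokens = [(0, "a"), (0, "b"), (0, "r"), (0, "a"), (0, "c"),
          (0, "a"), (0, "d"), (1, 7, 4), (0, "d")]
print(lzss_decode(tokens))  # abracadabrad
```

Note that the loop never searches the dictionary; it only indexes into it, which is why decoding is cheaper than encoding.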


EXAMPLE OF LZ77 (DECODER PART)

To get the original input string back, we need to run the decompression process.

Steps:

1) Decode the encoded output obtained from the compression process.


2) Construct the table as below.

[Table not recovered; surviving column header: Encoded]


3) Decode the encoded input level by level, following the decoding concept:

< x , y , z >

x = the position of the matched prefix that was found previously

y = the length of the matched prefix

z = the next character after the matched prefix


4) Repeat step 3 until all of the encoded input has been processed.

5) Finally, we get the decoded output. It represents the original input string.

Encoded input ==

The string (decoded output) == S = abracadabrad
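Decoding < x , y , z > triples can be sketched in a few lines of Python. The triples used below are one possible encoding of abracadabrad under an assumed convention (x counted backwards from the current end of the output, x = 0 and y = 0 meaning "no match"); they are not taken from the slides, and the name `lz77_decode` is hypothetical.

```python
def lz77_decode(triples):
    """Rebuild the text from (offset, length, next_char) triples.

    Assumption: offset is counted backwards from the end of the text
    decoded so far.
    """
    out = []
    for offset, length, next_char in triples:
        start = len(out) - offset
        for k in range(length):        # copy the matched prefix symbol by symbol
            out.append(out[start + k])
        out.append(next_char)          # then append the explicit next character
    return "".join(out)


triples = [(0, 0, "a"), (0, 0, "b"), (0, 0, "r"),
           (3, 1, "c"), (5, 1, "d"), (7, 4, "d")]
print(lz77_decode(triples))  # abracadabrad
```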


2) LZ78 ALGORITHM

• Due to the inefficiency of LZ77, Lempel and Ziv developed a different form of dictionary-based compression in 1978, called LZ78.

• Instead of having a limited-size window into the preceding text, LZ78 builds its dictionary out of all of the previously seen symbols in the input text.

• The basic idea of this method is to build a dictionary of strings while encoding.


LZ78 CONT..

• Both the encoder and the decoder start off with an empty dictionary. As each character is read in, it is added to the current string.

• The dictionary is built progressively, one character at a time. As long as the current string matches some existing phrase in the dictionary, this process continues.


LET'S BUILD THE DICTIONARY!!!


EXAMPLE OF LZ78 (ENCODER PART)

The string S = bacababbaabbabbaaacbbc is to be encoded. Show the encoding process.

STEPS:

1) Initially the dictionary is empty. Go through the given string character by character until a unique input symbol or phrase (one with no match in the dictionary) is encountered, then add it to the dictionary.


2) Encode each phrase by determining whether it has any match with the existing phrases in the dictionary.

Concept of the encoded output:

encode = < x, y >

where,

x = the index number of the matching phrase in the dictionary

y = the last character of each phrase (the unique symbol)


3) Repeat step 2 until the end of the string.

4) Finally, we get the string that has been encoded by this compression algorithm.

Unencoded string:

S = bacababbaabbabbaaacbbc

Encoded output ==
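The LZ78 encoding steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the slides' own listing: dictionary indices start at 1, index 0 stands for "no matching phrase", and the function name `lz78_encode` is hypothetical.

```python
def lz78_encode(text):
    """Encode `text` as (index, char) pairs.

    Assumptions: dictionary indices start at 1 and index 0 means
    "empty prefix" (no earlier phrase matched).
    """
    dictionary = {}      # phrase -> index
    pairs = []
    phrase = ""
    for ch in text:
        candidate = phrase + ch
        if candidate in dictionary:
            phrase = candidate                       # keep extending the match
        else:
            pairs.append((dictionary.get(phrase, 0), ch))
            dictionary[candidate] = len(dictionary) + 1
            phrase = ""
    if phrase:                                       # input ended inside a known phrase
        pairs.append((dictionary[phrase], ""))
    return pairs


print(lz78_encode("bacababbaabbabbaaacbbc"))
# [(0, 'b'), (0, 'a'), (0, 'c'), (2, 'b'), (4, 'b'),
#  (2, 'a'), (1, 'b'), (5, 'a'), (6, 'c'), (7, 'c')]
```

Under these conventions the dictionary ends up holding the ten phrases b, a, c, ab, abb, aa, bb, abba, aac and bbc, in that order.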


EXAMPLE OF LZ78 (DECODER PART)

To get the original input string back, we need to run the decompression process.

Steps:

1) Decode the encoded output obtained from the compression process.


2) Repeat step 1 until the end of the encoded output.

3) Finally, we get the decoded output. It represents the original input string.

Encoded input ==

The string (decoded output) S = bacababbaabbabbaaacbbc
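The LZ78 decoding loop can be sketched as follows. The pair list is one possible encoding of S under an assumed convention (indices start at 1, index 0 is the empty phrase); it is an illustration rather than the encoded input shown on the slides, and `lz78_decode` is a hypothetical name.

```python
def lz78_decode(pairs):
    """Rebuild the text from (index, char) pairs.

    Assumption: index 0 is the empty phrase and new phrases are
    numbered 1, 2, 3, ... in the order they are decoded.
    """
    phrases = [""]                 # phrases[0] is the empty phrase
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        phrases.append(phrase)     # the decoder grows the same dictionary as the encoder
        out.append(phrase)
    return "".join(out)


pairs = [(0, "b"), (0, "a"), (0, "c"), (2, "b"), (4, "b"),
         (2, "a"), (1, "b"), (5, "a"), (6, "c"), (7, "c")]
print(lz78_decode(pairs))  # bacababbaabbabbaaacbbc
```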


3) LZW: LEMPEL-ZIV-WELCH ALGORITHM

• Lempel-Ziv-Welch (LZW) is a universal lossless data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch.

• Published by Welch in 1984 as an improved implementation of the LZ78 algorithm.

• The algorithm is designed to be fast to implement, but it is not usually optimal because it performs only limited analysis of the data.


HOW LZW WORKS?

• The codes from 0 to 255 represent 1-character sequences consisting of the corresponding 8-bit character (ASCII codes).

• The remaining codes (256 through 4095) are assigned to strings as the algorithm proceeds. The example shown uses 12-bit codes.

• This means codes 0-255 refer to individual bytes, while codes 256-4095 refer to substrings.


LZW - CONT..

• Produces only a list of dictionary entry indexes.

• Encoding:

1. Start with an initial dictionary, for example all possible ASCII characters (0..255).

2. From the input, find the longest string that exists in the dictionary.

3. Output this string's index in the dictionary.

4. Append the next character in the input to that string and add the result to the dictionary.

5. Continue from that character, repeating from step (2).
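The five encoding steps can be sketched in Python. This is a minimal sketch, assuming single-byte input characters and a table capped at 4096 entries to mirror the 12-bit codes mentioned earlier; the function name `lzw_encode` and the `max_codes` parameter are illustrative, not part of any standard API.

```python
def lzw_encode(text, max_codes=4096):
    """Return the list of LZW codes for `text`.

    Assumptions: every character has a code point below 256, and the
    table stops growing once it holds `max_codes` entries.
    """
    table = {chr(i): i for i in range(256)}   # step 1: initial dictionary
    w = ""
    codes = []
    for ch in text:
        wc = w + ch
        if wc in table:                       # step 2: keep extending the longest match
            w = wc
        else:
            codes.append(table[w])            # step 3: output the index of the match
            if len(table) < max_codes:
                table[wc] = len(table)        # step 4: add match + next character
            w = ch                            # step 5: continue from that character
    if w:
        codes.append(table[w])                # flush the final match
    return codes


print(lzw_encode("BABAABAAA"))  # [66, 65, 256, 257, 65, 260]
```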


EXAMPLE 1: COMPRESSION USING LZW

Example 1: Use the LZW algorithm to compress the string BABAABAAA.


EXAMPLE 1: LZW COMPRESSION STEP 1

BABAABAAA

STRING TABLE          ENCODER OUTPUT
string   codeword     representing   output code
BA       256          B              66


EXAMPLE 1: LZW COMPRESSION STEP 2

BABAABAAA

STRING TABLE          ENCODER OUTPUT
string   codeword     representing   output code
BA       256          B              66
AB       257          A              65


EXAMPLE 1: LZW COMPRESSION STEP 3

BABAABAAA

STRING TABLE          ENCODER OUTPUT
string   codeword     representing   output code
BA       256          B              66
AB       257          A              65
BAA      258          BA             256


EXAMPLE 1: LZW COMPRESSION STEP 4

BABAABAAA

STRING TABLE          ENCODER OUTPUT
string   codeword     representing   output code
BA       256          B              66
AB       257          A              65
BAA      258          BA             256
ABA      259          AB             257


EXAMPLE 1: LZW COMPRESSION STEP 5

BABAABAAA

STRING TABLE          ENCODER OUTPUT
string   codeword     representing   output code
BA       256          B              66
AB       257          A              65
BAA      258          BA             256
ABA      259          AB             257
AA       260          A              65


EXAMPLE 1: LZW COMPRESSION STEP 6

BABAABAAA

P = AA, C = empty

STRING TABLE          ENCODER OUTPUT
string   codeword     representing   output code
BA       256          B              66
AB       257          A              65
BAA      258          BA             256
ABA      259          AB             257
AA       260          A              65
-        -            AA             260


LZW DECOMPRESSION

The LZW decompressor creates the same string table during decompression.

It starts with the first 256 table entries initialized to single characters.

The string table is updated for each code in the input stream, except the first one.

Decoding is achieved by reading codes and translating them through the code table as it is being built.
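The decompression procedure described above can be sketched in Python. This is an illustrative sketch with a hypothetical function name `lzw_decode`; it assumes the codes were produced by an encoder that starts from the same 256 single-character entries. The branch for a code that is not yet in the table covers the case where the encoder uses an entry it has only just created.

```python
def lzw_decode(codes):
    """Rebuild the text from a list of LZW codes.

    Assumption: the encoder started from the same 256 single-character
    table entries and assigned new codes consecutively from 256.
    """
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}   # first 256 entries: single characters
    old = codes[0]
    out = [table[old]]                        # the first code never updates the table
    for new in codes[1:]:
        if new in table:
            s = table[new]
        else:
            # Code not in the table yet: it must be the previous string
            # followed by its own first character.
            s = table[old] + table[old][0]
        out.append(s)
        table[len(table)] = table[old] + s[0] # same entry the encoder added at this point
        old = new
    return "".join(out)


print(lzw_decode([66, 65, 256, 257, 65, 260]))  # BABAABAAA
```

In this run the last code, 260, is read before the decoder has created entry 260, so the fallback branch is what produces the final "AA".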


EXAMPLE 2: LZW DECOMPRESSION

Example 2: Use LZW to decompress the output sequence of Example 1:

<66><65><256><257><65><260>


EXAMPLE 2: LZW DECOMPRESSION STEP 1

<66><65><256><257><65><260>

Old = 65    S = A
New = 66    C = A

STRING TABLE          DECODER OUTPUT
string   codeword     string
-        -            B
BA       256          A


EXAMPLE 2: LZW DECOMPRESSION STEP 2

<66><65><256><257><65><260>

STRING TABLE          DECODER OUTPUT
string   codeword     string
-        -            B
BA       256          A
AB       257          BA


EXAMPLE 2: LZW DECOMPRESSION STEP 3

<66><65><256><257><65><260>

STRING TABLE          DECODER OUTPUT
string   codeword     string
-        -            B
BA       256          A
AB       257          BA
BAA      258          AB


EXAMPLE 2: LZW DECOMPRESSION STEP 4

<66><65><256><257><65><260>

STRING TABLE          DECODER OUTPUT
string   codeword     string
-        -            B
BA       256          A
AB       257          BA
BAA      258          AB
ABA      259          A


EXAMPLE 2: LZW DECOMPRESSION STEP 5

<66><65><256><257><65><260>

STRING TABLE          DECODER OUTPUT
string   codeword     string
-        -            B
BA       256          A
AB       257          BA
BAA      258          AB
ABA      259          A
AA       260          AA


 

DISADVANTAGES OF LZW

• LZW compression works best for files containing lots of repetitive data. This is often the case with text and monochrome images.

• Files that do not contain any repetitive information at all can even grow bigger when compressed!

• LZW compression is fast.

• LZW is a fairly old compression technique. All recent computer systems have the horsepower to use more efficient algorithms.


 

USED?

• LZW compression can be used in a variety of file formats:

• TIFF files

• GIF files

• PDF files. In recent applications, LZW has been replaced by the more efficient Flate algorithm.


CONCLUSION

o The Lempel-Ziv algorithms belong to the third category of dictionary coders.

o The dictionary is built in a single pass, while the data is being encoded at the same time.

o It is not necessary to explicitly transmit or store the dictionary, because the decoder can build up the dictionary in the same way as the encoder while decompressing the data.


THE END

-THANK YOU-