presentation_lempelziv ese751 (dr ti
TRANSCRIPT
8/6/2019 Presentation_lempelziv Ese751 (Dr ti
http://slidepdf.com/reader/full/presentationlempelziv-ese751-dr-ti 1/55
LEMPEL-ZIV
COMPRESSION
GROUP MEMBERS:
• SUHANA BINTI SABUDIN
• HARYANTI BINTI NORHAZMAN
• NURULAZLINA BINTI RAMLI
• FARHAN HANI BINTI GHAZALI
ESE 751-SPEECH, IMAGE AND
CODING
INTRODUCTION
• A lossless data compression scheme, originally called Lempel-Ziv coding and also referred to as Lempel-Ziv-Welch (LZW) coding, following the modifications of Welch.
• Not a single algorithm, but a family of algorithms developed by Abraham Lempel and Jacob Ziv,
e.g. LZW (Lempel-Ziv-Welch), used in the compress command of the Unix operating system. TIFF (Tagged Image File Format) supports LZW coding.
• Adopted in a variety of imaging file formats, such as the Graphics Interchange Format (GIF), Tagged Image File Format (TIFF) and the Portable Document Format (PDF).
BASIC PRINCIPLES OF ENCODING:
1) It assigns a fixed-length codeword to a variable-length sequence of
symbols.
2) Unlike Huffman coding and arithmetic coding, this coding
scheme does not require a priori knowledge of the
probabilities of the source symbols.
3) The coding is based on a dictionary or codebook
containing the source symbols to be encoded. The coding
starts with an initial dictionary, which is enlarged with the
arrival of new symbol sequences.
4) There is no need to transmit the dictionary from the
encoder to the decoder. A Lempel-Ziv decoder builds an
identical dictionary during the decoding process.
LEMPEL AND ZIV INTRODUCED DYNAMIC
DICTIONARY ENCODERS KNOWN AS:
• LZ77 : An adaptive dictionary-based compression algorithm
developed in 1977. The algorithm uses a sliding window as its
dictionary, where each entry is a character. LZ77 codewords
consist of an offset into the sliding window and the number of
characters following the offset to include in an encoded
string.
LEMPEL AND ZIV INTRODUCED DYNAMIC
DICTIONARY ENCODERS KNOWN AS:
• LZ78 : Because of the inefficiency of LZ77, Lempel and Ziv developed a
different form of dictionary-based compression in 1978.
The technique replaces phrases with a pointer
to where they have occurred earlier in the text.
• LZW : If the message to be encoded consists of only one
character, LZW outputs the code for this character;
otherwise it inserts two- or multi-character,
overlapping*, distinct patterns of the message to be encoded
into a dictionary.
1) LZ77 ALGORITHM
• First paper by Lempel and Ziv, published in 1977, about lossless compression with an adaptive dictionary.
• LZ77 uses previously seen text to build a dictionary.
  Strings of symbols are added to a dictionary.
  Adaptive dictionary: entries are taken from the text itself and created on the fly.
• A search buffer containing the encoded character sequence that precedes the current coding position can be considered as a dictionary.
• The encoder matches the input sequence through a sliding window.
LZ77 CONT..
• The main data structure is a sliding window divided into two parts:
  A look-ahead buffer, which holds characters read in from the input but not yet encoded.
  A search buffer, which holds a large block of already encoded text.
• Symbols within the look-ahead buffer are then compared
with data in the search buffer.
• The algorithm tries to match the contents of the look-ahead buffer to a string in the search buffer.
LZ77- ENCODING
• To encode the sequence in the look-ahead buffer, the search buffer is searched to find the longest match with a prefix of the look-ahead buffer.
• Once the longest match is found, it is coded into a
fixed-length codeword consisting of three elements: (position; length; the character following the prefix in the look-ahead buffer).
• The match may extend into the look-ahead buffer,
but it must start inside the search buffer.
• The window is then shifted by length+1 symbols to
begin the next search.
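The encoding loop described above can be sketched in Python as follows. This is only a minimal illustration, not the implementation behind the slides: the function name lz77_encode, the buffer sizes, and the convention that the position is an offset counted backwards from the current coding position (0 meaning "no match") are all assumptions made for this example.

```python
def lz77_encode(data, search_size=255, lookahead_size=15):
    """Encode data as a list of (offset, length, next_symbol) triples.

    Assumed convention: offset counts backwards from the current
    position; (0, 0, s) means "no match, literal symbol s".
    """
    tokens = []
    pos = 0
    while pos < len(data):
        best_offset, best_length = 0, 0
        # Keep one symbol in reserve for the third element of the triple.
        max_length = min(lookahead_size, len(data) - pos - 1)
        # Try every start position in the search buffer, nearest first.
        for offset in range(1, min(search_size, pos) + 1):
            length = 0
            # The match may run on into the look-ahead buffer.
            while (length < max_length
                   and data[pos - offset + length] == data[pos + length]):
                length += 1
            if length > best_length:
                best_offset, best_length = offset, length
        tokens.append((best_offset, best_length, data[pos + best_length]))
        pos += best_length + 1  # slide the window by length + 1 symbols
    return tokens


print(lz77_encode("abracadabrad"))
```

With these assumed conventions, ties between equally long matches go to the nearest one, because a later candidate only replaces the best match when it is strictly longer.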
LZ77- SLIDING WINDOW
• The LZ77 algorithm employs a principle called sliding window:
• It looks at the data through a search buffer; anything outside
this window can neither be referenced nor encoded.
• As more data is encoded, the window slides along,
removing the oldest encoded data from view and adding new unencoded data to it.
• This is where the weakness of the outlined
algorithm shows. What happens if the input is very long and
the references (and lengths) therefore become very large numbers?
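The cost behind that question can be made concrete with a little arithmetic: the larger the buffers, the more bits every fixed-length codeword needs. The sketch below assumes a codeword made of an offset field, a length field and one 8-bit symbol; the function name lz77_token_bits and the chosen buffer sizes are illustrative only.

```python
def lz77_token_bits(search_size, lookahead_size, symbol_bits=8):
    """Bits per (offset, length, symbol) codeword for the given buffer sizes."""
    # An offset in 0..search_size needs search_size.bit_length() bits.
    offset_bits = search_size.bit_length()
    # A length in 0..lookahead_size needs lookahead_size.bit_length() bits.
    length_bits = lookahead_size.bit_length()
    return offset_bits + length_bits + symbol_bits


print(lz77_token_bits(4095, 15))     # 12 + 4 + 8 = 24 bits per codeword
print(lz77_token_bits(65535, 255))   # 16 + 8 + 8 = 32 bits per codeword
```

Bounding the window keeps the codeword size fixed, at the price of not being able to reference text that has already slid out of view.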
EXAMPLE OF LZ77
(ENCODER PART)
Input string = abracadabrad
Steps:
1) Read an unencoded string (into the look-ahead buffer).
2) Search the search buffer for the longest match with the
current look-ahead buffer. If a match is
found, write the encoded output (a fixed-length
codeword) using this concept:
Concept of the encoding:
encode = < x , y , z >
where,
x = position of the matching prefix
found in the search buffer
y = length of the matching prefix
z = next symbol after the matching prefix in the look-ahead
buffer
Reminder!!!
If a matching prefix is found in the search buffer, then at the next slide of the
window the matched prefix and the symbol following it are moved into the search buffer.
3) Go through the look-ahead buffer until the
encoding process is finished.
4) Finally, we get the string as encoded by this compression algorithm.
Unencoded string:
S = abracadabrad
Encoded output==
LZ77 : DECODING PROCESS
• The decoding process of LZSS (a variant of LZ77) is less resource
intensive than the LZSS encoding process. The
encoding process requires that the dictionary
be searched for matches to the string to be
encoded.
• Decoding an offset and length combination
only requires going to a dictionary offset and copying the specified number of symbols. No
searching is required.
DECODING INPUT REQUIRES THE
FOLLOWING STEPS:
Step 1. Initialize the dictionary to a known value.
Step 2. Read the encoded/not encoded flag.
Step 3. If the flag indicates an encoded string:
Step 3a. Read the encoded length and offset,
then copy the specified number of symbols from
the dictionary to the decoded output.
Step 3b. Otherwise, read the next character and
write it to the decoded output.
Step 4. Shift a copy of the symbols written to the decoded output into the dictionary.
Step 5. Repeat from Step 2 until the entire
input has been decoded.
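The steps above can be sketched as a short Python function. This is a simplified illustration under several assumptions: the encoded stream is modelled as a list of tuples, (True, offset, length) for an encoded string and (False, character) for a literal, the offset is counted backwards from the end of the dictionary, and the dictionary starts empty rather than pre-filled. The name lzss_decode is hypothetical.

```python
def lzss_decode(tokens, window_size=4096):
    """Decode a list of flag-tagged tokens (assumed format, see above)."""
    window = []   # Step 1: the dictionary starts in a known (here: empty) state
    output = []
    for token in tokens:                  # Step 5: repeat until input is used up
        if token[0]:                      # Steps 2 and 3: flag says "encoded string"
            _, offset, length = token     # Step 3a: read offset and length
            for _ in range(length):
                symbol = window[-offset]  # copy one symbol from the dictionary
                output.append(symbol)
                window.append(symbol)     # Step 4: shift it into the dictionary
        else:                             # Step 3b: literal character
            output.append(token[1])
            window.append(token[1])       # Step 4
        del window[:-window_size]         # drop symbols that slid out of the window
    return "".join(output)


tokens = [(False, "a"), (False, "b"), (False, "r"), (False, "a"),
          (False, "c"), (False, "a"), (False, "d"), (True, 7, 4), (False, "d")]
print(lzss_decode(tokens))
```

Copying one symbol at a time lets a match overlap the text it is producing, and no searching is needed at any point.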
EXAMPLE OF LZ77
(DECODER PART)
To get the original input string, we need to run the
decompression process.
Steps:
1) Decode the encoded output obtained from
the compression process.
2) Construct the table as below.
Encoded
3) Decode the encoded input step by step,
following the decoding concept:
< x , y , z >
where,
x = position of the matching prefix that was
found previously
y = length of the matching prefix
z = next symbol after the matching prefix
4) Repeat step 3 until all of the encoded
input has been processed.
5) Finally, we get the decoded output, which
is the original input string.
Encoded input ==
The string
(Decoded output) == S = abracadabrad
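As a rough illustration of the decoding concept, the sketch below rebuilds the string from < x , y , z > triples. The triples used here are not taken from the slides. They assume that x is an offset counted backwards from the end of the text decoded so far, with 0 meaning no match, and the name lz77_decode is hypothetical.

```python
def lz77_decode(tokens):
    """Rebuild the text from (offset, length, next_symbol) triples."""
    output = []
    for offset, length, symbol in tokens:
        # Copy the matched prefix one symbol at a time, so a match that
        # overlaps the text being produced still decodes correctly.
        for _ in range(length):
            output.append(output[-offset])
        output.append(symbol)  # then append the symbol that follows the match
    return "".join(output)


triples = [(0, 0, "a"), (0, 0, "b"), (0, 0, "r"),
           (3, 1, "c"), (2, 1, "d"), (7, 4, "d")]
print(lz77_decode(triples))
```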
2) LZ78 ALGORITHM
• Because of the inefficiency of LZ77, Lempel and Ziv developed a different form of dictionary-based compression in 1978, called LZ78.
• Instead of having a limited-size window into the preceding text, LZ78 builds its dictionary out of all of the previously seen symbols in the input text.
• The basic idea of this method is to build a dictionary of strings while encoding.
LZ78 CONT..
• Both the encoder and decoder start off with
an empty dictionary. As each character is read
in, it is added to the current string.
• The dictionary is built progressively, one
character at a time. As long as the current string
matches some existing phrase in the
dictionary, this process continues.
LET'S BUILD
THE DICTIONARY!!!
EXAMPLE OF LZ78
(ENCODER PART)
The string S = bacababbaabbabbaaacbbc is to
be encoded. Show the encoding process.
STEPS:
1) Initially the dictionary is empty. Go through
the given string symbol by symbol; each time a
unique input symbol or phrase (one with no match in the dictionary)
is encountered, add it to the dictionary.
2) Encode each phrase according to whether
it has a match with an existing
phrase in the dictionary.
Concept of the encoding:
encode = < x, y >
where,
x = index number of the matching phrase found
in the dictionary
y = the last symbol of the phrase (the unique symbol)
3) Repeat step 2 until the end of the string.
4) Finally, we get the string as encoded by
this compression algorithm.
Unencoded string:
S = bacababbaabbabbaaacbbc
Encoded output==
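The encoding steps of this example can be sketched in Python as below. The sketch assumes that dictionary indices start at 1 and that index 0 stands for "no matching phrase"; the slides do not state the numbering, so the printed pairs may differ from the original table in that respect. The name lz78_encode is hypothetical.

```python
def lz78_encode(text):
    """Encode text as a list of <index, symbol> pairs (assumed numbering)."""
    dictionary = {}   # phrase -> index; indices start at 1 (assumption)
    output = []
    phrase = ""
    for symbol in text:
        if phrase + symbol in dictionary:
            phrase += symbol              # keep extending the current match
        else:
            # Emit the index of the longest match (0 if none) plus the new symbol.
            output.append((dictionary.get(phrase, 0), symbol))
            dictionary[phrase + symbol] = len(dictionary) + 1
            phrase = ""
    if phrase:
        # The input ended inside a known phrase: emit its index with no symbol.
        output.append((dictionary[phrase], ""))
    return output


print(lz78_encode("bacababbaabbabbaaacbbc"))
```

Under these assumptions the string is parsed into the phrases b, a, c, ab, abb, aa, bb, abba, aac, bbc, which become dictionary entries 1 to 10.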
EXAMPLE OF LZ78
(DECODER PART)
To get the original input string, we need to run the decompression process.
Steps:
1) Decode the encoded output obtained from
the compression process.
2) Repeat step 1 until all of the encoded
output has been processed.
3) Finally, we get the decoded output, which
is the original input string.
Encoded input ==
The string (decoded output) S = bacababbaabbabbaaacbbc
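A matching decoder can be sketched in a few lines. It rebuilds the same dictionary as the encoder, so no dictionary has to be transmitted. The < index, symbol > pairs below assume indices that start at 1, with 0 meaning "no matching phrase"; this numbering and the name lz78_decode are assumptions for illustration.

```python
def lz78_decode(pairs):
    """Rebuild the text from <index, symbol> pairs (assumed numbering)."""
    phrases = [""]   # index 0 is the empty phrase
    output = []
    for index, symbol in pairs:
        phrase = phrases[index] + symbol   # known phrase plus the new symbol
        phrases.append(phrase)             # becomes the next dictionary entry
        output.append(phrase)
    return "".join(output)


pairs = [(0, "b"), (0, "a"), (0, "c"), (2, "b"), (4, "b"),
         (2, "a"), (1, "b"), (5, "a"), (6, "c"), (7, "c")]
print(lz78_decode(pairs))
```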
3) LZW : LEMPEL-ZIV-WELCH
ALGORITHM
• Lempel-Ziv-Welch (LZW) is a universal lossless
data compression algorithm created by Abraham
Lempel, Jacob Ziv, and Terry Welch.
• Published by Welch in 1984 as an improved
implementation of the LZ78 algorithm.
• The algorithm is designed to be fast to implement
but is not usually optimal because it performs only limited analysis of the data.
HOW LZW WORKS?
• The codes from 0 to 255 represent 1-character
sequences consisting of the corresponding 8-bit
character (ASCII codes).
• The remaining codes (256 through 4095) are
assigned to strings as the algorithm proceeds.
The example runs as shown with 12-bit codes.
• This means codes 0-255 refer to individual
bytes, while codes 256-4095 refer to
substrings.
LZW - CONT
• Produces only a list of dictionary entry indexes
• Encoding
1. Start with an initial dictionary
• For example, all possible ASCII characters (0..255)
2. From the input, find the longest string that exists
in the dictionary
3. Output this string's index in the dictionary
4. Append the next character in the input to that
string and add it to the dictionary
5. Continue from that character, going back to (2)
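The five encoding steps can be sketched in Python as follows. This is a minimal illustration that assumes single-byte characters and a dictionary capped at 12-bit codes (0 to 4095); the name lzw_encode and the choice to simply stop adding entries when the dictionary is full are assumptions, not something the slides specify.

```python
def lzw_encode(text, max_code=4095):
    """Return the list of LZW codes for text (characters must be < 256)."""
    # Step 1: initial dictionary holds every single character, codes 0..255.
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    output = []
    current = ""
    for ch in text:
        if current + ch in dictionary:
            current += ch                        # Step 2: extend the longest match
        else:
            output.append(dictionary[current])   # Step 3: output its index
            if next_code <= max_code:            # Step 4: add match + next character
                dictionary[current + ch] = next_code
                next_code += 1
            current = ch                         # Step 5: continue from that character
    if current:
        output.append(dictionary[current])       # flush the final match
    return output


print(lzw_encode("BABAABAAA"))   # [66, 65, 256, 257, 65, 260]
```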
EXAMPLE : COMPRESSION USING LZW
Example : Use the LZW algorithm to compress the string
BABAABAAA
EXAMPLE : LZW COMPRESSION STEP 1
BABAABAAA
STRING TABLE            ENCODER OUTPUT
string    codeword      representing    output code
BA        256           B               66
EXAMPLE : LZW COMPRESSION STEP 2
BABAABAAA
STRING TABLE            ENCODER OUTPUT
string    codeword      representing    output code
BA        256           B               66
AB        257           A               65
EXAMPLE : LZW COMPRESSION STEP 3
BABAABAAA
STRING TABLE            ENCODER OUTPUT
string    codeword      representing    output code
BA        256           B               66
AB        257           A               65
BAA       258           BA              256
EXAMPLE 1: LZW COMPRESSION STEP 4
BABAABAAA
STRING TABLE            ENCODER OUTPUT
string    codeword      representing    output code
BA        256           B               66
AB        257           A               65
BAA       258           BA              256
ABA       259           AB              257
EXAMPLE 1: LZW COMPRESSION STEP 5
BABAABAAA
STRING TABLE            ENCODER OUTPUT
string    codeword      representing    output code
BA        256           B               66
AB        257           A               65
BAA       258           BA              256
ABA       259           AB              257
AA        260           A               65
EXAMPLE : LZW COMPRESSION STEP 6
BABAABAAA P=AA
C=empty
STRING TABLE            ENCODER OUTPUT
string    codeword      representing    output code
BA        256           B               66
AB        257           A               65
BAA       258           BA              256
ABA       259           AB              257
AA        260           A               65
                        AA              260
LZW DECOMPRESSION
The LZW decompressor creates the same string table during decompression.
It starts with the first 256 table entries initialized to single characters.
The string table is updated for each code in the
input stream, except the first one.
Decoding is achieved by reading codes and translating them through the code table being built.
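The decompression procedure can be sketched in Python as below. One detail that a working decoder has to handle is a code that refers to the table entry currently being built, as happens with code 260 in the example sequence; that entry must equal the previous string plus its own first character. The function name lzw_decode and the error handling are assumptions for illustration.

```python
def lzw_decode(codes):
    """Rebuild the text from a list of LZW codes."""
    if not codes:
        return ""
    # The first 256 table entries are the single characters.
    table = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]      # the first code adds nothing to the table
    output = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # Code not in the table yet: it can only be previous + its first character.
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        output.append(entry)
        # Same table the encoder built: previous string + first character of this one.
        table[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(output)


print(lzw_decode([66, 65, 256, 257, 65, 260]))   # BABAABAAA
```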
EXAMPLE : LZW DECOMPRESSION 1
Example 2: Use LZW to decompress the output sequence of
Example 1:
<66><65><256><257><65><260>.
EXAMPLE : LZW DECOMPRESSION STEP 1
<66><65><256><257><65><260>
Old = 65  S = A
New = 66  C = A
STRING TABLE            DECODER OUTPUT
string    codeword      string
                        B
BA        256           A
EXAMPLE : LZW DECOMPRESSION STEP 2
<66><65><256><257><65><260>
STRING TABLE            DECODER OUTPUT
string    codeword      string
                        B
BA        256           A
AB        257           BA
EXAMPLE : LZW DECOMPRESSION STEP 3
<66><65><256><257><65><260>
STRING TABLE            DECODER OUTPUT
string    codeword      string
                        B
BA        256           A
AB        257           BA
BAA       258           AB
EXAMPLE : LZW DECOMPRESSION STEP 4
<66><65><256><257><65><260>
STRING TABLE            DECODER OUTPUT
string    codeword      string
                        B
BA        256           A
AB        257           BA
BAA       258           AB
ABA       259           A
EXAMPLE : LZW DECOMPRESSION STEP 5
<66><65><256><257><65><260>
STRING TABLE            DECODER OUTPUT
string    codeword      string
                        B
BA        256           A
AB        257           BA
BAA       258           AB
ABA       259           A
AA        260           AA
DISADVANTAGES OF LZW
• LZW compression works best for files
containing lots of repetitive data. This is often
the case with text and monochrome images.
• Files that contain no repetitive information at all can
even grow bigger when compressed!
• LZW compression is fast.
• LZW is a fairly old compression technique. All
recent computer systems have the
horsepower to use more efficient algorithms.
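The point about files growing bigger can be checked with a small experiment. The sketch below counts how many codes LZW emits for an input in which no pair of adjacent bytes ever repeats, and compares sizes assuming fixed 12-bit output codes. The helper name lzw_code_count and the test input are illustrative choices, not part of the original slides.

```python
def lzw_code_count(data):
    """Count the codes LZW emits for a bytes object (no dictionary size limit)."""
    dictionary = {bytes([i]) for i in range(256)}   # all single bytes are known
    count = 0
    current = b""
    for value in data:
        candidate = current + bytes([value])
        if candidate in dictionary:
            current = candidate          # keep extending the match
        else:
            count += 1                   # one code emitted for the current match
            dictionary.add(candidate)
            current = bytes([value])
    return count + (1 if current else 0)  # final flush


data = bytes(range(256))          # every byte value once, nothing repeats
codes = lzw_code_count(data)
compressed_bits = codes * 12      # assuming fixed 12-bit codes
original_bits = len(data) * 8
print(codes, compressed_bits, original_bits)   # 256 3072 2048
```

Every input byte costs one 12-bit code here, so the output is 1.5 times the size of the input.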
USED?
• LZW compression can be used in a variety of
file formats:
• TIFF files
• GIF files
• PDF files. In recent applications LZW has
been replaced by the more efficient Flate
algorithm.
CONCLUSION
o The Lempel-Ziv algorithms belong to the third
category of dictionary coders.
o The dictionary is built in a single pass,
while the data is being encoded at the same
time.
o It is not necessary to explicitly transmit/store
the dictionary because the decoder can build
up the dictionary in the same way as the
encoder while decompressing the data.