IEEE 2010 Data Compression Conference, Snowbird, UT, USA, March 24-26, 2010


Optimum String Match Choices in LZSS

Graham Little† and James Diamond‡

Jodrey School of Computer Science
Acadia University
Wolfville, Nova Scotia, Canada B4P 2R6
{081219l, jdiamond}@acadiau.ca

Summary

Compression techniques in the LZ77 family operate by repeatedly searching for strings in a dictionary and then outputting a series of tokens which unambiguously define the chosen sequence of strings. The dictionary is composed of the most-recently matched N symbols, for some implementation-dependent N. The strings to be matched are the prefixes of the remaining input symbols. When a particular prefix has been matched, those symbols are moved from the beginning of the remaining symbols to the end of the dictionary; in general this will cause some symbols to be deleted from the beginning of the dictionary, in order to limit its size to N.

Compression algorithms in the LZ77 family perform a greedy choice when looking for the next string of input symbols to match. That is, the longest string of symbols which is found in the current dictionary is chosen as the next match. Many variations of LZ77 have been proposed; some of these attempt to improve compression by sometimes choosing a non-maximal string, if it appears that such a choice might improve the overall compression ratio. In this paper we present an algorithm which computes a set of matches designed to minimize the number of bits output, not necessarily the number of strings matched.

In some variants of LZ77, the token stream is itself compressed using a statistical technique, which means the length of a token is not known a priori. However, other LZ77 variants code the tokens using a scheme for which the length of a given token can be computed in advance. In such a case it is computationally feasible to compute the globally optimum set of matches (we refer to this as the optimum parsing of the input).

The basic idea is as follows. At each step of the compression process, the number of bits required by an optimum parsing of the input ending at the current position is known. If the longest match available at this point has length m, then candidate optimum parsings for each of the next m positions can be computed by adding the number of bits required for the current position to the token lengths for each of the m possible prefixes of the longest match. These m values are compared pairwise to the current values for the next m locations, and for each improved bit count, the new value and a pointer to the current location are stored. When the end of the input is reached the pointers are traced backwards from the final input symbol to compute the optimum parsing.
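The relaxation scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the fixed token sizes (9 bits per literal, 17 bits per match), the minimum match length of 2, and the naive `longest_match_len` helper are assumed values chosen for the example, not taken from the paper.

```python
def longest_match_len(data, i, window, max_len):
    """Naive scan: length of the longest prefix of data[i:] that also
    starts somewhere in the sliding window (overlapping matches allowed)."""
    best = 0
    for s in range(max(0, i - window), i):
        k = 0
        while k < max_len and i + k < len(data) and data[s + k] == data[i + k]:
            k += 1
        best = max(best, k)
    return best

def optimal_parse(data, window=4096, max_len=16, min_len=2,
                  literal_bits=9, match_bits=17):
    """Optimum parsing by relaxation: cost[i] is the fewest bits needed
    to encode data[:i]; back[i] points to where that parse came from."""
    n = len(data)
    INF = float("inf")
    cost = [0] + [INF] * n
    back = [0] * (n + 1)
    for i in range(n):
        if cost[i] == INF:
            continue
        # Candidate 1: emit data[i] as a literal token.
        if cost[i] + literal_bits < cost[i + 1]:
            cost[i + 1], back[i + 1] = cost[i] + literal_bits, i
        # Candidate 2: every prefix (lengths min_len..m) of the longest match,
        # i.e. the m candidate parsings described above.
        m = longest_match_len(data, i, window, max_len)
        for length in range(min_len, m + 1):
            j = i + length
            if cost[i] + match_bits < cost[j]:
                cost[j], back[j] = cost[i] + match_bits, i
    # Trace the stored pointers backwards from the final input symbol.
    parse, j = [], n
    while j > 0:
        parse.append((back[j], j))
        j = back[j]
    parse.reverse()
    return parse, cost[n]
```

On the input `b"abababababab"` this yields two literal tokens followed by one overlapping match of length 10, for 9 + 9 + 17 = 35 bits, whereas parsing each token greedily in isolation can do no better.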

The Calgary Corpus was used as the test data. An implementation of LZSS which has a maximum match length of 16, a dictionary of 4K symbols and token sizes known a priori was used as the base algorithm. Our algorithm reduced the average compression ratio from 45.28% to 42.64%, a (relative) improvement of better than 5.8%.
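The relative-improvement figure follows directly from the two average ratios quoted above:

```python
base, improved = 45.28, 42.64        # average compression ratios (%)
relative = (base - improved) / base * 100
print(round(relative, 2))            # 5.83, i.e. better than 5.8%
```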

† Author's current address is 10 Wilson Blvd, Halifax, NS, B3M 3E4.
‡ This work was partially supported by the Natural Sciences and Engineering Research Council.

2010 Data Compression Conference

1068-0314/10 $26.00 © 2010 IEEE

DOI 10.1109/DCC.2010.67

