XML Compression Techniques: Survey and Comparison
Angela McCarthy
CP5080, SP1 2010
Received: 14 August 2008 Revised: 13 November 2008
Written by Sherif Sakr of University of New South Wales, Australia
eXtensible Markup Language (XML) is a standard for data representation over the World Wide Web
XML documents are large, so compression was introduced to deal with the resulting issues
Paper provides a survey of XML compression techniques
Overview
Author looks at XML compression techniques and launches a study
◦ Surveys each of the different compression techniques and compares advantages and disadvantages of each
Data transmitted online is rather large
◦ XML usage is growing, thus a demand for efficient XML compression tools exists
Introduction
Contributions made:
◦ Comprehensive survey of XML compression techniques
◦ A rich XML corpus collected and constructed
  Contains wide variety of XML data sources, natures and document sizes
◦ Detailed results examining performance and characteristics
◦ Work repeatable
  Webpage of study provides access to test files, examined XML compressors and detailed results of study
Introduction
Each section goes through one of the classifications of compressors
General Text Compressors
◦ Treat XML as plain text, use traditional text compression techniques
XML Conscious Compressors
◦ Take advantage of awareness of XML structure
◦ Use document structure to achieve better compression rates
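The difference between the two classes can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not any surveyed tool's actual algorithm: `compress_as_text` and `compress_xml_conscious` are hypothetical names, the sample document is made up, and the XML-conscious variant only measures compressed size (it ignores attributes and nesting, so it is not a reversible format). It separates the tag sequence from the text values and groups the values per tag before compressing each stream.

```python
import zlib
import xml.etree.ElementTree as ET
from collections import defaultdict


def compress_as_text(xml_text: str) -> int:
    """General text compressor: treat the XML as an opaque byte string."""
    return len(zlib.compress(xml_text.encode("utf-8")))


def compress_xml_conscious(xml_text: str) -> int:
    """Toy XML-conscious scheme (size estimate only, not reversible).

    The tag sequence goes into one stream and the text of each element
    goes into a per-tag container, so similar values sit next to each other.
    """
    root = ET.fromstring(xml_text)
    structure = []                  # tag names in document order
    containers = defaultdict(list)  # tag name -> list of text values
    for elem in root.iter():
        structure.append(elem.tag)
        containers[elem.tag].append((elem.text or "").strip())

    streams = ["\n".join(structure)]
    streams += ["\n".join(values) for values in containers.values()]
    # Compress every stream separately and add up the sizes.
    return sum(len(zlib.compress(s.encode("utf-8"))) for s in streams)


if __name__ == "__main__":
    # Made-up sample document, for illustration only.
    books = "".join(
        f"<book><title>Title {i}</title><year>{2000 + i}</year></book>"
        for i in range(200)
    )
    doc = f"<library>{books}</library>"
    print("as plain text :", compress_as_text(doc), "bytes")
    print("xml conscious :", compress_xml_conscious(doc), "bytes")
```

Which variant comes out smaller depends on the document, which is the kind of question the survey's corpus is built to answer.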
Classifications
Non-Queriable (Archival) XML Compressors
◦ No queries can be processed over compressed format
◦ Focus is to achieve highest compression ratio
Queriable XML Compressors
◦ Queries can be processed over compressed format
◦ Compression ratio is actually worse than that of archival XML compressors
◦ Focus is to avoid full document decompression during query execution
Classifications
Compressor Characteristics
XML Data Sets
Large variety of data sets (see previous)
◦ From 0.5MB to 1.3GB
◦ Four Categories
  Structural Documents
  Textual Documents
  Regular Documents
  Irregular Documents
Testing Environments
◦ To ensure consistency, two different environments were used (high vs. low)
XML Testing Corpus
Performance Metrics measured and compared
◦ Compression Ratio
  Ratio between sizes of compressed and uncompressed documents
  Compression Ratio = (Compressed Size)/(Uncompressed Size)
◦ Compression Time
  Elapsed time during compression process
◦ Decompression Time
  Elapsed time during decompression process
The lower the metric value, the better the compressor
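The three metrics above could be measured along these lines. This is a minimal sketch, not the study's actual test harness: the `measure` helper and the sample document are assumptions made for illustration, and only Python's standard `gzip` and `bz2` modules stand in for the general purpose compressors.

```python
import bz2
import gzip
import time


def measure(name, compress, decompress, data: bytes) -> dict:
    """Return compression ratio, compression time and decompression time."""
    start = time.perf_counter()
    packed = compress(data)
    compression_time = time.perf_counter() - start

    start = time.perf_counter()
    restored = decompress(packed)
    decompression_time = time.perf_counter() - start

    if restored != data:
        raise ValueError(f"{name}: round trip did not restore the input")

    return {
        "name": name,
        # Compression Ratio = (Compressed Size)/(Uncompressed Size)
        "compression_ratio": len(packed) / len(data),
        "compression_time": compression_time,      # seconds
        "decompression_time": decompression_time,  # seconds
    }


if __name__ == "__main__":
    # Made-up, highly repetitive sample; a real run would read corpus files.
    sample = b"<item><name>widget</name><price>9.99</price></item>" * 2000
    for result in (
        measure("gzip", gzip.compress, gzip.decompress, sample),
        measure("bzip2", bz2.compress, bz2.decompress, sample),
    ):
        print(result)
```

For all three values, lower is better, matching the convention on the slide.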
Performance Metrics
11 XML Compressors Evaluated
◦ Three general purpose text compressors
  Gzip, bzip2, PPM
◦ Eight XML conscious compressors
  XMillGzip, XMillBzip, XMillPPM, XMLPPM, SCMPPM, XWRT, AXECHOP
◦ Compressors evaluated under default settings
◦ Additional experiments run with parameters tuned for highest level of compression
◦ In total, 16 variant compressors
Framework
Ideally want to provide a global ranking of XML compression tools
Results show there is no clear winner
◦ Dependent upon the weight of each metric
Three ranking functions
◦ WF1 = (1/3 ∗ CR) + (1/3 ∗ CT) + (1/3 ∗ DCT)
◦ WF2 = (1/2 ∗ CR) + (1/4 ∗ CT) + (1/4 ∗ DCT)
◦ WF3 = (3/5 ∗ CR) + (1/5 ∗ CT) + (1/5 ∗ DCT)
CR represents the compression ratio metric, CT represents the compression time metric and DCT represents the decompression time metric
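The three ranking functions can be sketched as a small weighted sum. The weights are taken from the slide; everything else is an assumption for illustration: the `score` and `rank` helpers, the compressor names, and the metric values (made up, and assumed to be already normalised to a common 0 to 1 scale where lower is better).

```python
# (CR weight, CT weight, DCT weight) for each ranking function on the slide.
WEIGHTS = {
    "WF1": (1 / 3, 1 / 3, 1 / 3),
    "WF2": (1 / 2, 1 / 4, 1 / 4),
    "WF3": (3 / 5, 1 / 5, 1 / 5),
}


def score(function_name: str, cr: float, ct: float, dct: float) -> float:
    """Weighted sum of the three metrics; a lower score is better."""
    w_cr, w_ct, w_dct = WEIGHTS[function_name]
    return w_cr * cr + w_ct * ct + w_dct * dct


def rank(measurements: dict, function_name: str) -> list:
    """Order compressor names from best (lowest score) to worst."""
    return sorted(
        measurements,
        key=lambda name: score(function_name, *measurements[name]),
    )


if __name__ == "__main__":
    # Hypothetical normalised (CR, CT, DCT) values, not results from the paper.
    measurements = {
        "compressor_a": (0.10, 0.90, 0.50),  # small output, slow
        "compressor_b": (0.40, 0.30, 0.30),  # larger output, fast
    }
    for function_name in WEIGHTS:
        print(function_name, rank(measurements, function_name))
```

With these made-up numbers, WF1 and WF2 place `compressor_b` first while WF3 places `compressor_a` first, which illustrates why the ranking depends on the weight given to each metric.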
Results
Compression Ratio
Compression Time
Decompression Time
Paper surveyed state-of-the-art XML compression techniques
Reported the behaviour of various XML compressors using a large corpus of XML documents
Paper could be valuable for
◦ Developers of new XML compression tools
◦ Users making an effective decision on the most suitable compressor for their requirements
Fig. 7 shows that none of the XML conscious compressors has achieved an outstanding compression ratio
Conclusions
Average Compression Ratios
Planning to continue maintaining and updating the webpage of the study with further evaluations
Enable visitors to perform online experiments using the set of available compressors and their own XML documents
Future Work
Large number of references
◦ Due to different compression techniques used
Large amount of data
Thorough in research methods
◦ Large amount of data tested
◦ Tested on different systems
◦ Tested using different techniques
Abbreviations/Acronyms given
◦ Designed for specific audience
Paper seems to be a reference tool
◦ User to read to help decide on which compression tool to use
Metadata
Thanks for listening!
Questions?