Benefits of Hardware Data Compression in Storage Networks

Gerry Simmons, Comtech AHA
Tony Summers, Comtech AHA

Oct 17, 2007. Benefits of Hardware Data Compression in Storage Networks. © 2007 Storage Networking Industry Association. All Rights Reserved.

SNIA Legal Notice

The material contained in this tutorial is copyrighted by the SNIA. Member companies and individuals may use this material in presentations and literature under the following conditions:

Any slide or slides used must be reproduced without modification.
The SNIA must be acknowledged as source of any material used in the body of any document containing material from these presentations.

This presentation is a project of the SNIA Education Committee.

Abstract

Benefits of Hardware Data Compression in Storage Networks

This tutorial explains the benefits and algorithmic details of lossless data compression in Storage Networks, and focuses especially on data de-duplication. The material presents a brief history and background of Data Compression: a primer on the different data compression algorithms in use today. This primer includes performance data on the specific compression algorithms, as well as performance on different data types. Participants will come away with a good understanding of where to place compression in the data path, and the benefits to be gained by such placement. The tutorial will discuss technological advances in compression and how they affect system level solutions.

Agenda

Introduction and Background
Lossless Compression Algorithms
System Implementations
Technology Advances and Compression Hardware
Power Conservation and Efficiency
Conclusion

Introduction, Why Compress?

Decrease file size and storage requirement
(A compression ratio of 3:1 means the input file is three times the size of the compressed file)

Decrease data size and transfer over the network faster
(A compression ratio of 3:1 means data transfers up to three times as quickly across the network)
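
The definition of compression ratio above can be checked with a few lines of Python. This is only a minimal sketch: it uses the standard-library `zlib` module (a software Deflate implementation) as a stand-in for any compressor, and the sample data, the helper names `compression_ratio` and `transfer_seconds`, and the 1 Gbit/s link speed are assumptions chosen for illustration, not figures from the tutorial.

```python
import zlib


def compression_ratio(original: bytes, level: int = 6) -> float:
    """Return input size divided by compressed size (3.0 means 3:1)."""
    return len(original) / len(zlib.compress(original, level))


def transfer_seconds(num_bytes: int, link_bits_per_second: float) -> float:
    """Idealized transfer time: payload bits divided by link speed."""
    return num_bytes * 8 / link_bits_per_second


# Hypothetical, highly repetitive sample; real data will compress far less.
sample = b"storage networks move a lot of repetitive data. " * 2000
ratio = compression_ratio(sample)

# On the same link, the time saved scales with the compression ratio.
before = transfer_seconds(len(sample), 1e9)
after = transfer_seconds(len(zlib.compress(sample)), 1e9)
print(f"ratio {ratio:.1f}:1, speed-up {before / after:.1f}x")
```

The "up to" in the slide matters: the speed-up equals the ratio only when the link, not the compressor, is the bottleneck.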

How to Use the 3:1 Compression Benefit

Potentially expand storage capacity by 3x.

Retrieve or store data in much less time (up to 66% less).

Reduce equipment and power consumption (up to 66%). (HVAC power consumption is typically equal to the equipment power loading.)
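
The percentages on this slide follow from the compression ratio by simple arithmetic: at a ratio of r:1, the fraction saved is 1 - 1/r. The sketch below works this out; the function names and the 100 kW example load are hypothetical, and the HVAC term simply encodes the slide's rule of thumb that cooling power equals equipment power.

```python
def savings_fraction(ratio: float) -> float:
    """Fraction of capacity, transfer time, or equipment saved at ratio:1."""
    return 1.0 - 1.0 / ratio


def total_power_kw(equipment_kw: float, ratio: float = 1.0) -> float:
    """Equipment power remaining after compression, plus an equal HVAC load.

    Assumes all of the compression gain is spent on removing equipment.
    """
    remaining_equipment = equipment_kw / ratio
    return 2 * remaining_equipment  # HVAC assumed equal to equipment load


print(f"3:1 saves {savings_fraction(3):.1%}")  # about two thirds
print(f"2:1 saves {savings_fraction(2):.1%}")  # one half
print(total_power_kw(100), total_power_kw(100, ratio=3))
```

A 3:1 ratio gives the "up to 66%" figures quoted here, while a 2:1 ratio would give 50%. Because HVAC is assumed to scale with equipment load, the percentage saved is the same with or without the HVAC term.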

Lossless Data Compression, Background

Lossless versus lossy compression
  Lossless compression means that no information is lost when a file is compressed and then uncompressed
  Lossy compression usually results in better compression ratio, but some information (e.g. resolution) is lost

There are many algorithms and data types. The best solution is to classify files and match the data type to the correct algorithm.

File Types and Lossless Algorithms

File Type          Algorithm
ASCII              LZ based
Grayscale Image    JPEG2000 Lossless
RGB Color          JPEG2000 Lossless
Audio              Real Player Lossless, Apple Lossless

Data that has been previously compressed will typically expand if an attempt is made to compress it again.

Lossless Data Compression, Background

LZ1 (LZ77) and LZ2 (LZ78) were invented by two computer scientists:

  Abraham Lempel
  Jacob Ziv

They published papers in 1977 and 1978 describing two similar compression algorithms.

LZ1 is the basis for GZIP, PKZIP, WINZIP, ALDC, LZS and PNG, among others.
LZ2 is the basis for LZW and DCLZ. LZW was introduced in 1984 by Terry Welch, who added refinements to LZ2. It is used in TIFF files (LZW).

Lossless Data Compression, Background

Late 1980s: DCLZ (LZ2 based), a hardware implementation developed by Hewlett Packard, used a 4K linked-list dictionary with SRAM and hashing.

Early 1990s: the first hardware implementation of an LZ compression algorithm using Content Addressable Memory (CAM), DCLZ.

Late 1990s: sliding-window-based LZ1 devices were becoming popular in tape backup systems and communications applications.

LZ1-Based Algorithms

ALDC, LZS, and Deflate are LZ1 based algorithms
Deflate is the algorithm in GZIP, PKZIP, WINZIP, and PNG
ALDC, LZS, and Deflate architecture consists of:
  LZ1 function to identify matches in a sliding window history buffer
  Post Coder to Huffman encode the matches (length and offset), and literals (uncompressed Bytes)

ALDC, LZS, and Deflate differences:
  Sliding window history buffer size
  Static Huffman encoding
  Deflate adds Dynamic Huffman and raw Byte encoding

LZ1 Architecture

The String Matcher searches the history buffer to find repeating strings of Bytes

The Sliding Window History Buffer adds one new Byte and drops off one Byte from the back end of the history buffer each time a Byte is input and processed

The Post Coder is a prefix encoder. It can be Static Huffman or Dynamic Huffman. It uses statistics to encode the most common string matches with a smaller number of bits.

LZ1 Algorithm, Sliding Window

ALDC Huffman encodes the Length of String in Bytes. Deflate (GZIP) Huffman encodes Literals, String Matches, and Offset Pointers.

[Figure: sliding window of up to 32K bytes, with positions labeled 0, 1, 2, ..., Size - 1 and a marker for the current byte to be processed.]

Example: LZ1 String Matching

Input String: ABCDABCFCDAB...

Input    Output
A        A
B        B
C        C
D        D
ABC      Distance=4, Length=3
F        F
CDAB     Distance=6, Length=4
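
The string-matching example above can be reproduced with a small brute-force matcher. This is a sketch for illustration only: the names `lz1_tokens` and `lz1_expand`, the token format, and the minimum and maximum match lengths (3 and 258, the limits Deflate uses) are assumptions, and real hardware finds matches with CAMs or hashing rather than the exhaustive scan shown here.

```python
def lz1_tokens(data, window=32 * 1024, min_len=3, max_len=258):
    """Greedy sliding-window matcher: emit literals and (distance, length) matches."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Scan every earlier position that is still inside the history window.
        for j in range(max(0, i - window), i):
            k = 0
            while k < max_len and i + k < len(data) and data[j + k] == data[i + k]:
                k += 1
            if k > 0 and k >= best_len:  # on ties, prefer the nearer match
                best_len, best_dist = k, i - j
        if best_len >= min_len:
            tokens.append(("match", best_dist, best_len))
            i += best_len
        else:
            tokens.append(("literal", data[i]))
            i += 1
    return tokens


def lz1_expand(tokens):
    """Rebuild the input by copying matched strings out of the history."""
    out = []
    for token in tokens:
        if token[0] == "literal":
            out.append(token[1])
        else:
            _, dist, length = token
            for _ in range(length):
                out.append(out[-dist])  # copying one symbol at a time allows overlap
    return "".join(out)


tokens = lz1_tokens("ABCDABCFCDAB")
for token in tokens:
    print(token)
```

For the slide's input string this yields four literals, a match at distance 4 of length 3, the literal F, and a match at distance 6 of length 4, the same sequence as the table.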

Example 2: Huffman Encoder

Probability of Occurrence

Input character    Probability
A                  0.25
B                  0.5
C                  0.125
D                  0.125

Example 2: Huffman Encoder

Symbol    Code    Pr
A         10      0.25
B         0       0.5
C         110     0.125
D         111     0.125

        /\
       0  1
      /    \
     B     /\
          0  1
         /    \
        A     /\
             0  1
            /    \
           C      D

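Reduction = ½ × [0.25(2) + 0.5(1) + 0.125(3) + 0.125(3)] = ½ × 1.75 = 0.875

The bracketed sum is the average Huffman code length in bits per symbol. The factor of ½ divides by the 2 bits per symbol that a fixed-length code needs for four symbols, so the Huffman-encoded data is 0.875 times the size of the fixed-length encoding.

Reduction in data size due to Huffman encoding.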
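
The Huffman example can be checked by building the code from the probability table. The sketch below uses the standard textbook construction (repeatedly merge the two least probable nodes) with Python's `heapq`; the function name `huffman_code_lengths` is hypothetical. It computes code lengths only, because the particular 0/1 labels on each branch are arbitrary.

```python
import heapq
from itertools import count


def huffman_code_lengths(probs):
    """Return {symbol: code length in bits} for a Huffman code."""
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), {sym: 0}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        # Every symbol under the merged node moves one level deeper in the tree.
        merged = {sym: depth + 1 for sym, depth in {**left, **right}.items()}
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]


probs = {"A": 0.25, "B": 0.5, "C": 0.125, "D": 0.125}
lengths = huffman_code_lengths(probs)
avg_bits = sum(probs[s] * lengths[s] for s in probs)
fixed_bits = 2  # four symbols need 2 bits each with a fixed-length code
print(lengths, avg_bits, avg_bits / fixed_bits)
```

The resulting lengths (B: 1 bit, A: 2 bits, C and D: 3 bits) match the code table, and the average of 1.75 bits per symbol against 2 bits gives the 0.875 figure.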

Compression Ratio Performance, LZ1 based

Data dependent
  Random data provides poor compression ratio performance
  Data with repeating Byte strings, 2 Bytes or longer, provides greater compression ratio performance
  Compression ratios greater than 100:1 are possible
  May expand if attempting to compress previously compressed data, but a system could detect this and send the original data
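
The last point, detecting expansion and sending the original data instead, can be sketched as follows. The one-byte flag framing and the names `pack` and `unpack` are assumptions made for this example (they are not part of GZIP or of any product described here), and `zlib` stands in for the compressor.

```python
import random
import zlib

RAW, DEFLATED = b"\x00", b"\x01"  # hypothetical one-byte block flags


def pack(block: bytes, level: int = 6) -> bytes:
    """Compress a block, but fall back to the original if it would expand."""
    compressed = zlib.compress(block, level)
    if len(compressed) < len(block):
        return DEFLATED + compressed
    return RAW + block


def unpack(packed: bytes) -> bytes:
    flag, body = packed[:1], packed[1:]
    return zlib.decompress(body) if flag == DEFLATED else body


repetitive = b"ABCD" * 10000
rng = random.Random(0)  # seeded so the example is repeatable
noisy = bytes(rng.getrandbits(8) for _ in range(4096))  # stands in for compressed data

print(len(repetitive), "->", len(pack(repetitive)))
print(len(noisy), "->", len(zlib.compress(noisy)), "compressed,", len(pack(noisy)), "packed")
```

With this framing the worst-case expansion is bounded by the single flag byte, while highly repetitive input still reaches ratios well above 100:1.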

Compression Ratio Performance, LZ1 based

Algorithm dependent
  Size of sliding window
  Static or dynamic Huffman encoding
  Number of matches tracked
  Length of matches the algorithm will search for

GZIP advantages

Open standard algorithm – no software license required.

Industry Standard
  GZIP is embedded in most Internet Browsers
  Commonly used in Un*x Operating Systems

Software for compression or decompression is commonly available.

Better compression ratio performance than other hardware implemented LZ based algorithms used today.

GZIP Software

Compression levels
  Levels 1, 2, and 3 support static Huffman
  Levels 4-9 support dynamic Huffman
  Each level has limits on:
    – Number of matches it will track
    – Length of matches it will search for

Lower levels are better for higher throughput
Higher levels are better for compression ratio performance
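
The trade-off between levels can be observed in software with Python's `zlib` module, which exposes the same 1 to 9 level scale. This is a rough sketch: the word list, the seed, and the sample size are arbitrary assumptions, and it measures only output size (not throughput) and says nothing about which Huffman mode a given implementation uses internally at each level.

```python
import random
import zlib

# Hypothetical sample: pseudo-random text drawn from a small vocabulary.
rng = random.Random(2007)
words = ["storage", "network", "compression", "window", "huffman",
         "buffer", "match", "literal", "tape", "backup"]
sample = " ".join(rng.choice(words) for _ in range(20000)).encode("ascii")

sizes = {level: len(zlib.compress(sample, level)) for level in (1, 6, 9)}
ratios = {level: len(sample) / size for level, size in sizes.items()}

for level in sorted(ratios):
    print(f"level {level}: {sizes[level]} bytes, {ratios[level]:.2f}:1")
```

Timing each call (for example with `time.perf_counter`) would show the other side of the trade-off, which depends on the machine and the data.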

Compression Ratios, Calgary Corpus

[Figure: bar chart comparing ALDC, LZS, GZIP-1, GZIP Coprocessor, and GZIP-9; vertical axis: Compression Ratio, 0 to 3.5.]

Compression Ratios, Canterbury Corpus

[Figure: bar chart comparing ALDC, LZS, GZIP-1, GZIP Coprocessor, and GZIP-9; vertical axis: Compression Ratio, 0 to 4.5.]

Compression Ratios, HTML Data

[Figure: bar chart comparing ALDC, LZS, GZIP-1, GZIP Coprocessor, and GZIP-9; vertical axis: Compression Ratio, 0 to 6.]

Hardware versus Software

Higher data rate throughput (10x).
The CPU can offload the compression task, which frees up valuable CPU bandwidth and can reduce power consumption.
Speed up a network link by sending shorter files.
If choosing GZIP, evaluate the compressor configuration, since there are many options that may or may not be supported.

Implementing Compression in the SAN

[Diagram: Users, NAS Appliances, Network, and SAN Storage, with one Compression block labeled "Most critical for storage gain".]

Implementing Compression in the NAS

[Diagram: Users, NAS Appliances, Network, and SAN Storage, with two Compression blocks, each labeled "Bandwidth gain".]

Implementing Compression in Back-up Servers (with Data De-Duplication)

[Diagram: Users, Backup Appliances, Network, De-Dup Appliance, and SAN Storage, with three Compression blocks: two labeled "Bandwidth gain" and one labeled "Storage gain".]

Data De-Duplication in WAN Optimization

[Diagram: two groups of Users, two WAN Appliances, and the Network, with two Compression blocks, each labeled "Bandwidth & Storage gain".]

Data De-duplication and Data Compression

Reduces “macro” redundancies (e.g. attached files, images, signatures, etc.)
Can get to very high Compression Ratios (40:1)
VERY Large Data Dictionary (TeraBytes!)
Uses Lossless Data Compression Algorithms for storage of duplicate data (Data Dictionary) and remaining “meta” data
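
A toy version of this combination can be sketched in a few lines. Everything specific here is an assumption made for illustration: fixed 4096-byte chunks, SHA-256 digests as dictionary keys, `zlib` as the lossless compressor, and the names `dedup_store` and `dedup_restore`. Production de-duplication systems typically use more elaborate chunking and indexing.

```python
import hashlib
import zlib


def dedup_store(data: bytes, chunk_size: int = 4096):
    """Split data into chunks, keep one compressed copy of each distinct chunk."""
    dictionary = {}  # digest -> compressed chunk (the "data dictionary")
    recipe = []      # ordered digests needed to rebuild the data (the "meta" data)
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).digest()
        if digest not in dictionary:
            dictionary[digest] = zlib.compress(chunk)
        recipe.append(digest)
    return dictionary, recipe


def dedup_restore(dictionary, recipe) -> bytes:
    return b"".join(zlib.decompress(dictionary[digest]) for digest in recipe)


# Hypothetical workload: the same 4 KB attachment stored 40 times, plus a tail.
attachment = bytes(range(256)) * 16
data = attachment * 40 + b"unique tail"
dictionary, recipe = dedup_store(data)
stored = sum(len(chunk) for chunk in dictionary.values())
print(len(data), "bytes in,", stored, "bytes of chunk data stored")
```

De-duplication removes the 40 repeated copies, and the lossless compressor then shrinks the single stored copy, which is how the two techniques stack.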

Implementing Compression Hardware

Install Compression board
Install Device Driver and system libraries

May be more involved depending on where the compression function resides (e.g. a custom driver for an appliance-specific OS).

System Issues
  Varying Compressed File Sizes
  Varying Latency
  Multiple Compression Processors

Technology Advances and Compression Hardware

10G Ethernet Fiber Transceivers at 10 Gbps
PCI Express: 4-lane, 8-lane, 16-lane
Scatter/Gather DMA
Up to 8 Gigabit/s LZ1 compression boards

Other Algorithms

If the data is an image type with multiple bits per pixel:

JPEG2000 in Lossless mode
  Uses 5/3 Wavelet Transform

PNG uses Deflate with preprocessing
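
To give a feel for why a wavelet transform helps lossless image coding, here is a minimal one-dimensional sketch of a reversible 5/3 lifting step using integer arithmetic. It is an illustration under stated assumptions, not a JPEG2000 implementation: it handles a single level on an even-length list of integers, mirrors samples at the edges, and omits the 2-D decomposition and entropy coding entirely. The names `forward_53` and `inverse_53` are hypothetical.

```python
def forward_53(x):
    """One level of 1-D reversible 5/3 lifting on an even-length list of ints."""
    n = len(x) // 2
    even, odd = x[0::2], x[1::2]
    # Predict: detail = odd sample minus the floored average of its even neighbours.
    d = [odd[i] - ((even[i] + even[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    # Update: smooth = even sample plus a rounded quarter of the neighbouring details.
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(n)]
    return s, d


def inverse_53(s, d):
    """Undo forward_53 exactly by reversing the two lifting steps."""
    n = len(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(n)]
    odd = [d[i] + ((even[i] + even[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    x = [0] * (2 * n)
    x[0::2], x[1::2] = even, odd
    return x


ramp = [10, 12, 14, 16, 18, 20, 22, 24]  # a smooth row of pixel values
smooth, detail = forward_53(ramp)
print(smooth, detail)
```

On smooth data most detail coefficients come out at or near zero, which the later coding stages can represent in very few bits, and because only integer operations are used the inverse recovers the input exactly.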

JPEG2000 Comparison

Original Photograph: 69 MegaBytes
TIFF LZW: 38 MegaBytes (about 1.8:1)
JPEG 2000 Lossless: 11.8 MegaBytes (about 5.8:1)

Going Green with GZIP Data Compression

Upgrading a data center from no compression to GZIP can result in up to 66% power reduction since you now need only 1/3 the equipment.

Additional Notes:

This assumes you want to apply all the gain to power savings.
HVAC power savings are typically as great as the power loading from the equipment removed.
Most Enterprise class systems cannot tolerate the speed of GZIP running in software.

Going Green

Implement GZIP Compression

[Figure: bar chart of Power Consumption for No Compression and GZIP; vertical axis marked 0, 33%, and 100%.]

GZIP COPROCESSOR TO CPU COMPARISON

GZIP COPROCESSOR vs. ¼ OF A QUAD-CORE CPU RUNNING GZIP SOFTWARE

POWER CONSUMPTION, MAX DATA RATE OF ONE GIGABIT/SEC

GZIP COPROCESSOR = 1 WATT
¼ OF A QUAD-CORE CPU = 25 WATTS
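
The comparison can be restated as energy per unit of data, which is easier to scale to a real workload. The sketch below only rearranges the slide's figures (1 watt and 25 watts, each at one gigabit per second); the function name `joules_per_gigabyte` is hypothetical, and a gigabyte is taken as 8 gigabits.

```python
def joules_per_gigabyte(power_watts: float, gigabits_per_second: float) -> float:
    """Energy spent compressing one gigabyte at a steady data rate."""
    seconds_per_gigabyte = 8 / gigabits_per_second
    return power_watts * seconds_per_gigabyte


coprocessor = joules_per_gigabyte(1, 1)   # GZIP coprocessor
cpu_quarter = joules_per_gigabyte(25, 1)  # one core of a quad-core CPU
print(coprocessor, cpu_quarter, cpu_quarter / coprocessor)
```

At equal data rates the energy ratio equals the power ratio, 25 to 1.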

Conclusion

GZIP is the best performing LZ based hardware compression solution available.
JPEG2000 Lossless is the best performing multi-bit image compression algorithm.
Offloading Compression to a Coprocessor frees up valuable CPU bandwidth and saves power.
Benefits of Compression:
  Pack 2 to 3 times more data onto mass storage media.
  Speed up a communications link by 2x or 3x.
  Can reduce power consumption by as much as 66%.

References

Welch, Terry A. (1984). A Technique for High-Performance Data Compression. IEEE Computer, vol. 17, no. 6 (June 1984).

Network Working Group (1996). RFC 1951: DEFLATE Compressed Data Format Specification. May 1996.

Keary, Major (1994). Data Compression [Electronic Version]. Retrieved August 8, 2006, from http://www.melbpc.org.au/pcupdate/9407/9407article.htm

Milburn, Ken (2003). JPEG2000: The Killer Image File Format for Lossless Storage [Electronic Version]. Retrieved August 18, 2006, from http://www.oreillynet.com/pub/a/javascript/2003/11/14/digphoto_ckbk.html

Q&A / Feedback

Please send any questions or comments on this presentation to SNIA: [email protected]

Many thanks to the following individuals for their contributions to this tutorial.

SNIA Education Committee

Sean Gettmann
Rob Peglar
Nancy Clay
Lynne VanArdsdale
Brandy Barton