2011 Storage Developer Conference. © Permabit Technology Corporation. All Rights Reserved.
Understanding Primary Storage Optimization Options
Jered Floyd
Permabit Technology Corp.
Primary Storage Optimization
Technologies that let you store more data on the same storage
• Thin provisioning
• Copy-on-write snapshots
• Compression
• Deduplication
Data Growth is Accelerating
“This explosive growth means that by 2020, our Digital Universe will be 44 TIMES AS BIG as it was in 2009” ~ IDC
1 ZB = 1 Trillion Gigabytes
Source: IDC Digital Universe Study, May 2010
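As a rough worked example of what the IDC figure implies, the sketch below converts 44x growth between 2009 and 2020 into a compound annual growth rate. It assumes steady year-over-year compounding across the 11-year span, which the IDC quote does not state, and the function name is illustrative.

```python
def annual_growth_rate(total_factor: float, years: int) -> float:
    """Return the constant yearly growth rate that compounds to total_factor."""
    return total_factor ** (1.0 / years) - 1.0


# 44x between 2009 and 2020 (11 years), assuming steady compounding.
rate = annual_growth_rate(44.0, 2020 - 2009)
print(f"Implied growth: about {rate:.0%} per year")
```

Under that assumption the data set grows by roughly two fifths every year, which is the pressure the optimization techniques in this talk are meant to relieve.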
Requirements for Primary Storage Optimization
To be broadly adopted, any technology must:
• Support block, file and unified architectures
• Have no impact on performance
• Have no impact on reliability
• Scale to support storage capacity deployed
• Implement within existing architecture
Older methods (e.g. thin provisioning) meet these requirements today
Newer methods are beginning to emerge
Compression and Deduplication
For that reason, we’ll focus on compression and deduplication
Both have been in backup “forever”
• Compression – 40 years
• Deduplication – 10 years
Compression for primary storage, while available for many years, has never really been enabled
Deduplication is relatively new to primary storage
2011 is The Year of Primary Storage Optimization
One thing is clear: In 2011, the focus will shift from deduplication for nearline / secondary storage to deduplication for primary storage. ~Dave Simpson
The other one that I think is really important, and we're just beginning to see this come out now, is data deduplication and compression for primary storage. ~Tony Asaro
Primary storage data reduction is back from our 2010 Hot Technologies list. In 2011, we'll see a lot more of primary data reduction in shipping products.
In short customers want a single deduplication method that works across platforms. It’s the only logical way to leverage deduplication so its full advantages can be realized. ~George Crump
Storage Industry Consolidation And Deduplication
Hot technologies for 2011
Musings on the future of data dedupe
Data storage trends 2011: Predictions of hot data storage technologies
Data Compression
Techniques to reduce the size of stored data
• Lossy vs. lossless
• Generic vs. content aware
• Identifying repeated bytes
• Identifying duplicate bytes
• Identifying similar bytes or objects
• Discarding irrelevant data
Operate on a single file / object at a time
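A minimal sketch of generic lossless compression on one object at a time, using Python's standard `zlib` module (DEFLATE, an LZ77-based scheme) as a stand-in; the slides do not name a specific algorithm. The sample inputs are made up for illustration: repetitive text shrinks dramatically, while random bytes, which behave like already-compressed data, barely change.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Original size divided by DEFLATE-compressed size."""
    return len(data) / len(zlib.compress(data, 6))


# Illustrative inputs: a repetitive log-style text and high-entropy bytes.
text = b"2011-09-19 12:00:00 INFO request served in 12 ms\n" * 2000
random_bytes = os.urandom(len(text))

text_ratio = compression_ratio(text)
random_ratio = compression_ratio(random_bytes)
print(f"repetitive text: {text_ratio:.1f}x")
print(f"random bytes:    {random_ratio:.2f}x")

# Lossless: decompression returns the exact original bytes.
assert zlib.decompress(zlib.compress(text)) == text
```

Each call sees only the bytes handed to it, so identical content stored in two different objects is compressed twice rather than shared.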
Data Compression Benefits
• Mature, generic algorithms
• Big savings on low entropy data (e.g. text)
• Big savings on rich media
• Broad hardware support for specific technologies
• Low memory requirements
Data Compression Challenges
• High processor requirements
• No cross-object savings
• Complex licensing on media formats
• Always in the data read path
• Modifies byte stream on storage
• Savings constant regardless of data scale
Data Deduplication
Conceptually a sort of compression…
Techniques to avoid storing redundant data
• Single-instance vs. sub-file
• Fixed block vs. variable block
• Generic vs. content aware
• Always lossless
Operates across a file system, LUN, or entire storage pool
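A minimal sketch of the fixed-block, sub-file variant, assuming 4 KB blocks and a SHA-256 content hash as the block identity. The class and helper names are hypothetical, and a content-hash index is only one common way to find duplicates, not a description of any particular product. The point is that the index spans every object in the pool, so blocks shared between objects are stored once.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size


class DedupeStore:
    """Toy fixed-block deduplicating store shared by all objects."""

    def __init__(self) -> None:
        self.blocks: dict[bytes, bytes] = {}        # digest -> unique block data
        self.objects: dict[str, list[bytes]] = {}   # name -> ordered digests

    def write(self, name: str, data: bytes) -> None:
        digests = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).digest()
            self.blocks.setdefault(digest, block)   # store each block only once
            digests.append(digest)
        self.objects[name] = digests

    def read(self, name: str) -> bytes:
        return b"".join(self.blocks[d] for d in self.objects[name])

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())


def unique_block(n: int) -> bytes:
    """Build a block whose content depends on n (illustrative test data)."""
    return n.to_bytes(4, "big") * (BLOCK_SIZE // 4)


# Two "VM images" that share 100 base blocks and differ in one block each.
base = b"".join(unique_block(i) for i in range(100))
vm_a = base + unique_block(1000)
vm_b = base + unique_block(2000)

store = DedupeStore()
store.write("vm_a", vm_a)
store.write("vm_b", vm_b)

logical = len(vm_a) + len(vm_b)
print(f"logical bytes: {logical}, stored bytes: {store.stored_bytes()}")
```

Here 202 logical blocks reduce to 102 stored blocks, and the stored bytes themselves are unmodified copies of the original data.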
Data Deduplication Benefits
• Big savings on redundant data (e.g. VM, database)
• Lower CPU requirements than compression
• No impact on data read
• Underlying data isn’t modified
• Savings scale with more data stored
Data Deduplication Challenges
• Higher memory requirements
• Limited applicability for media files
• No standardization for software implementations
• Limited scale in most solutions
Deduplication: Backup vs. Primary
• Primary workflows require massive scale and high performance
• Backup data model allows for simplifications not applicable to primary storage
  • Bloom filters are unfeasibly computationally expensive for random-access deletion
  • Differencing methods require larger blocks than primary dedupe allows
  • Buffering for large look-back window
  • Locality knowledge to individual sources
  • Large block similarities
Backup dedupe doesn’t adapt to primary storage use case
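To illustrate the Bloom filter point: a standard Bloom filter sets a few bits per inserted key and keeps no record of which key set which bit, so a single key cannot be removed safely. The toy filter below (its tiny size, hash scheme, and key names are all illustrative assumptions, not taken from any product) shows that naively clearing a deleted key's bits can make a different, still-live key disappear. The safe alternative is rebuilding the filter from the surviving keys, which is costly when deletions arrive at random.

```python
import hashlib


class TinyBloom:
    """Toy Bloom filter; deliberately tiny so that keys share bits."""

    def __init__(self, num_bits: int = 16, num_hashes: int = 2) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def positions(self, key: str) -> set[int]:
        # Derive each bit position from a salted SHA-256 digest of the key.
        result = set()
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            result.add(int.from_bytes(digest[:4], "big") % self.num_bits)
        return result

    def add(self, key: str) -> None:
        for p in self.positions(key):
            self.bits[p] = True

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p] for p in self.positions(key))

    def naive_delete(self, key: str) -> None:
        # Unsafe: these bits may also belong to other keys.
        for p in self.positions(key):
            self.bits[p] = False


bloom = TinyBloom()
bloom.add("chunk-A")

# Find some other key that shares at least one bit position with chunk-A.
other = next(
    f"chunk-{n}"
    for n in range(1000)
    if bloom.positions(f"chunk-{n}") & bloom.positions("chunk-A")
)
bloom.add(other)
bloom.naive_delete(other)

# chunk-A was never deleted, yet the filter now reports it as absent.
print(bloom.might_contain("chunk-A"))
```

A backup system that expires whole streams on a schedule can afford a periodic rebuild; a primary store that frees arbitrary blocks continuously cannot.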
 | Backup | Primary
Data Flow | Stream-Oriented | Random Access
Latency Critical | No | Yes
Typical Chunk Size | 1 MB and up | 4 KB to 64 KB
Index Lookups | Thousands/sec | Millions/sec
# Objects | 100s of Millions | 100s of Billions
Unique Data | 1 to 60 TB | 1 TB to PBs
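The primary-storage object count in the table follows from simple arithmetic, sketched below. The 1 PB capacity and 4 KB chunk size are taken from the ranges on the slide; the 32 bytes per index entry is a purely hypothetical figure, chosen only to show why index memory becomes a concern at this scale.

```python
KIB = 1024
PIB = 1024 ** 5


def chunk_count(unique_bytes: int, chunk_bytes: int) -> int:
    """Number of chunks the dedupe index must track (ceiling division)."""
    return -(-unique_bytes // chunk_bytes)


chunks = chunk_count(1 * PIB, 4 * KIB)

ENTRY_BYTES = 32  # hypothetical: hash fragment plus block location
index_bytes = chunks * ENTRY_BYTES

print(f"chunks to index: {chunks:,}")
print(f"index size: {index_bytes / 1024 ** 4:.0f} TiB")
```

One pebibyte at 4 KB per chunk is about 275 billion chunks, consistent with the "100s of billions" row, and even a compact entry per chunk adds up to terabytes of index.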
Gartner Priority Matrix for Storage Technologies
“Big Data” and Extreme Information Processing and Management
Data Deduplication
Enterprise-Grade Solid-State Drives
Thin Provisioning
Source: Gartner Hype Cycle for Storage Technologies, 2011 26 July 2011 | ID:G00214638
Deduplication identified as a “transformational” storage technology
IDC 2010 Trends in Storage and Virtualized Environments
[Chart not recovered; highlighted value: 55%]
Source: IDC 2010 Trends in Storage and Virtualized Environments Survey, Noemi Greyzdorf, Benjamin Woo, Nov 2010, Doc #225059
Deduplication Impact
1 ZB = 1 Trillion Gigabytes
Compression vs. Deduplication
 | Compression | Deduplication
Impact on Read | High | None
Impact on Write | High | Moderate
Savings on VM | Low | High
Savings on Media | High (some formats) | Low
CPU Requirements | High | Moderate
Memory Requirements | Low | High
Scalability | Unlimited | Varies by Implementation
Impact on Reliability | Moderate | Low
Compression and Deduplication
• Compression identifies "micro" duplicates
• Dedupe identifies "macro" duplicates
• Dedupe then compress (easiest), or
• Compress then dedupe (requires compressed format segmentation)
[Figure: two pipelines. File Data → Segment and Dedupe → Compress; File Data → Segment and Compress → Dedupe]
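The "dedupe then compress" ordering can be sketched as follows, assuming fixed 4 KB segments, SHA-256 for segment identity, and `zlib` for compression. All three choices and the function names are illustrative assumptions rather than anything specified on the slide.

```python
import hashlib
import zlib

SEGMENT_SIZE = 4096  # assumed fixed segment size


def dedupe_then_compress(data: bytes) -> tuple[list[bytes], dict[bytes, bytes]]:
    """Return (ordered digests, digest -> compressed unique segment)."""
    recipe: list[bytes] = []
    unique: dict[bytes, bytes] = {}
    for offset in range(0, len(data), SEGMENT_SIZE):
        segment = data[offset:offset + SEGMENT_SIZE]
        digest = hashlib.sha256(segment).digest()
        if digest not in unique:
            # "Macro" duplicates are removed first; only segments seen for
            # the first time are compressed for "micro" savings.
            unique[digest] = zlib.compress(segment)
        recipe.append(digest)
    return recipe, unique


def restore(recipe: list[bytes], unique: dict[bytes, bytes]) -> bytes:
    return b"".join(zlib.decompress(unique[d]) for d in recipe)


# Illustrative input: 100 segments, only two of them distinct.
data = (b"x" * SEGMENT_SIZE + b"abcd" * (SEGMENT_SIZE // 4)) * 50
recipe, unique = dedupe_then_compress(data)
stored = sum(len(v) for v in unique.values())
print(f"segments: {len(recipe)}, unique: {len(unique)}, stored bytes: {stored}")
```

Hashing the uncompressed segment keeps duplicate detection independent of the compressor, which is one reason this ordering is the easier of the two; it also means CPU is spent compressing each unique segment only once.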
Compression and Deduplication
• Compression (e.g. LZ) identifies "micro" duplicates
• Deduplication identifies "macro" duplicates
[Chart: Data Reduction Ratio by workload]
Workload | Compression Only | Dedupe Only | Dedupe + Compression
VMware | 2.3 | 21.3 | 35.1
Database | 1.6 | 12.1 | 18.1
Log files | 6.4 | 1.1 | 6.8
Office 2007 | 1.1 | 2.5 | 2.8
User Directories | 1.4 | 1.5 | 2.1
Exchange | 1.5 | 6.7 | 10.1
Primary Data Efficiency Impact
For 1 PB Enterprise Primary environment
Conclusion:
• Compression is good
• Dedupe is better: >3x data reduction over wider data set range
• Dedupe + Compression is best of breed
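For a mixed environment, per-workload ratios do not average arithmetically: the physical capacity needed is the sum of each workload's logical size divided by its own ratio. The sketch below applies that to 1 PB of logical data using the Dedupe + Compression ratios as best they can be read from the chart on the previous slide. The capacity split across workloads is entirely hypothetical and is not the mix behind the slide's conclusion.

```python
# Dedupe + compression ratios as read from the previous slide's chart
# (the assignment of values to workloads is inferred from the extracted figure).
RATIOS = {
    "VMware": 35.1,
    "Database": 18.1,
    "Exchange": 10.1,
    "Log files": 6.8,
    "Office 2007": 2.8,
    "User Directories": 2.1,
}

# Hypothetical split of 1 PB (1000 TB) of logical data; not from the slides.
LOGICAL_TB = {
    "VMware": 300,
    "Database": 200,
    "Exchange": 100,
    "Log files": 50,
    "Office 2007": 150,
    "User Directories": 200,
}


def blended_ratio(logical: dict[str, float], ratios: dict[str, float]) -> float:
    """Total logical capacity divided by total physical capacity."""
    physical = sum(size / ratios[name] for name, size in logical.items())
    return sum(logical.values()) / physical


ratio = blended_ratio(LOGICAL_TB, RATIOS)
physical_tb = sum(LOGICAL_TB.values()) / ratio
print(f"blended ratio: {ratio:.1f}x, physical capacity: {physical_tb:.0f} TB")
```

With this made-up mix, the workloads that reduce poorly dominate the physical footprint, so the blended ratio lands far below the headline VMware figure.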
Deploying Primary Optimization
Where does it run?
• Integrated into Storage
• Intermediary Appliance
• Host Software
When
• Inline
• Post-process
• Parallel
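A toy comparison of two of the timing options, assuming the usual meanings of the terms: inline optimization deduplicates in the write path before data lands, while post-process writes raw data first and reclaims space in a later pass. The class below is hypothetical and tracks only block counts, not real I/O; the parallel option is not modeled.

```python
import hashlib


class Volume:
    """Toy volume that counts physical blocks under two dedupe timings."""

    def __init__(self, inline: bool) -> None:
        self.inline = inline
        self.unique: set[bytes] = set()   # digests of deduplicated blocks
        self.staged: list[bytes] = []     # raw blocks awaiting post-processing

    def write(self, block: bytes) -> None:
        if self.inline:
            self._dedupe(block)           # optimization runs in the write path
        else:
            self.staged.append(block)     # land raw data now, optimize later

    def post_process(self) -> None:
        for block in self.staged:
            self._dedupe(block)
        self.staged.clear()

    def _dedupe(self, block: bytes) -> None:
        self.unique.add(hashlib.sha256(block).digest())

    def physical_blocks(self) -> int:
        return len(self.unique) + len(self.staged)


blocks = [b"A" * 4096, b"B" * 4096] * 50   # 100 writes, 2 distinct blocks

inline_vol = Volume(inline=True)
post_vol = Volume(inline=False)
for b in blocks:
    inline_vol.write(b)
    post_vol.write(b)

peak_before = post_vol.physical_blocks()   # nothing reclaimed yet
post_vol.post_process()
print(inline_vol.physical_blocks(), peak_before, post_vol.physical_blocks())
# prints "2 100 2"
```

Both volumes end at the same footprint; the difference is that the post-process volume briefly needs capacity for all 100 raw blocks, while the inline volume pays the hashing cost on every write.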
Intermediary Appliance
• Storage optimization runs on a separate hardware device
• All data passes through the appliance on read and write
Benefits:
• Brings storage optimization to legacy platforms
Challenges:
• Additional hardware expense
• Introduces bottleneck to all I/O operations
• Appliance can mask functionality
• Failure can affect availability
• Data lock-in to optimization appliance technology
Host Software
• Storage optimization takes place on application host
• All data passes through the software on read and write
Benefits:
• Brings storage optimization to legacy platforms
• No additional hardware cost or complexity
Challenges:
• Difficult to implement with shared storage
• Consumes host CPU and memory resources
• Data lock-in to specific optimization technology
Deploying Deduplication
[Figure: four placements of Primary Optimization relative to the data path, labeled: In Write Path, Out of Read Path; Out of Write Path, Out of Read Path; In Write Path, In Read Path; Out of Write Path, Out of Read Path]
Conclusion
• Data continues to grow exponentially
• Primary Storage Optimization technologies save you money (This includes thin provisioning if you’re not doing that yet)
• Compression + Deduplication is best
• Different integration models have different tradeoffs
  • Cost
  • Performance
  • Data Savings
  • Operational impact (availability, reliability, etc.)
• In the end, optimization will move into the storage