deduplication school
8/6/2019 Deduplication School
1/61
Sponsored by:
Deduplication School 2010
Presentation Download
Since 2009, there have been a number of changes and advancements in the storage environment. Data deduplication was the #1 storage technology being evaluated by storage professionals last year.
This presentation download will explain how to leverage data deduplication technology to benefit your organization.
Download to learn the answers to the following questions:
How do recent major acquisitions affect the options in the dedupe marketplace?
Is everyone doing dedupe now?
Are all dedupe products roughly equivalent, or are there advantages to certain approaches?
These questions and more will be answered by storage expert W. Curtis Preston in this Dedupe School seminar presentation download.
http://www.emc.com/
-
Deduplication School 2010
http://searchstorage.techtarget.com/Ded
W. Curtis Preston
Executive Editor, TechTarget
Founder/ CEO Truth in IT, Inc.
http://searchstorage.techtarget.com/DedupeSchool
-
A Little About Me
When I started as backup guy at $35B com
1993:
Tape Drive: QIC 80 (80 MB capacity)
Tape Drive: Exabyte 8200 (2.5 GB & 256 KB/s)
Biggest Server: 4 GB ('93), 100 GB ('96)
Entire Data Center: 200 GB ('93), 400 GB ('96)
My TIVO now has 5 times the storage my data cente
Consulting in backup & recovery since '96
Author of O'Reilly's Backup & Recovery &
Using SANs and NAS
Webmaster of BackupCentral
-
A Little Bit About Truth in IT, Inc
Inspired by Consumer Reports, but designe
No advertising, no partners = no need to
No huge consulting fees just to find out whic
work and which ones don't work (such fees
at $10K and go all the way to $100K!)
Funded instead by $999 annual subscription
Private online community with written resea
results, podcasts of interviews with users of
and direct communication with real custome
-
Agenda
Understanding Deduplication
Using Deduplication in Backup System
Using Data Reduction in Primary Syst
Recent Backup Software Advancemen
Backing up Virtual Servers
Backups on a Budget
-
Session 1
Understanding Deduplication
-
Why Disk?
First a little history
-
History of My World Part I
When I joined the industry (1993)
Disks were 4 MB/s, tapes were 256 KB/s
Networks were 10 Mb shared
Seventeen years later (2010)
Disks are 70 MB/s, tapes are 120 MB/s
Networks are 10 Gb switched
Changes in 17 years
17x increase in disk speed (luckily, RAID
E
-
More History
Plan A: Stage to disk, spool to tape
Pioneered by IBM in '90s, widely adopted in lat
Large, very fast virtual disk as caching mecha
Only need enough disk to hold one night's bac
Helps backups; does not help restores
Plan B: Backup to disk, leave on disk
AKA the early VTL craze
Helps backups and restores
Di k till t i t k thi
-
Plan C: Dedupe
It's perfect for traditional backup
Fulls back up the same data every day/week/m
Incrementals back up entire file when only one
Both back up file 100 times if it's in 100 locatio
Databases are often backed up full every day
Tons of duplicate blocks!
Average actual reduction of 10:1 and higher
It's not perfect for everything
P d t d d t
-
Naysayers
Eliminate all but one copy?
No, just eliminate duplicates per location
What about hash collisions?
More on this later, but this is nothing but FUD
If you're unconvinced, use a delta differential
Doesn't this have immutability conce
Everything that changes the format of the data
immutability concerns (e.g. sector-based stora
-
Is There a Plan D?
Some pundits/analysts think dedupe
target dedupe) is a band-aid, and wil
eventually be done away with via bac
software-based dedupe, delta-backup
Maybe this will happen in a 3-5 year
maybe it won't. (In fact, some backu
companies will tell you they don't ne
stinking dedupe appliances.)
-
How Dedupe Works
-
Your Mileage WILL Vary
You really can get 10x to 400x
It depends on
Frequency of full backups (more fulls = more
How much of a given incremental backup cont
of other files (multimedia generally doesn't ha
Length of retention (longer retention = more d
Redundancy in single full backup (if your prod
Things that confuse dedupe
-
How Do They Identify Duplicate D
Two very different methods
Chunking/hashing
Asigra, EMC Avamar, Symantec PureDisk, C
Simpana
EMC Data Domain, Greenbytes, FalconStor V
NEC, Quantum DXi
Delta differential
Exagrid, IBM Protectier, Ocarina, SEPATON
Some systems may use a hybrid approach
-
Chunking/Hashing Method
Slice all data into segments or chunk
Run chunk through hashing algorithm
Check hash value against all other ha
Chunk with identical hash value is dis
Will find redundant blocks between f
different file systems, even different
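The steps on this slide can be sketched in a few lines of Python. This is a toy illustration only, not any vendor's implementation: the fixed 8 KB chunk size, the choice of SHA-1, and the `ChunkStore` class are all assumptions made for the example (real products often use variable-size chunks and persistent indexes).

```python
import hashlib

CHUNK_SIZE = 8 * 1024  # fixed 8 KB chunks; an assumption for illustration


class ChunkStore:
    """Toy in-memory chunk store keyed by hash value (hypothetical)."""

    def __init__(self):
        self.chunks = {}  # hash value -> chunk bytes, each stored once

    def write(self, data: bytes) -> list:
        """Store data and return the list of hashes needed to rebuild it."""
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha1(chunk).hexdigest()
            # A chunk whose hash is already in the index is not stored again.
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        return recipe

    def read(self, recipe: list) -> bytes:
        """Reassemble the original data from its list of hashes."""
        return b"".join(self.chunks[digest] for digest in recipe)

    def stored_bytes(self) -> int:
        return sum(len(chunk) for chunk in self.chunks.values())


store = ChunkStore()
file_a = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE
file_b = b"B" * CHUNK_SIZE + b"C" * CHUNK_SIZE  # shares the "B" chunk with file_a
recipe_a = store.write(file_a)
recipe_b = store.write(file_b)
print(len(file_a) + len(file_b), "bytes written,", store.stored_bytes(), "bytes stored")
```

The shared chunk is found even though it came from a different file, which is the "everything to everything" comparison the slide describes.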
-
Delta Differential Method
Correlate backups
Mathematical methods
Using metadata
Compare similar backups byte-by-byt
Example
Tonight's backup of Exchange instance Elvis is
similar to last night's backup of Elvis
Tonight's backup of Elvis is compared byte-by
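A rough sketch of the byte-level comparison step, assuming the two backups have already been correlated (for example by metadata). Python's `difflib.SequenceMatcher` stands in for whatever differencing engine a real product uses; the `delta` and `rebuild` helpers and the sample data are hypothetical.

```python
import difflib


def delta(previous: bytes, current: bytes) -> list:
    """Return instructions that rebuild `current` from `previous`.

    ("copy", start, end) refers to bytes already held in `previous`;
    ("insert", data) carries bytes that are new in `current`.
    """
    ops = []
    matcher = difflib.SequenceMatcher(None, previous, current, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", current[j1:j2]))
        # "delete": bytes present only in the old backup; nothing to store
    return ops


def rebuild(previous: bytes, ops: list) -> bytes:
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += previous[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


# Last night's backup, and tonight's backup with one new item in the middle.
messages = [b"message %d\n" % i for i in range(200)]
last_night = b"".join(messages)
tonight = b"".join(messages[:100]) + b"NEW MESSAGE\n" + b"".join(messages[100:])

ops = delta(last_night, tonight)
new_bytes = sum(len(op[1]) for op in ops if op[0] == "insert")
print(len(tonight), "bytes in tonight's backup,", new_bytes, "bytes of new data to store")
```

Only like is compared to like here: nothing in this sketch would notice the same data sitting in an unrelated backup.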
-
Hashing & Delta Differential
Hashing
Most used method with most mileage
Some concerned about hash collisions (more
Compares everything to everything, therefor
dedupe out of similar data in dissimilar datas
production and test copy of same data)
Delta Differentials
Faster than hashing
No concern about hash collisions
Only compares like backups, so will get no d
similar data in dissimilar datasets, but does dedupe on same data
-
Hash Collisions: The real numb
Number of Hashes & Amount of Dat
Desired Probability (Assuming 8k c

Hash Size          10^-15                     10^-5
128 bits (MD5)     8.2 x 10^11   6.6 PB       8.2 x 10^16
160 bits (SHA-1)   5.4 x 10^16   432.5 EB     5.4 x 10^21

10^-15: Odds of single disk writing incorrect data a
knowing it (Undetectable Bit Error Rate or UBER
With SHA-1, we have to write 6.6 PB to get those
10^-5: Worst odds of a double-disk RAID5 failure
We have to write 1 371 181 YB to reach those od
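The hash counts in the table are consistent with the standard birthday approximation, p ≈ n² / (2 · 2^bits) for small p. The sketch below is a reconstruction under that assumption, not the author's own calculation; it also assumes "8k" means 8,000-byte chunks and decimal units, which is what reproduces the 6.6 PB and 432.5 EB figures. Under this approximation, 6.6 PB lines up with the 128-bit row and 432.5 EB with the 160-bit row.

```python
import math

CHUNK_BYTES = 8000  # "8k" read as 8,000 bytes; an assumption that matches the table


def hashes_for_probability(hash_bits: int, probability: float) -> float:
    """Approximate number of hashes at which the collision odds reach `probability`.

    Birthday approximation: p ~= n^2 / (2 * 2^bits), so n ~= sqrt(2 * p * 2^bits).
    """
    return math.sqrt(2.0 * probability * 2.0 ** hash_bits)


for name, bits in (("MD5", 128), ("SHA-1", 160)):
    for p in (1e-15, 1e-5):
        n = hashes_for_probability(bits, p)
        print(f"{name:6} p={p:g}: {n:.1e} hashes, {n * CHUNK_BYTES:.1e} bytes")
```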
-
Where Is the Data Deduped?
Target Dedupe
Data is sent unmodified across LAN & deduped at ta
No LAN/WAN benefits until you replicate target to t
Cannot compress or encrypt before sending to targe
Source Dedupe
Redundant data is identified at backup client
Only new, unique data sent across LAN/WAN
LAN/WAN benefits, can back up remote/mobile data
Allows for compression, encryption at source
Hybrid
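To make the network difference concrete, here is a minimal sketch of the two data paths. Everything in it is hypothetical (the `BackupServer` class, the 4 KB chunk size, SHA-1), and it ignores the small amount of traffic needed to exchange the hashes themselves.

```python
import hashlib

CHUNK = 4096  # illustrative chunk size


def chunk_hashes(data: bytes) -> list:
    """Split data into chunks and pair each chunk with its hash."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    return [(hashlib.sha1(c).hexdigest(), c) for c in chunks]


class BackupServer:
    """Toy backup target holding one copy of each chunk (hypothetical)."""

    def __init__(self):
        self.known = {}  # hash -> chunk

    def missing(self, digests) -> set:
        return {d for d in digests if d not in self.known}

    def store(self, new_chunks: dict) -> None:
        self.known.update(new_chunks)


def source_backup(server: BackupServer, data: bytes) -> int:
    """Source dedupe: the client sends only chunks the server lacks."""
    hashed = chunk_hashes(data)
    wanted = server.missing(d for d, _ in hashed)
    to_send = {d: c for d, c in hashed if d in wanted}
    server.store(to_send)
    return sum(len(c) for c in to_send.values())  # bytes crossing the network


def target_backup(server: BackupServer, data: bytes) -> int:
    """Target dedupe: all data crosses the network, then the target dedupes."""
    server.store(dict(chunk_hashes(data)))
    return len(data)


data = b"".join(bytes([i]) * CHUNK for i in range(10))  # 10 distinct chunks
src_server = BackupServer()
first_night = source_backup(src_server, data)
second_night = source_backup(src_server, data)  # same data again
print("source dedupe sent:", first_night, "then", second_night)
print("target dedupe sent:", target_backup(BackupServer(), data), "each night")
```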
-
Let's Make It More Complicated
Standalone Target Dedupe
Dedupe appliance separate from backup softwa
Integrated Target Dedupe
Target dedupe from b/u s/w vendor that backs
Standalone Source Dedupe
Full dedupe solution that only does source ded
Integrated Source Dedupe
B k ft th t d d t li t (
-
Name That Dedupe
Standalone Target Dedupe
Data Domain, Exagrid, Greenbytes, IBM, NEC,
SEPATON
Integrated Target Dedupe
Symantec NetBackup
Integrated Source Dedupe
Asigra, Symantec NetBackup
Standalone Source Dedupe
-
Multi-node Deduplication
AKA Global Deduplication
AKA Clustered Deduplication
-
What We're Not Talking About
Remember hashing vs. delta differen
Delta compares like to like
Hashing compares everything to ever
Some sales reps from some companie
don't have multi-node/global dedupe
calling the latter global dedupe. It's n
At a minimum this is honest confusio
-
Single-node/Local vs. Multi-node
Assume a customer buys multiple no
dedupe system
Suppose, then, that they back up exa
same client to each of those multiple
If the vendor fails to recognize the d
data and stores it multiple times, it h
node/local dedupe
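The test described on this slide can be modeled in a few lines. This is a simplified sketch: each "node" is just a Python set of chunk hashes, and the shared index in `chunks_stored_global` glosses over how a real multi-node system actually coordinates its nodes, which is vendor-specific.

```python
import hashlib

CHUNK = 4096  # illustrative chunk size


def chunk_digests(data: bytes) -> set:
    return {hashlib.sha1(data[i:i + CHUNK]).digest()
            for i in range(0, len(data), CHUNK)}


def chunks_stored_local(backups_by_node: list) -> int:
    """Single-node/local: each node has its own index and cannot see the others."""
    total = 0
    for backups in backups_by_node:
        node_index = set()
        for data in backups:
            node_index |= chunk_digests(data)
        total += len(node_index)
    return total


def chunks_stored_global(backups_by_node: list) -> int:
    """Multi-node/global: all nodes consult one logical index."""
    shared_index = set()
    for backups in backups_by_node:
        for data in backups:
            shared_index |= chunk_digests(data)
    return len(shared_index)


client_data = b"".join(bytes([i]) * CHUNK for i in range(8))  # 8 distinct chunks
three_nodes = [[client_data], [client_data], [client_data]]   # same client sent to each node
print("local:", chunks_stored_local(three_nodes), "chunks stored")
print("global:", chunks_stored_global(three_nodes), "chunks stored")
```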
-
Doctor It Hurts When I Do This
Single-node/local dedupe vendors say then
that. Why would you do that?
They tell you to split up your datasets and se
dataset to only one appliance
Easy to do if
Your dataset sizes never change
A given dataset never outgrows a node
Some single-node sales reps will point out th
-
Multi-node Is the Way to Go
Especially for larger environments &
conscious environments that buy as t
With multi-node dedupe you can load
treat same as you would a large tape
Single-node dedupe pushes the vend
the crest of the CPU/RAM wave
Multi-node vendors can ride behind t
-
Multi/Single Node Dedupe Vendo
Multi-node/global
EMC Avamar (12 nodes)
Exagrid (10 nodes)
NEC (55 nodes)
SEPATON (8 nodes)
Symantec PureDisk, NetBackup & Backup Exec
Diligent (2 nodes)
Single-node/local (as of Mar 2010)
EMC Data Domain
-
When Is It Deduped?
AKA Inline or Post Process?
-
Get Out the Swords
We'd have just as much luck trying to
these arguments
Apple vs Windows
Linux vs either of them
Linux vs FreeBSD
VMware vs the mainframe (the original hyperv
Cable modem vs DSL
Initial common sense leans to inline,
-
What's the Difference?
This only applies to target dedupe
Inline is synchronous dedupe
Post-process is asynchronous dedupe
Both are deduping as the data is com
the device (with most products and c
The question is really where the dedu
reads the native data from
If it read
-
Inline & Post-process: An I/O Walkthr

Step             IL Hash      IL Delta     PP Hash
Ingest (100%)    RAM write    RAM write    Disk write
New segment      RAM read     RAM read     Disk read
Old segment      RAM read     Disk read    RAM read
Match (90%)                                Disk dele
No match (10%)   Disk write   Disk write

For every 100 GB an inline hash system writes 10 GB to d
For every 100 GB an inline delta system writes 10 GB, re
from disk
For every 100 GB a post process hash system writes 100
GB, and deletes 90 GB from disk
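The arithmetic behind the two hash cases can be written out as a small model. The function names are hypothetical, and the amount read back by the post-process system is an assumption (the walkthrough table shows a disk read of each new segment, so the sketch reads all ingested data once). The inline delta case is left out because the amount it reads from disk is cut off in the source.

```python
def inline_hash_io(ingest_gb: float, match_rate: float) -> dict:
    """Inline hash: hashing and lookup happen in RAM; only new segments reach disk."""
    new_gb = ingest_gb - ingest_gb * match_rate
    return {"disk_write": new_gb, "disk_read": 0.0, "disk_delete": 0.0}


def post_process_hash_io(ingest_gb: float, match_rate: float) -> dict:
    """Post-process hash: land everything on disk, read it back, delete the matches."""
    return {
        "disk_write": ingest_gb,
        "disk_read": ingest_gb,  # assumption: every ingested segment is read back once
        "disk_delete": ingest_gb * match_rate,
    }


for name, model in (("inline hash", inline_hash_io),
                    ("post-process hash", post_process_hash_io)):
    io = model(100.0, 0.9)
    summary = {step: round(gb, 1) for step, gb in io.items()}
    print(f"{name}: {summary}")
```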
-
The Chair Recognizes Inline
When you're done with backups, you
with dedupe
Backups begin replicating as soon as
The post-process vendors need a sta
The post-process vendors don't start
until a backup is done; that will make
take longer
-
The Chair Recognizes Post-proces
When backups are done, dedupe is almost d
Replication begins as soon as the first backu
We wait until a backup is done, not until all
are done (unless you tell us to)
The staging area allows
Initial backups to be faster
Allows copies and recent restores to come from native data
Allows for staggered implementation of dedupe
Selectively dedupe only what makes sense
-
Inline & Post-process Vendors
Inline
EMC Data Domain
IBM Protectier
NEC HydraStor
Post-process
Exagrid
Greenbytes
Quantum DXi
SEPATON D lt t
-
How Does Replication Work?
Does replication use dedupe?
Can I replicate many-to-one, one-to-
cascading replication?
If deduping many to one, will it dedu
across those appliances?
Can I control what gets replicated an
(e.g. production vs development)
-
Is There an Index?
What happens if the index is destroy
How do you protect against that?
Does it need its index to read the dat
What do you do to verify data integrity?
What about malicious people?
Some dedupe vendors aren't very go
answering these questions, partially
they don't get them enough
-
Truth in IT Backup Concierge Ser
Community of verified but anonymous end-use
(no vendors)
Included in base service:
Billable product & strategy-related questions
Learn from other customers' questions & answ
Much less expensive than traditional consulting
Talk to real people using the products you are
Podcast interviews with end-users and though
Unbiased product briefings written by experts
-
Session Two
Using Deduplication in Backup
Systems
U i D R d i i P i
-
The Dedupe Tax
AKA Rehydration
Essentially a read from very fragmen
Not all dedupe systems are equally a
reassembling Humpty Dumpty
Especially visible during tape copies &
of large systems (single stream perfo
Recent POC of three major vendors s
difference in performance!
-
Isn't It Cheaper Just to
Buy tape?
Tape is cheaper than ever & keeps getting che
Must encrypt if you're using tape
Must use D2D2T to stream modern tape drives
Must constantly tweak to ensure you're doing
Take all that away and use dedupe
May not be cheaper but definitely better
Buy JBOD/ RAIDE if it f till h t it
-
Let's Talk About What Matters
What are the risks of their approach?
Data integrity questions
How big is it?
What's my dedupe ratio?
How big can it grow (local vs global)
How fast is it?
How fast can it back up/restore/copy my data?
How fast is replication?
How much does it cost?
Pricing schemes are all over the board
Try to get them on even playing field
Also consider operational costs
-
Advanced Uses of Deduplication
-
Eliminate Tape Shipping
Offsite backups w/o shipping tapes
Backups with no human hands on them
Make tapes offsite from replicated copy and never move them
No tapes shipped = No need to encrypt tapes
-
Shorter Recovery Point Objective
Most compan
backups once
Even though their transact
throughout th
they're only s
once per day
Dedupe and r
could get the
immediately
-
VMware Backup
One of the challenges with typical VMware backup is the I/O load it places on the server
Source dedupe can perform an incremental-forever backup with a much lower I/O load
Could allow you to
-
ROBO & Laptop Backups
Dedupe software can protect even the largest laptops over the Internet
It can also protect relatively large remote sites without installing hardware
Restores can be done remotely (for slower RTOs) or locally using a local recovery server (for quicker RTOs)
-
Where to Use Target/Source Ded
Laptops, VMware, Hyper-V are easy: it's got
Small, remote sets of data also an easy deci
do target w/ remote backup server, but cost
pushes people to source.
A medium-sized (
-
Source Dedupe: Remote Backup
If using source dedupe to back up a r
office, should you back up directly to
centralized backup server or backup
backup server that replicates to a cen
server?
It's all about the RTO you need.
Decide on RTO, test totally remote d if it t it
-
How Big is Too Big to Replicate B
Remote office replicating to a CO, or
replicating its backups to a DR site, t
limit to how much you can replicate
Make sure you've done all you can to
deduplication ratio. A 10:1 site will n
as much bandwidth as a 20:1 site.
Depends on daily deduplicated chang
hi h i f t f d t t d d
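A back-of-the-envelope sketch of why the dedupe ratio drives replication bandwidth. The helper names, the 1,000 GB nightly backup, and the 12-hour replication window are made-up inputs for illustration; plug in your own measured daily deduplicated change.

```python
def replicated_gb_per_day(daily_backup_gb: float, dedupe_ratio: float) -> float:
    """Data left to replicate after dedupe (a 10:1 ratio means one tenth remains)."""
    return daily_backup_gb / dedupe_ratio


def required_mbps(gb_per_day: float, window_hours: float) -> float:
    """Sustained megabits per second needed to move gb_per_day within the window."""
    bits = gb_per_day * 8e9  # decimal gigabytes
    return bits / (window_hours * 3600) / 1e6


for ratio in (10, 20):
    gb = replicated_gb_per_day(1000, ratio)
    print(f"{ratio}:1 site: {gb:.0f} GB/day, {required_mbps(gb, 12):.1f} Mb/s over 12 hours")
```

With the same nightly backup, the 10:1 site has twice as much data to push over the wire as the 20:1 site.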
-
Test, Test, Test!!!
-
Test Everything
Installation and configuration, including adding additional capacity
Support: call and ask stupid questions
Dedupe ratio
Must use your data
Must use your retention settings
Must fill up the system
All speeds
Backup speed
Copy speed: extremely important to test
Restore speed
Aggregate performance
With all your data types
Especially true if using local dedupe
Single stream performance
Backup speed
Restore and copy speed (especially if going to tape)
Replication Performance
-
Testing Methods: Source Dedupe
Must install on all data types you pla
up
Must task the system to the point tha
to use it. VMware, anyone?
OK to back up many redundant syste
kind of the point
Remember to test speed of copy to ta
-
Testing Methods: Target Dedupe
Copy production backups into IDT/VT
your backup software's built-in
cloning/migration/dupe features
Use dedicated drives if possible and s
run 24x7
You must fill up the system, expire so
then add more data to see steady sta
-
Data Reduction in Primary Sto
-
A Whole New Ball Game
In primary space, we use the term data redu
more inclusive than dedupe
A very different access pattern; latency is m
important
The standard in backup world is tape: just d
slower than that and you're OK
The standard in primary world is disk: anyth
slow it down will kill the project
-
Options
Compression
File-level dedupe
Sub-file-level dedupe
Some files compress, but don't dedup
Some files dedupe but don't compres
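A small experiment illustrates the last two points. The data is synthetic and chosen to make the contrast obvious: a log-like file whose lines are all different (compresses, but has no duplicate chunks), and two identical copies of random bytes standing in for already-compressed media (dedupes, but does not compress). The 4 KB chunk size, `zlib`, and SHA-1 are assumptions for the sketch.

```python
import hashlib
import random
import zlib

CHUNK = 4096  # illustrative chunk size for sub-file dedupe


def compression_ratio(data: bytes) -> float:
    return len(data) / len(zlib.compress(data))


def dedupe_ratio(files: list) -> float:
    """Chunks written divided by unique chunks stored."""
    total, unique = 0, set()
    for data in files:
        for i in range(0, len(data), CHUNK):
            total += 1
            unique.add(hashlib.sha1(data[i:i + CHUNK]).digest())
    return total / len(unique)


# Repetitive text where every line differs: compressible, but no identical chunks.
log_file = b"".join(b"event %08d status=OK\n" % i for i in range(20000))

# Random bytes stand in for already-compressed media; the same file is stored twice.
rng = random.Random(0)
media = bytes(rng.getrandbits(8) for _ in range(16 * CHUNK))
media_copies = [media, media]

print(f"log file:     compression {compression_ratio(log_file):.1f}:1, "
      f"dedupe {dedupe_ratio([log_file]):.1f}:1")
print(f"media copies: compression {compression_ratio(media):.2f}:1, "
      f"dedupe {dedupe_ratio(media_copies):.1f}:1")
```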
-
Vendors
Compression
Storwize, Ocarina
File-level dedupe
EMC Celerra
Sub-file-level dedupe
NetApp ASIS, Ocarina, Greenbytes, Exar/Hifn,
Usually you get compression or dedu
-
Pros/Cons of Primary Data Reduc
Saves disk space, power/cooling
Can have positive or negative impact
performance: must test to see whic
Does not usually help backups: data
before being read by any app, includi
Exception to above rule is NetApp Sn
tape
-
Contact Me
Email [email protected]
Websites to which I contribute:
http://www.backupcentral.com
http://www.searchstorage.com
http://www.searchdatabackup.com
Follow me on Twitter @wcpreston
My upcoming venture:
http://www.truthinit.com
-
RESOURCES FROM OUR SPONSOR
The ROI of Backup Redesign Using Deduplication: An EMC Data Domain User Research Study H19
IDC Executive Guide: Assess the Value of Deduplication for your Storage Consolidation Initiatives
Why CIOs Should Look To Data Deduplication
http://www.bitpipe.com/detail/RES/1268767835_717.html
http://www.bitpipe.com/detail/RES/1267815600_941.html
http://www.bitpipe.com/detail/RES/1254259141_188.html
http://www.emc.com/