Techniques for Managing Huge Data (LISA'10)

DESCRIPTION

Slides from the USENIX LISA'10 tutorial on techniques for managing huge data.

TRANSCRIPT
USENIX LISA10 November 7, 2010
Techniques for Handling Huge Storage
USENIX LISA'10 Conference, November 8, 2010
Sunday, November 7, 2010
Agenda
- How did we get here?
- When good data goes bad
- Capacity, planning, and design
- What comes next?
Note: this tutorial uses live demos, slides not so much
History
Milestones in Tape Evolution
1951 - magnetic tape for data storage
1964 - 9 track
1972 - Quarter Inch Cartridge (QIC)
1977 - Commodore Datasette
1984 - IBM 3480
1989 - DDS/DAT
1995 - IBM 3590
2000 - T9940
2000 - LTO
2006 - T10000
2008 - TS1130
Milestones in Disk Evolution
1954 - hard disk invented
1950s - solid state disk invented
1981 - Shugart Associates System Interface (SASI)
1984 - Personal Computer Advanced Technology (PC/AT) Attachment, later shortened to ATA
1986 - “Small” Computer System Interface (SCSI)
1986 - Integrated Drive Electronics (IDE)
1994 - EIDE
1994 - Fibre Channel (FC)
1995 - flash-based SSDs
2001 - Serial ATA (SATA)
2005 - Serial Attached SCSI (SAS)
Architectural Changes
- Simple, parallel interfaces
- Serial interfaces
- Aggregated serial interfaces
When Good Data Goes Bad
Failure Rates
- Mean Time Between Failures (MTBF)
  - Statistical interarrival error rate
  - Often cited in literature and data sheets
  - MTBF = total operating hours / total number of failures
- Annualized Failure Rate (AFR)
  - AFR = operating hours per year / MTBF
  - Expressed as a percent
- Example
  - MTBF = 1,200,000 hours
  - Year = 24 x 365 = 8,760 hours
  - AFR = 8,760 / 1,200,000 = 0.0073 = 0.73%
AFR is easier to grok than MTBF
Operating hours per year is a flexible definition
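The AFR arithmetic above is easy to script; a minimal sketch (the function name and its 24x7 default are illustrative, not from the slides):

```python
def afr(mtbf_hours, operating_hours_per_year=24 * 365):
    """Annualized Failure Rate (as a fraction) from MTBF in hours.

    operating_hours_per_year is deliberately a parameter: a 24x7
    duty cycle gives 8,760 hours, but a vendor may assume less.
    """
    return operating_hours_per_year / mtbf_hours

# The slide's example: MTBF = 1,200,000 hours, 24x7 duty cycle.
print(f"AFR = {afr(1_200_000):.2%}")  # AFR = 0.73%
```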
Multiple Systems and Statistics
- Consider 100 systems, each with an MTBF = 1,000 hours
- At time = 1,000 hours, 100 failures have occurred on average
- Not all systems will see exactly one failure
[Bar chart: number of systems (0-40) vs. number of failures (0-4), with the tail annotated "Unlucky", "Very Unlucky", and "Very, Very Unlucky"]
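One way to see why the counts spread out: if a constant MTBF is read as a Poisson failure process (our assumption, not stated on the slide), the expected split across 100 systems falls out directly:

```python
import math

def poisson_pmf(k, lam=1.0):
    """P(exactly k failures) when the expected count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# 100 systems, each MTBF = 1,000 hours, observed at t = 1,000 hours:
# every system expects lam = 1 failure, but the counts vary.
for k in range(5):
    expected = 100 * poisson_pmf(k)
    print(f"~{expected:4.1f} of 100 systems see {k} failure(s)")
```

Roughly 37 systems see no failures, 37 see one, and a shrinking tail sees two or more, matching the shape of the chart.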
Failure Rates
- MTBF is a summary metric
- Manufacturers estimate MTBF by stressing many units for short periods of qualification time
- Summary metrics hide useful information
- Example: mortality study
  - Mortality of children aged 5-14 during 1996-1998: measured 20.8 per 100,000
  - MTBF = 4,807 years
  - Current world average life expectancy is 67.2 years
- For large populations, such as huge disk farms, the summary MTBF can appear constant
- Better question to be answered: “is my failure rate increasing or decreasing?”
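The mortality example is just the reciprocal of the failure rate, spelled out:

```python
# Reproducing the slide's mortality arithmetic: a death rate of
# 20.8 per 100,000 per year inverts to an "MTBF" in years.
rate_per_100k_per_year = 20.8
mtbf_years = 100_000 / rate_per_100k_per_year
print(f"MTBF = {int(mtbf_years):,} years")  # MTBF = 4,807 years
```

A 4,807-year MTBF for a population whose members live about 67 years is exactly the kind of useful information a summary metric hides.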
Why Do We Care?
- Summary statistics like MTBF or AFR can be misleading or risky if we do not also distinguish between stable and trending processes
- We need to analyze the ordered times between failures in relation to system age to describe system reliability
Time Dependent Reliability
- Useful for repairable systems
  - The system can be repaired to satisfactory operation by any action
  - Failures occur sequentially in time
- Measure the age of the components of a system
  - Need to distinguish age from interarrival times (time between failures)
  - Doesn't have to be precise; resolution of weeks works OK
  - Some devices report Power On Hours (POH): SMART for disks, OSes
  - Clerical solutions or inventory asset systems work fine
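The mean cumulative plots used in the TDR examples below can be computed from pooled failure ages; a minimal sketch with made-up data (the ages and set size are illustrative):

```python
def mean_cumulative_failures(failure_ages, n_systems, horizon):
    """Mean cumulative failures per system, by component age.

    failure_ages: ages (e.g. months) at which each failure occurred,
    pooled across all systems in the set.
    Returns mcf where mcf[t-1] = failures with age <= t, / n_systems.
    """
    return [sum(1 for a in failure_ages if a <= t) / n_systems
            for t in range(1, horizon + 1)]

# Hypothetical disk set: 20 systems, failures at these ages (months).
ages = [3, 7, 7, 12, 14, 14, 15, 20]
mcf = mean_cumulative_failures(ages, n_systems=20, horizon=24)
print(mcf[11])  # mean cumulative failures by month 12 -> 0.2
```

Plotting `mcf` against age gives the curves in the examples; a bend in the curve is the trend the summary MTBF hides.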
TDR Example 1
[Chart: mean cumulative failures (0-20) vs. system age in months (1-50) for Disk Set A, Disk Set B, Disk Set C, and a target MTBF line]
TDR Example 2
Did a common event occur?
[Chart: mean cumulative failures (0-20) vs. system age in months (1-50) for Disk Set A, Disk Set B, Disk Set C, and a target MTBF line]
TDR Example 2.5
[Chart: mean cumulative failures (0-20) vs. calendar date (Jan 1, 2010 - Feb 3, 2014)]
Long Term Storage
- Near-line disk systems for backup
  - Access time and bandwidth advantages over tape
- Enterprise-class tape for backup and archival
  - 15-30 years shelf life
  - Significant ECC
  - Read error rate: 1e-20
  - Enterprise-class HDD read error rate: 1e-15
Reliability
- Reliability is time dependent
- TDR analysis reveals trends
- Use cumulative plots, mean cumulative plots, and recurrence rates
- Graphs are good
- Track failures and downtime by system versus age and calendar dates
- Correlate anomalous behavior
- Manage retirement, refresh, and preventative processes using real data
Data Sheets
Reading Data Sheets
- Manufacturers publish useful data sheets and product guides
- Reliability information
  - MTBF or AFR
  - UER, or equivalent
  - Warranty
- Performance
  - Interface bandwidth
  - Sustained bandwidth (aka internal or media bandwidth)
  - Average rotational delay or rpm (HDD)
  - Average response or seek time
  - Native sector size
- Environmentals
  - Power
AFR operating hours per year can be a footnote
Availability
Nines Matter
- Is the Internet up?
Nines Matter
- Is the Internet up?
- Is the Internet down?
Nines Matter
- Is the Internet up?
- Is the Internet down?
- Is the Internet’s reliability 5-9’s?
Nines Don’t Matter
- Is the Internet up?
- Is the Internet down?
- Is the Internet’s reliability 5-9’s?
- Do 5-9’s matter?
Reliability Matters!
- Is the Internet up?
- Is the Internet down?
- Is the Internet’s reliability 5-9’s?
- Do 5-9’s matter?
- Reliability matters!
Designing for Failure
- Change design perspective
- Design for success
  - How to make it work?
  - What you learned in school: solve the equation
  - Can be difficult...
- Design for failure
  - How to make it work when everything breaks?
  - What you learned in the army: win the war
  - Can be difficult... at first...
Example: Design for Success
[Diagram: two x86 servers running NexentaStor with an HA-Cluster plugin, both attached to shared storage over FC, SAS, or iSCSI]
Designing for Failure
- Application-level replication
  - Hard to implement: coding required
  - Some activity in the open community
  - Hard to apply to general purpose computing
- Examples
  - DoD, Google, Facebook, Amazon, ... the big guys
- Tends to scale well with size
- Multiple copies of data
Reliability - Availability
- Reliability trumps availability
  - If disks didn't break, RAID would not exist
  - If servers didn't break, HA clusters would not exist
- Reliability is measured in probabilities
- Availability is measured in nines
Data Retention
Evaluating Data Retention
- MTTDL = Mean Time To Data Loss
- Note: MTBF is not constant in the real world, but the assumption keeps the math simple
- MTTDL[1] is a simple MTTDL model
  - No parity (single vdev, striping, RAID-0):
    MTTDL[1] = MTBF / N
  - Single parity (mirror, RAIDZ, RAID-1, RAID-5):
    MTTDL[1] = MTBF^2 / (N * (N-1) * MTTR)
  - Double parity (3-way mirror, RAIDZ2, RAID-6):
    MTTDL[1] = MTBF^3 / (N * (N-1) * (N-2) * MTTR^2)
  - Triple parity (4-way mirror, RAIDZ3):
    MTTDL[1] = MTBF^4 / (N * (N-1) * (N-2) * (N-3) * MTTR^3)
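The four cases follow one pattern, so they can be folded into a single function; a sketch (the generalization and the example numbers are ours, not from the slides):

```python
def mttdl1(mtbf_hours, n_disks, parity, mttr_hours):
    """MTTDL[1] in hours for an N-disk set.

    parity: 0 = stripe, 1 = mirror/RAID-5, 2 = RAIDZ2/RAID-6,
    3 = RAIDZ3.  Assumes constant MTBF, per the slide's caveat.
    """
    numerator = mtbf_hours ** (parity + 1)
    denominator = 1.0
    for i in range(parity + 1):
        denominator *= n_disks - i       # N * (N-1) * ... terms
    denominator *= mttr_hours ** parity  # MTTR^parity term
    return numerator / denominator

HOURS_PER_YEAR = 8760
# Hypothetical 8-disk RAIDZ2: 1.2M-hour MTBF, 24-hour MTTR.
years = mttdl1(1_200_000, 8, 2, 24) / HOURS_PER_YEAR
print(f"MTTDL[1] = {years:,.0f} years")
```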
Another MTTDL Model
- The MTTDL[1] model doesn't take unrecoverable reads into account
- But unrecoverable read errors (UER) are becoming the dominant failure mode
- UER is specified as errors per bits read
- More bits = higher probability of loss per vdev
- The MTTDL[2] model considers UER
Why Worry about UER?
- Richard's study
  - 3,684 hosts with 12,204 LUNs
  - 11.5% of all LUNs reported read errors
- Bairavasundaram et al., FAST '08
  - www.cs.wisc.edu/adsl/Publications/corruption-fast08.pdf
  - 1.53M LUNs over 41 months
  - RAID reconstruction discovers 8% of checksum mismatches
  - “For some drive models as many as 4% of drives develop checksum mismatches during the 17 months examined”
- Manufacturers trade UER for space
Why Worry about UER?
RAID array study
Why Worry about UER?
RAID array study
[Chart: failure modes observed in the RAID array study, annotated "Unrecoverable Reads" and "Disk Disappeared ('disk pull')"]
“Disk pull” tests aren’t very useful
MTTDL[2] Model
- Probability that a reconstruction will fail:
  Precon_fail = (N-1) * size / UER
- The model doesn't work for non-parity schemes (single vdev, striping, RAID-0)
- Single parity (mirror, RAIDZ, RAID-1, RAID-5):
  MTTDL[2] = MTBF / (N * Precon_fail)
- Double parity (3-way mirror, RAIDZ2, RAID-6):
  MTTDL[2] = MTBF^2 / (N * (N-1) * MTTR * Precon_fail)
- Triple parity (4-way mirror, RAIDZ3):
  MTTDL[2] = MTBF^3 / (N * (N-1) * (N-2) * MTTR^2 * Precon_fail)
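A sketch following the slide's formulas; the UER bookkeeping (one error per `bits_per_uer` bits, N-1 surviving disks read in full during reconstruction) and the example parameters are our assumptions:

```python
def mttdl2(mtbf_hours, n_disks, parity, mttr_hours,
           disk_size_bytes, bits_per_uer=1e15):
    """MTTDL[2] in hours, accounting for an unrecoverable read
    error striking during reconstruction (requires parity >= 1)."""
    if parity < 1:
        raise ValueError("MTTDL[2] does not apply to non-parity schemes")
    # Expected unrecoverable errors while reading the N-1 surviving
    # disks in full -- used as the reconstruction-failure probability.
    precon_fail = (n_disks - 1) * disk_size_bytes * 8 / bits_per_uer
    numerator = mtbf_hours ** parity
    denominator = n_disks * precon_fail
    for i in range(1, parity):
        denominator *= (n_disks - i) * mttr_hours
    return numerator / denominator

# Hypothetical mirror: two 2 TB disks, 1.2M-hour MTBF, 24-hour MTTR.
hours = mttdl2(1_200_000, 2, 1, 24, 2e12)
print(f"MTTDL[2] = {hours / 8760:,.0f} years")
```

Note that `precon_fail` grows linearly with disk size, which is why UER dominates as drives get bigger.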
Practical View of MTTDL[1]
MTTDL[1] Comparison
MTTDL Models: Mirror
Spares are not always better...
MTTDL Models: RAIDZ2
Space, Dependability, and Performance
Dependability Use Case
- Customer has 15+ TB of read-mostly data
- 16-slot, 3.5” drive chassis; 2 TB HDDs
- Option 1: one raidz2 set
  - 24 TB available space (12 data + 2 parity)
  - 2 hot spares, 48-hour disk replacement time
  - MTTDL[1] = 1,790,000 years
- Option 2: two raidz2 sets
  - 24 TB available space in total (each set: 6 data + 2 parity)
  - No hot spares
  - MTTDL[1] = 7,450,000 years
Planning for Spares
- Number of systems drives the need for spares
- How many spares do you need?
- How often do you plan replacements?
- Replacing devices immediately becomes impractical
- Not replacing devices increases risk, but how much?
- There is no black/white answer; it depends...
Spares Optimizer Demo
Capacity, Planning, and Design
Space
- Space is a poor sizing metric, really!
- Technology marketing heavily pushes space
- Maximizing space can mean compromising performance AND reliability
- As disks and tapes get bigger, they don't get better
- The $150 rule
- PHBs get all excited about space
- Most current capacity planning tools manage by space
Bandwidth
- Bandwidth constraints in modern systems are rare
- Overprovisioning for bandwidth is relatively simple
- Where to gain bandwidth can be tricky
  - Link aggregation: Ethernet, SAS
  - MPIO
- Adding parallelism beyond 2 trades off reliability
Latency
- Lower latency == better performance
- Latency != IOPS
  - IOPS are also achieved with parallelism
  - Parallelism only delivers latency when latency is constrained by bandwidth
- Latency = access time + transfer time
- HDD
  - Access time limited by seek and rotate
  - Transfer time usually limited by media or internal bandwidth
- SSD
  - Access time limited by architecture more than c
  - Transfer time limited by architecture and interface
- Tape
  - Access time measured in seconds
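The access-plus-transfer decomposition for an HDD can be sketched numerically (the drive parameters below are hypothetical):

```python
def hdd_latency_ms(seek_ms, rpm, io_bytes, media_mb_per_s):
    """Rough per-I/O latency: access time (seek plus half a
    rotation, on average) plus transfer time at the media rate."""
    rotational_ms = 0.5 * 60_000 / rpm               # avg rotational delay
    transfer_ms = io_bytes / (media_mb_per_s * 1e6) * 1000
    return seek_ms + rotational_ms + transfer_ms

# Hypothetical 7,200 rpm disk: 8 ms seek, 100 MB/s media, 8 KiB I/O.
print(f"{hdd_latency_ms(8.0, 7200, 8192, 100):.2f} ms")
```

Note how the access time (seek + rotation) dwarfs the transfer time for small I/Os, which is why lower access time, not interface bandwidth, drives small-I/O performance.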
Deduplication
What is Deduplication?
- A $2.1 billion feature; 2009 buzzword of the year
- Technique for improving storage space efficiency
  - Trades big I/Os for small I/Os
  - Does not eliminate I/O
- Implementation styles
  - Offline or post-processing
    - Data is written to nonvolatile storage; a process comes along later and dedupes it
    - Example: tape archive dedup
  - Inline
    - Data is deduped as it is being allocated to nonvolatile storage
    - Example: ZFS
Dedup How-To
- Given a bunch of data
- Find data that is duplicated
- Build a lookup table of references to the data
- Replace duplicate data with a pointer to the entry in the lookup table
- Granularity: file, block, or byte
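The steps above can be sketched as a toy block-granularity, inline dedup table (the class and the choice of SHA-256 are illustrative; real implementations add verification, persistence, and eviction):

```python
import hashlib

class BlockDedupStore:
    """Toy inline dedup: a lookup table keyed by block checksum,
    with reference counts instead of duplicate copies."""

    def __init__(self):
        self.table = {}  # checksum -> (block data, reference count)

    def write(self, block: bytes) -> str:
        key = hashlib.sha256(block).hexdigest()
        if key in self.table:
            data, refs = self.table[key]
            self.table[key] = (data, refs + 1)  # duplicate: add a reference
        else:
            self.table[key] = (block, 1)        # new entry
        return key  # the "pointer" stored in place of the data

store = BlockDedupStore()
keys = [store.write(b) for b in (b"aaaa", b"bbbb", b"aaaa")]
print(len(store.table))  # 2 unique blocks stored for 3 writes
```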
Dedup Constraints
- Size of the deduplication table
- Quality of the checksums
  - Collisions happen: all possible permutations of N bits cannot be stored in N/10 bits
  - Checksums can be evaluated by probability of collisions
  - Multiple checksums can be used, but the gains are marginal
- Compression algorithms can work against deduplication
  - Dedup before or after compression?
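Evaluating a checksum by its probability of collisions usually means the birthday bound; a sketch (the block count is illustrative, and the approximation assumes uniformly random checksums):

```python
def collision_probability(n_items, checksum_bits):
    """Birthday-bound estimate of P(at least one collision) among
    n_items random checksums; accurate while the result is small."""
    return n_items * (n_items - 1) / 2 / 2 ** checksum_bits

# One billion unique blocks under a 256-bit checksum:
print(collision_probability(10**9, 256))
```

The result is vanishingly small for a 256-bit checksum, which is why some systems offer verification as an option rather than a default.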
Verification
[Flowchart: on write(), the block is compressed and checksummed, then looked up in the DDT. No DDT match: create a new entry. Match with verify off: add a reference. Match with verify on: read the stored data and compare; on a data match add a reference, otherwise create a new entry.]
Reference Counts
Eggs courtesy of Richard’s chickens
Replication
Replication Services
[Diagram: replication options plotted by recovery point objective (seconds to hours to days) against system I/O performance impact (faster to slower): mirror; application-level replication; block replication (DRBD, SNDR); object-level sync (databases, ZFS); file-level sync (rsync); traditional backup (NDMP, tar)]
How Many Copies Do You Need?
- Answer: at least one; more is better...
  - One production, one backup
  - One production, one near-line, one backup
  - One production, one near-line, one backup, one at a DR site
  - One production, one near-line, one backup, one at a DR site, one archived in a vault
- RAID doesn't count
- Consider 3 to 4 copies as a minimum for important data
Tiering Example
58
Big, honkingdisk array
Big, honkingtape library
File-basedbackup
Works great, but...
Tiering Example
[Diagram: big, honking disk array (10 million files, 1 million daily changes) → file-based backup with a 12-hour backup window → big, honking tape library]
... backups never complete
Tiering Example
[Diagram: big, honking disk array (10 million files, 1 million daily changes) → hourly block-level replication → near-line backup → weekly backup window → big, honking tape library]
Backups to near-line storage and tape have different policies
Tiering Example
[Diagram: big, honking disk array ← near-line backup → big, honking tape library]
Quick file restoration possible
Application-Level Replication Example
[Diagram: an application replicating data across Site 1, Site 2, and Site 3; data stored at different sites, with a long-term archive option]
Data Sheets
Reading Data Sheets Redux
- Manufacturers publish useful data sheets and product guides
- Reliability information
  - MTBF or AFR
  - UER, or equivalent
  - Warranty
- Performance
  - Interface bandwidth
  - Sustained bandwidth (aka internal or media bandwidth)
  - Average rotational delay or rpm (HDD)
  - Average response or seek time
  - Native sector size
- Environmentals
  - Power
AFR operating hours per year can be a footnote
Summary
Key Points
- You will need many copies of your data; get used to it
- The cost/byte decreases faster than old habits can be kicked
- Replication is a good thing; use it often
- Tiering is a good thing; use it often
- Beware of designing only for success; design for failure, too
- Reliability trumps availability
- Space, dependability, performance: pick two