cs 110 computer architecture lecture 25 · lecture 25: dependability and raid instructor: ......
TRANSCRIPT
CS 110 Computer Architecture
Lecture 25: Dependability and RAID
Instructor: Sören Schwertfeger
http://shtech.org/courses/ca/
School of Information Science and Technology SIST
ShanghaiTech University
Slides based on UC Berkeley's CS61C
Review Last Lecture
• I/O gives computers their 5 senses
• I/O speed range is 100-million to one
• Polling vs. interrupts
• DMA to avoid wasting CPU time on data transfers
• Disks for persistent storage, replaced by flash
• Networks: computer-to-computer I/O
– Protocol suites allow networking of heterogeneous components. Abstraction!!!
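Protocol Family Concept
[Figure: a message travels down the protocol stack on each side. Each layer wraps it in a header (H) and trailer (T). Peer layers communicate logically; the actual transfer goes through the physical layer.]
• Each lower level of the stack “encapsulates” information from the layer above by adding a header and trailer.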
Most Popular Protocol for Network of Networks
• Transmission Control Protocol/Internet Protocol (TCP/IP)
• This protocol family is the basis of the Internet, a WAN (wide area network) protocol
• IP makes best effort to deliver
– Packets can be lost, corrupted
• TCP guarantees delivery
• TCP/IP is so popular that it is used even when communicating locally: even across a homogeneous LAN (local area network)
TCP/IP packet, Ethernet packet, protocols
• Application sends message
• TCP breaks into 64 KiB segments, adds 20 B header
• IP adds 20 B header, sends to network
• If Ethernet, broken into 1500 B packets with headers, trailers
[Figure: the message becomes TCP data behind a TCP header; that becomes IP data behind an IP header; that is wrapped in an Ethernet header (EH) and trailer.]
Great Idea #6: Dependability via Redundancy
• Redundancy so that a failing piece doesn’t make the whole system fail
[Figure: three units compute 1+1. Two answer 2 and one answers 1 (FAIL!). 2 of 3 agree, so the result is 1+1=2.]
• Increasing transistor density reduces the cost of redundancy
• Applies to everything from datacenters to memory
– Redundant datacenters so that can lose 1 datacenter but Internet service stays online
– Redundant routes so can lose nodes but Internet doesn’t fail
– Redundant disks so that can lose 1 disk but not lose data (Redundant Arrays of Independent Disks/RAID)
– Redundant memory bits so that can lose 1 bit but no data (Error Correcting Code/ECC Memory)
Dependability
• Fault: failure of a component
– May or may not lead to system failure
[Figure: two states. "Service accomplishment" (service delivered as specified) moves to "Service interruption" (deviation from specified service) on Failure, and back on Restoration.]
Dependability via Redundancy: Time vs. Space
• Spatial Redundancy – replicated data or check information or hardware to handle hard and soft (transient) failures
• Temporal Redundancy – redundancy in time (retry) to handle soft (transient) failures
Dependability Measures
• Reliability: Mean Time To Failure (MTTF)
• Service interruption: Mean Time To Repair (MTTR)
• Mean time between failures (MTBF)
– MTBF = MTTF + MTTR
• Availability = MTTF / (MTTF + MTTR)
• Improving Availability
– Increase MTTF: More reliable hardware/software + Fault Tolerance
– Reduce MTTR: improved tools and processes for diagnosis and repair
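The two levers for improving availability can be compared numerically. The following is a minimal sketch; the function names and the sample MTTF/MTTR values are assumptions chosen for illustration, not figures from the lecture.

```python
def mtbf(mttf_hours: float, mttr_hours: float) -> float:
    """Mean time between failures: MTBF = MTTF + MTTR."""
    return mttf_hours + mttr_hours


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the service is up: MTTF / (MTTF + MTTR)."""
    return mttf_hours / mtbf(mttf_hours, mttr_hours)


# Hypothetical system: fails every 1000 h on average, takes 10 h to repair.
base = availability(1000, 10)
more_reliable = availability(2000, 10)  # lever 1: double the MTTF
faster_repair = availability(1000, 5)   # lever 2: halve the MTTR

print(f"base:          {base:.5f}")
print(f"double MTTF:   {more_reliable:.5f}")
print(f"halve MTTR:    {faster_repair:.5f}")
```

Under this formula, doubling MTTF and halving MTTR give the same availability, since only the ratio MTTF/MTTR matters.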
Understanding MTTF
[Figure: probability of failure (up to 1) plotted against time, with the MTTF marked on the time axis and the fractions 1/3 and 2/3 labeled.]
Availability Measures
• Availability = MTTF / (MTTF + MTTR) as %
– MTTF, MTBF usually measured in hours
• Since we hope systems are rarely down, shorthand is “number of 9s of availability per year”
• 1 nine: 90% => 36.5 days of repair/year
• 2 nines: 99% => 3.65 days of repair/year
• 3 nines: 99.9% => 526 minutes of repair/year
• 4 nines: 99.99% => 53 minutes of repair/year
• 5 nines: 99.999% => 5 minutes of repair/year
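The "number of 9s" table follows directly from multiplying the unavailable fraction by the length of a year. A short sketch, assuming a 365-day year as the slides do; the function name is made up for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a 365-day year


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime for an availability of `nines` nines (e.g. 3 -> 99.9%)."""
    unavailability = 10.0 ** (-nines)
    return unavailability * MINUTES_PER_YEAR


for n in range(1, 6):
    minutes = downtime_minutes_per_year(n)
    days = minutes / (24 * 60)
    print(f"{n} nines: {minutes:10.1f} minutes/year ({days:.3f} days/year)")
```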
Reliability Measures
• Another is average number of failures per year: Annualized Failure Rate (AFR)
– E.g., 1000 disks with 100,000 hour MTTF
– 365 days * 24 hours = 8760 hours
– (1000 disks * 8760 hrs/year) / 100,000 hrs = 87.6 failed disks per year on average
– 87.6 / 1000 = 8.76% annual failure rate
• Google’s 2007 study* found that actual AFRs for individual drives ranged from 1.7% for first-year drives to over 8.6% for three-year-old drives
*research.google.com/archive/disk_failures.pdf
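The AFR arithmetic from the disk example can be reproduced in a few lines. This is a sketch of the slide's simple estimate (operating hours per year divided by MTTF); the function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760


def expected_failures_per_year(num_disks: int, mttf_hours: float) -> float:
    """Average number of failed disks per year across the whole population."""
    return num_disks * HOURS_PER_YEAR / mttf_hours


def annualized_failure_rate(mttf_hours: float) -> float:
    """AFR as a fraction: expected failures per disk per year."""
    return HOURS_PER_YEAR / mttf_hours


failures = expected_failures_per_year(1000, 100_000)
afr = annualized_failure_rate(100_000)
print(f"{failures:.1f} failed disks/year, AFR = {afr:.2%}")
```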
Dependability Design Principle
• Design Principle: No single points of failure
– “Chain is only as strong as its weakest link”
• Dependability Corollary of Amdahl’s Law
– Doesn’t matter how dependable you make one portion of system
– Dependability limited by part you do not improve
Error Detection/Correction Codes
• Memory systems generate errors (accidentally flipped bits)
– DRAMs store very little charge per bit
– “Soft” errors occur occasionally when cells are struck by alpha particles or other environmental upsets
– “Hard” errors can occur when chips permanently fail
– Problem gets worse as memories get denser and larger
• Memories protected against failures with EDC/ECC
• Extra bits are added to each data word
– Used to detect and/or correct faults in the memory system
– Each data word value mapped to unique code word
– A fault changes valid code word to invalid one, which can be detected
Block Code Principles
• Hamming distance = difference in # of bits
• p = 011011, q = 001111, Ham. distance (p,q) = 2
• p = 011011, q = 110001, distance (p,q) = ?
• Can think of extra bits as creating a code with the data
• What if minimum distance between members of code is 2 and get a 1-bit error?
[Photo: Richard Hamming, 1915-98, Turing Award winner]
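Hamming distance is easy to compute by comparing two bit strings position by position. The sketch below uses hypothetical helper names and can be run on the two pairs from the slide; `min_distance` gives the minimum distance of a whole code, which is the quantity that decides how many errors can be detected or corrected.

```python
def hamming_distance(p: str, q: str) -> int:
    """Number of bit positions in which p and q differ."""
    if len(p) != len(q):
        raise ValueError("bit strings must have equal length")
    return sum(a != b for a, b in zip(p, q))


def min_distance(code: list[str]) -> int:
    """Smallest Hamming distance between any two distinct code words."""
    return min(
        hamming_distance(a, b)
        for i, a in enumerate(code)
        for b in code[i + 1:]
    )


print(hamming_distance("011011", "001111"))
print(hamming_distance("011011", "110001"))
# The 3-bit even-parity code: every pair of code words differs in 2 bits.
print(min_distance(["000", "011", "101", "110"]))
```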
Parity: Simple Error-Detection Coding
• Each data value, before it is written to memory, is “tagged” with an extra bit to force the stored word to have even parity:
[Figure: bits b7 b6 b5 b4 b3 b2 b1 b0 are summed (+) to produce the parity bit p.]
• Each word, as it is read from memory, is “checked” by finding its parity (including the parity bit):
[Figure: bits b7 b6 b5 b4 b3 b2 b1 b0 and p are summed (+) to produce the check bit c.]
• Minimum Hamming distance of parity code is 2
• A non-zero parity check indicates an error occurred:
– 2 errors (on different bits) are not detected
– nor any even number of errors, just odd numbers of errors are detected
Parity Example
• Data 01010101
• 4 ones, even parity now
• Write to memory: 01010101 0 to keep parity even
• Data 01010111
• 5 ones, odd parity now
• Write to memory: 01010111 1 to make parity even
• Read from memory 01010101 0
• 4 ones => even parity, so no error
• Read from memory 11010101 0
• 5 ones => odd parity, so error
• What if error in parity bit?
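The parity example can be replayed in code. This is a minimal sketch with assumed function names: `add_even_parity` appends the parity bit on a write, and `parity_check` returns the check bit c on a read (0 means the word looks fine).

```python
def add_even_parity(data: str) -> str:
    """Append one bit so the stored word has an even number of ones."""
    parity = data.count("1") % 2
    return data + str(parity)


def parity_check(word: str) -> int:
    """Parity over the stored word including its parity bit; non-zero means error."""
    return word.count("1") % 2


print(add_even_parity("01010101"))  # stored word for the first data value
print(add_even_parity("01010111"))  # stored word for the second data value
print(parity_check("010101010"))    # clean read
print(parity_check("110101010"))    # one data bit flipped
print(parity_check("010101011"))    # the parity bit itself flipped
print(parity_check("100101010"))    # two bits flipped
```

The last two reads answer the slide's closing question and illustrate the code's limit: a flipped parity bit is reported like any other single error, while a two-bit error passes the check unnoticed.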
Suppose Want to Correct 1 Error?
• Richard Hamming came up with simple-to-understand mapping to allow Error Correction at minimum distance of 3
– Single error correction, double error detection
• Called “Hamming ECC”
– Worked weekends on relay computer with unreliable card reader, frustrated with manual restarting
– Got interested in error correction; published 1950
– R. W. Hamming, “Error Detecting and Error Correcting Codes,” The Bell System Technical Journal, Vol. XXIX, No. 2 (April 1950), pp. 147-160.
Detecting/Correcting Code Concept
• Detection: bit pattern fails code word check
• Correction: map to nearest valid code word
[Figure: the space of possible bit patterns (2^N) holds a sparse population of code words (2^M << 2^N) with identifiable signature. An error changes a bit pattern to a non-code pattern.]
Hamming Distance: 8 code words
[Figure: the 8 possible code words.]

Hamming Distance 2: Detection
Detect Single Bit Errors
• No 1-bit error goes to another valid code word
• ½ code words are valid
[Figure: the invalid code words are marked.]

Hamming Distance 3: Correction
Correct Single Bit Errors, Detect Double Bit Errors
• No 2-bit error goes to another valid code word; 1-bit error near
• 1/4 code words are valid
[Figure: patterns with one 1 are nearest 000; patterns with one 0 are nearest 111.]
Administrivia
• Final Exam
– Tuesday, June 21, 2016, 9:00-11:00
– Location: H2109+110
– THREE cheat sheets (MT1, MT2, post-MT2)
• Hand-written
• Project 3 will still come
– Short/easy, but:
– Competition:
• Slowest 33 percentile and below: 80%
• Fastest program: 100%
• Linear scaling in between.
– Time: What do you prefer? 1 week only, or till end of exam week?
Hamming Error Correction Code
• Use of extra parity bits to allow the position identification of a single error
1. Mark all bit positions that are powers of 2 as parity bits (positions 1, 2, 4, 8, 16, …)
– Start numbering bits at 1 at left (not at 0 on right)
2. All other bit positions are data bits (positions 3, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, …)
3. Each data bit is covered by 2 or more parity bits
Hamming ECC
4. The position of parity bit determines sequence of data bits that it checks
• Bit 1 (0001₂): checks bits (1, 3, 5, 7, 9, 11, ...)
– Bits with least significant bit of address = 1
• Bit 2 (0010₂): checks bits (2, 3, 6, 7, 10, 11, 14, 15, …)
– Bits with 2nd least significant bit of address = 1
• Bit 4 (0100₂): checks bits (4-7, 12-15, 20-23, ...)
– Bits with 3rd least significant bit of address = 1
• Bit 8 (1000₂): checks bits (8-15, 24-31, 40-47, ...)
– Bits with 4th least significant bit of address = 1
Graphic of Hamming Code
• http://en.wikipedia.org/wiki/Hamming_code
Hamming ECC
5. Set parity bits to create even parity for each group
• A byte of data: 10011010
• Create the coded word, leaving spaces for the parity bits:

  position:  1  2  3  4  5  6  7  8  9 10 11 12
  bit:       _  _  1  _  0  0  1  _  1  0  1  0

• Calculate the parity bits
• Position 1 checks bits 1, 3, 5, 7, 9, 11: ? _ 1 _ 0 0 1 _ 1 0 1 0. Set position 1 to a 0: 0 _ 1 _ 0 0 1 _ 1 0 1 0
• Position 2 checks bits 2, 3, 6, 7, 10, 11: 0 ? 1 _ 0 0 1 _ 1 0 1 0. Set position 2 to a 1: 0 1 1 _ 0 0 1 _ 1 0 1 0
• Position 4 checks bits 4, 5, 6, 7, 12: 0 1 1 ? 0 0 1 _ 1 0 1 0. Set position 4 to a 1: 0 1 1 1 0 0 1 _ 1 0 1 0
• Position 8 checks bits 8, 9, 10, 11, 12: 0 1 1 1 0 0 1 ? 1 0 1 0. Set position 8 to a 0: 0 1 1 1 0 0 1 0 1 0 1 0
• Final code word: 011100101010
• Data word: 10011010
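The encoding steps (parity bits at power-of-two positions, numbering from 1 at the left, even parity per group) can be sketched as a short routine. The function name `hamming_encode` is an assumption for this example; it is one straightforward way to follow the steps, not necessarily how hardware implements them.

```python
def hamming_encode(data: str) -> str:
    """Even-parity Hamming code word, bit positions numbered from 1 at the left."""
    d = len(data)
    p = 0
    while 2 ** p < p + d + 1:  # smallest p with 2^p >= p + d + 1
        p += 1
    n = d + p
    word = [0] * (n + 1)       # index 0 is unused so indices match positions

    # Steps 1 and 2: data bits fill every position that is not a power of two.
    bits = iter(data)
    for pos in range(1, n + 1):
        if (pos & (pos - 1)) != 0:
            word[pos] = int(next(bits))

    # Steps 4 and 5: parity bit at position 2^i covers every position whose
    # binary address has bit i set; choose it so the group has even parity.
    for i in range(p):
        parity_pos = 1 << i
        group = [word[pos] for pos in range(1, n + 1) if pos & parity_pos]
        word[parity_pos] = sum(group) % 2  # parity slot is still 0 here
    return "".join(str(b) for b in word[1:])


print(hamming_encode("10011010"))
```

Run on the byte 10011010, the routine should reproduce the code word worked out by hand.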
Hamming ECC Error Check
• Suppose receive 011100101110

  position:  1  2  3  4  5  6  7  8  9 10 11 12
  bit:       0  1  1  1  0  0  1  0  1  1  1  0

• Parity 1 (bits 1, 3, 5, 7, 9, 11): 0 1 0 1 1 1 √
• Parity 2 (bits 2, 3, 6, 7, 10, 11): 1 1 0 1 1 1 X - Parity 2 in error
• Parity 4 (bits 4, 5, 6, 7, 12): 1 0 0 1 0 √
• Parity 8 (bits 8, 9, 10, 11, 12): 0 1 1 1 0 X - Parity 8 in error
• Implies position 8 + 2 = 10 is in error: 011100101110
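Hamming ECC Error Correct
• Flip the incorrect bit (position 10) … 011100101010
• Check again: suppose receive 011100101010
• Parity 1 (bits 1, 3, 5, 7, 9, 11): 0 1 0 1 1 1 √
• Parity 2 (bits 2, 3, 6, 7, 10, 11): 1 1 0 1 0 1 √
• Parity 4 (bits 4, 5, 6, 7, 12): 1 0 0 1 0 √
• Parity 8 (bits 8, 9, 10, 11, 12): 0 1 0 1 0 √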
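The check-and-correct procedure amounts to adding up the positions of the parity groups that fail. A minimal sketch follows; `hamming_syndrome`, `hamming_correct`, and `extract_data` are names invented for this example, and the code assumes at most one bit is in error.

```python
def hamming_syndrome(word: str) -> int:
    """Sum of the parity positions whose group has odd parity (0 means no error)."""
    n = len(word)
    syndrome = 0
    parity_pos = 1
    while parity_pos <= n:
        group = [int(word[pos - 1]) for pos in range(1, n + 1) if pos & parity_pos]
        if sum(group) % 2:
            syndrome += parity_pos
        parity_pos <<= 1
    return syndrome


def hamming_correct(word: str) -> str:
    """Flip the bit the syndrome points to, assuming a single-bit error."""
    s = hamming_syndrome(word)
    if s == 0:
        return word
    if s > len(word):
        raise ValueError("syndrome points outside the word: more than one error")
    bits = list(word)
    bits[s - 1] = "1" if bits[s - 1] == "0" else "0"
    return "".join(bits)


def extract_data(word: str) -> str:
    """Drop the parity bits (positions that are powers of two)."""
    return "".join(b for pos, b in enumerate(word, start=1) if pos & (pos - 1))


received = "011100101110"
print(hamming_syndrome(received))                # position of the bad bit
print(hamming_correct(received))                 # corrected code word
print(extract_data(hamming_correct(received)))   # original data byte
```

The same three functions can be used to check the practice code words later in the lecture.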
Hamming Error Correcting Code
• Overhead involved in single error-correction code
• Let p be total number of parity bits and d number of data bits in p + d bit word
• If p error correction bits are to point to error bit (p + d cases) + indicate that no error exists (1 case), we need:
$2^p \ge p + d + 1$, thus $p \ge \log_2(p + d + 1)$; for large d, p approaches $\log_2(d)$
• 8 bits data => d = 8, $2^p \ge p + 8 + 1$ => p = 4
• 16 data => 5 parity, 32 data => 6 parity, 64 data => 7 parity
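The parity-bit counts listed for 8, 16, 32, and 64 data bits come from finding the smallest p that satisfies the inequality. A small sketch, with a hypothetical function name:

```python
def parity_bits_needed(d: int) -> int:
    """Smallest p such that 2^p >= p + d + 1 (single-error correction)."""
    p = 0
    while 2 ** p < p + d + 1:
        p += 1
    return p


for d in (8, 16, 32, 64):
    p = parity_bits_needed(d)
    print(f"{d:2d} data bits -> {p} parity bits ({p / d:.1%} overhead)")
```

The relative overhead shrinks as the data word grows, which is why wider words are cheaper to protect.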
Hamming Single-Error Correction, Double-Error Detection (SEC/DED)
• Adding extra parity bit covering the entire word provides double error detection as well as single error correction

  position:  1   2   3   4   5   6   7   8
  bit:       p1  p2  d1  p3  d2  d3  d4  p4

• Hamming parity bits H (p1 p2 p3) are computed (even parity as usual) plus the even parity over the entire word, p4:
– H = 0, p4 = 0: no error
– H ≠ 0, p4 = 1: correctable single error (odd parity if 1 error => p4 = 1)
– H ≠ 0, p4 = 0: double error occurred (even parity if 2 errors => p4 = 0)
– H = 0, p4 = 1: single error occurred in p4 bit, not in rest of word
• Typical modern codes in DRAM memory systems: 64-bit data blocks (8 bytes) with 72-bit code words (9 bytes).
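The four SEC/DED cases can be exercised on the 8-bit layout p1 p2 d1 p3 d2 d3 d4 p4. This is a sketch under the assumption that "p4 = 1" in the case list means the parity recomputed over all eight received bits is odd; the function names are invented for this example.

```python
def secded_encode(data: str) -> str:
    """Encode 4 data bits as p1 p2 d1 p3 d2 d3 d4 p4 (even parity throughout)."""
    d1, d2, d3, d4 = (int(b) for b in data)
    p1 = d1 ^ d2 ^ d4            # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4            # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4            # covers positions 4, 5, 6, 7
    word = [p1, p2, d1, p3, d2, d3, d4]
    p4 = sum(word) % 2           # overall parity over the first seven bits
    return "".join(str(b) for b in word + [p4])


def secded_classify(word: str) -> str:
    """Classify a received 8-bit word using H and the overall parity check."""
    bits = [int(b) for b in word]
    h = 0
    for parity_pos in (1, 2, 4):
        group = [bits[pos - 1] for pos in range(1, 8) if pos & parity_pos]
        if sum(group) % 2:
            h += parity_pos
    overall = sum(bits) % 2      # 1 if an odd number of bits flipped
    if h == 0 and overall == 0:
        return "no error"
    if h != 0 and overall == 1:
        return f"single error at position {h}"
    if h != 0 and overall == 0:
        return "double error"
    return "error in overall parity bit"


def flip(word: str, pos: int) -> str:
    """Flip the bit at 1-based position pos."""
    b = "1" if word[pos - 1] == "0" else "0"
    return word[:pos - 1] + b + word[pos:]


w = secded_encode("1011")
print(w, secded_classify(w))
print(secded_classify(flip(w, 5)))
print(secded_classify(flip(flip(w, 5), 6)))
print(secded_classify(flip(w, 8)))
```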
Hamming Single Error Correction + Double Error Detection
[Figure: Hamming distance = 4 between 0000 and 1111. A 1-bit error leaves the pattern nearest 0000 (one 1) or nearest 1111 (one 0). A 2-bit error (two 0s, two 1s) lies halfway between both.]
What if More Than 2-Bit Errors?
• For network transmissions, disks, and distributed storage, the common failure mode is bursts of bit errors, not just one or two bit errors
– Contiguous sequence of B bits in which first, last and any number of intermediate bits are in error
– Caused by impulse noise or by fading in wireless
– Effect is greater at higher data rates
Cyclic Redundancy Check
Simple example: Parity Check Block
• Data: 10011010 01101100 11110000 00101101 11011100 00111100 11111100 00001100
• Check: 00111011
• XOR of data and check: 00000000. 0 = Check!
• Data with one block zeroed: 10011010 01101100 11110000 00000000 11011100 00111100 11111100 00001100
• Check: 00111011
• XOR of data and check: 00101101. Not 0 = Fail!
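The parity check block is the bitwise XOR of all data blocks, and verification XORs the check block back in. A minimal sketch using the eight bytes from the example; `xor_check` is a name chosen here for illustration.

```python
from functools import reduce


def xor_check(blocks: list[int]) -> int:
    """Bitwise XOR of all blocks (column-wise even parity)."""
    return reduce(lambda a, b: a ^ b, blocks, 0)


data = [
    0b10011010, 0b01101100, 0b11110000, 0b00101101,
    0b11011100, 0b00111100, 0b11111100, 0b00001100,
]
check = xor_check(data)
print(f"check block:        {check:08b}")

# Verification: data XOR check must be all zeros.
print(f"clean transfer:     {xor_check(data + [check]):08b}")

# Corrupt the fourth block (all bits cleared) and verify again.
corrupted = data[:3] + [0b00000000] + data[4:]
print(f"corrupted transfer: {xor_check(corrupted + [check]):08b}")
```

In this example the non-zero result equals the lost block, because XOR-ing the surviving blocks with the check block cancels everything else; the RAID parity schemes later in the lecture rely on the same property.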
Cyclic Redundancy Check
• Parity codes not powerful enough to detect long runs of errors (also known as burst errors)
• Better Alternative: Reed-Solomon Codes
– Used widely in CDs, DVDs, Magnetic Disks
– RS(255,223) with 8-bit symbols: each code word contains 255 code word bytes (223 bytes are data and 32 bytes are parity)
– For this code: n = 255, k = 223, s = 8, 2t = 32, t = 16
– Decoder can correct any errors in up to 16 bytes anywhere in the code word
Cyclic Redundancy Check
3-bit CRC using the polynomial x^3 + x + 1 (divide by 1011 to get remainder)
14 data bits + 3 check bits = 17 bits total

  11010011101100 000 <--- input right padded by 3 bits
  1011               <--- divisor
  01100011101100 000 <--- result
   1011              <--- divisor
  00111011101100 000
    1011
  00010111101100 000
     1011
  00000001101100 000 <--- skip leading zeros
         1011
  00000000110100 000
          1011
  00000000011000 000
           1011
  00000000001110 000
            1011
  00000000000101 000
             101 1
  -----------------
  00000000000000 100 <--- remainder
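The long division can be written as a loop that XORs the divisor into the working bits wherever a leading 1 remains. This is a bit-string sketch for readability, with hypothetical function names; real implementations use shift registers or lookup tables.

```python
def mod2_remainder(bits: str, divisor: str) -> str:
    """Remainder of carry-less (XOR) long division of bits by divisor."""
    work = [int(b) for b in bits]
    div = [int(b) for b in divisor]
    for i in range(len(work) - len(div) + 1):
        if work[i]:                          # skip leading zeros
            for j, d in enumerate(div):
                work[i + j] ^= d
    return "".join(str(b) for b in work[-(len(div) - 1):])


def crc_check_bits(data: str, divisor: str) -> str:
    """Check bits: remainder of the data right-padded with len(divisor)-1 zeros."""
    padded = data + "0" * (len(divisor) - 1)
    return mod2_remainder(padded, divisor)


def crc_ok(frame: str, divisor: str) -> bool:
    """Receiver side: a frame is accepted if it divides with zero remainder."""
    return set(mod2_remainder(frame, divisor)) == {"0"}


data = "11010011101100"
check = crc_check_bits(data, "1011")
print(check)
print(crc_ok(data + check, "1011"))               # intact frame
print(crc_ok("11010011101101" + check, "1011"))   # last data bit flipped
```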
Cyclic Redundancy Check
• For block of k bits, transmitter generates an (n-k)-bit frame check sequence
• Transmits n bits exactly divisible by some number
• Receiver divides frame by that number
– If no remainder, assume no error
– Easy to calculate division for some binary numbers with shift register
• Disks detect and correct blocks of 512 bytes with so-called Reed-Solomon codes ≈ CRC
(In More Depth: Code Types)
• Linear Codes: code is generated by G and in null-space of H
• Hamming Codes: design the H matrix
– d = 3 ⇒ columns nonzero, distinct
– d = 4 ⇒ columns nonzero, distinct, odd-weight
• Reed-Solomon codes:
– Based on polynomials in $GF(2^k)$ (i.e., k-bit symbols)
– Data as coefficients, code space as values of polynomial:
– $P(x) = a_0 + a_1 x + \dots + a_{k-1} x^{k-1}$
– Coded: P(0), P(1), P(2), …, P(n-1)
– Can recover polynomial as long as get any k of n
– Alternatively: as long as no more than n-k coded symbols erased, can recover data.
• Side note: multiplication by constant in $GF(2^k)$ can be represented by k × k matrix: a × x
– Decompose unknown vector into k bits: $x = x_0 + 2x_1 + \dots + 2^{k-1} x_{k-1}$
– Each column is result of multiplying a by $2^i$
Hamming ECC on your own
• Test if these Hamming code words are correct. If one is incorrect, indicate the correct code word. Also, indicate what the original data was.
• 110101100011
• 111110001100
• 000010001010
Evolution of the Disk Drive
[Photos: IBM RAMAC 305, 1956; IBM 3390K, 1986; Apple SCSI, 1986]
Arrays of Small Disks
• Can smaller disks be used to close gap in performance between disks and CPUs?
[Figure: Conventional: 4 disk designs (14”, 10”, 5.25”, 3.5”), spanning low end to high end. Disk Array: 1 disk design (3.5”).]
Replace Small Number of Large Disks with Large Number of Small Disks! (1988 Disks)

              IBM 3390K     IBM 3.5" 0061   x70
  Capacity    20 GBytes     320 MBytes      23 GBytes
  Volume      97 cu. ft.    0.1 cu. ft.     11 cu. ft.    (9X)
  Power       3 KW          11 W            1 KW          (3X)
  Data Rate   15 MB/s       1.5 MB/s        120 MB/s      (8X)
  I/O Rate    600 I/Os/s    55 I/Os/s       3900 I/Os/s   (6X)
  MTTF        250 KHrs      50 KHrs         ??? Hrs
  Cost        $250K         $2K             $150K

Disk Arrays have potential for large data and I/O rates, high MB per cu. ft., high MB per KW, but what about reliability?
RAID: Redundant Arrays of (Inexpensive) Disks
• Files are "striped" across multiple disks
• Redundancy yields high data availability
– Availability: service still provided to user, even if some components failed
• Disks will still fail
• Contents reconstructed from data redundantly stored in the array
=> Capacity penalty to store redundant info
=> Bandwidth penalty to update redundant info
Redundant Arrays of Inexpensive Disks
RAID 1: Disk Mirroring/Shadowing
• Each disk is fully duplicated onto its “mirror”
– Very high availability can be achieved
• Bandwidth sacrifice on write: logical write = two physical writes
– Reads may be optimized
• Most expensive solution: 100% capacity overhead
[Figure: a disk and its mirror form a recovery group.]
Redundant Array of Inexpensive Disks
RAID 3: Parity Disk
• P contains sum of other disks per stripe mod 2 (“parity”)
• If disk fails, subtract P from sum of other disks to find missing information
[Figure: a logical record (100100111100110110010011...) is striped as physical records across the data disks, with a parity disk P. Records shown on the disks: 10100011, 11001101, 10100011, 11001101.]
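Because the sum is taken mod 2, both "sum" and "subtract" are the XOR operation, so a lost disk is rebuilt by XOR-ing the parity with the surviving disks. A minimal sketch, assuming three data disks holding one byte each; the function names and disk contents are illustrative only.

```python
from functools import reduce


def parity_block(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR (sum mod 2) of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the one missing block from the survivors and the parity block."""
    return parity_block(list(surviving) + [parity])


# Hypothetical stripe: one byte per data disk.
disks = [bytes([0b10010011]), bytes([0b11001101]), bytes([0b10010011])]
p = parity_block(disks)

# Fail each disk in turn and rebuild it from the others plus P.
for failed in range(len(disks)):
    survivors = disks[:failed] + disks[failed + 1:]
    rebuilt = reconstruct(survivors, p)
    print(f"disk {failed}: rebuilt {rebuilt[0]:08b}, original {disks[failed][0]:08b}")
```

This works only because the array knows which disk failed; the parity alone could not say which block was wrong.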
Redundant Arrays of Inexpensive Disks
RAID 4: High I/O Rate Parity

Insides of 5 disks (disk columns; logical disk address increases downward; each row is a stripe, with parity on a dedicated disk):

  D0   D1   D2   D3   P
  D4   D5   D6   D7   P
  D8   D9   D10  D11  P
  D12  D13  D14  D15  P
  D16  D17  D18  D19  P
  D20  D21  D22  D23  P
  ...

Example: small read D0 & D5, large write D12-D15
Inspiration for RAID 5
• RAID 4 works well for small reads
• Small writes (write to one disk):
– Option 1: read other data disks, create new sum and write to Parity Disk
– Option 2: since P has old sum, compare old data to new data, add the difference to P
• Small writes are limited by Parity Disk: writes to D0 and D5 both also write to P disk

  D0   D1   D2   D3   P
  D4   D5   D6   D7   P
RAID 5: High I/O Rate Interleaved Parity
• Independent writes possible because of interleaved parity

Disk columns (logical disk addresses increase downward):

  D0   D1   D2   D3   P
  D4   D5   D6   P    D7
  D8   D9   P    D10  D11
  D12  P    D13  D14  D15
  P    D16  D17  D18  D19
  D20  D21  D22  D23  P
  ...

Example: write to D0, D5 uses disks 0, 1, 3, 4
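The rotation shown in the layout can be expressed as a mapping from logical block number to disk. The sketch below assumes exactly the pattern in the table (parity starts on the last disk and moves one disk to the left per stripe); real RAID 5 implementations use several different rotation schemes, and the function name is hypothetical.

```python
def raid5_location(block: int, num_disks: int = 5) -> tuple[int, int, int]:
    """Return (stripe, data_disk, parity_disk) for logical block D<block>."""
    data_per_stripe = num_disks - 1
    stripe, index = divmod(block, data_per_stripe)
    parity_disk = (num_disks - 1 - stripe) % num_disks
    # Data blocks fill the remaining disks of the stripe from left to right.
    data_disks = [d for d in range(num_disks) if d != parity_disk]
    return stripe, data_disks[index], parity_disk


for block in (0, 5):
    stripe, data_disk, parity_disk = raid5_location(block)
    print(f"D{block}: stripe {stripe}, data on disk {data_disk}, parity on disk {parity_disk}")
```

For D0 and D5 the two writes touch four different disks, so they can proceed independently; with a dedicated parity disk both would queue on the same disk.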
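Problems of Disk Arrays: Small Writes
RAID-5: Small Write Algorithm
• 1 Logical Write = 2 Physical Reads + 2 Physical Writes
[Figure: stripe D0 D1 D2 D3 P is updated with new data D0'. (1. Read) old data D0; (2. Read) old parity P; new data XOR old data, then XOR old parity, gives P'; (3. Write) D0'; (4. Write) P'. Result: D0' D1 D2 D3 P'.]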
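The small-write shortcut works because XOR-ing the old data out of the parity and the new data in gives the same result as recomputing parity over the whole stripe. A sketch with made-up block contents and hypothetical function names:

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """New parity from 2 reads (old data, old parity): P' = P xor D xor D'."""
    return xor_bytes(old_parity, xor_bytes(old_data, new_data))


def full_parity(stripe: list[bytes]) -> bytes:
    """Parity recomputed from every data block in the stripe."""
    parity = bytes(len(stripe[0]))
    for block in stripe:
        parity = xor_bytes(parity, block)
    return parity


# Hypothetical stripe D0..D3, two bytes per block.
stripe = [b"\x0f\xf0", b"\x33\xcc", b"\x55\xaa", b"\x01\x02"]
old_parity = full_parity(stripe)

new_d0 = b"\xde\xad"
new_parity = small_write_parity(stripe[0], old_parity, new_d0)

# Same result as reading D1..D3 and recomputing from scratch.
print(new_parity == full_parity([new_d0] + stripe[1:]))
```

The shortcut costs two reads and two writes regardless of how many disks are in the stripe, whereas recomputing from scratch reads every other data disk.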
And, in Conclusion, …
• Great Idea: Redundancy to Get Dependability
– Spatial (extra hardware) and Temporal (retry if error)
• Reliability: MTTF & Annualized Failure Rate (AFR)
• Availability: % uptime (MTTF / (MTTF + MTTR))
• Memory
– Hamming distance 2: Parity for Single Error Detect
– Hamming distance 3: Single Error Correction Code + encode bit position of error
• Treat disks like memory, except you know when a disk has failed; erasure makes parity an Error Correcting Code
• RAID-2, -3, -4, -5: Interleaved data and parity