part9 ch3 data replic
TRANSCRIPT
7/31/2019 Part9 Ch3 Data Replic
http://slidepdf.com/reader/full/part9-ch3-data-replic 1/14
Copyright 2007 Koren & Krishna, Morgan-Kaufman Part.9.1
FAULT TOLERANT SYSTEMS
http://www.ecs.umass.edu/ece/koren/FaultTolerantSystems
Part 9 - Data Replication
Chapter 3 – Information Redundancy
Data Replication for Fault Tolerance
♦Identical copies of data held at multiple nodes in a distributed system
♦Improved performance and fault-tolerance
♦Data replicas must be kept consistent despite failures in the system
♦Example - five copies in five nodes:
♦If A is disconnected and a write updates the copy in A - the rest are no longer consistent with A
♦Any read of their data will result in stale data
♦How many copies should we read (write)?
Simple Voting Scheme - Non Hierarchical
♦Assign $v_i$ votes to copy i of the data
♦S - set of all nodes with copies of the data
♦V - sum of all votes - $V = \sum_{i \in S} v_i$
♦r, w - variables such that r + w > V ; w > V/2
♦V(X) - total number of votes assigned to copies in set X - $V(X) = \sum_{i \in X} v_i$
♦Strategy ensuring that all reads use the latest data
♦To complete a read - read nodes of a set R ⊆ S such that V(R) ≥ r
♦To complete a write - write on every node of a set W ⊆ S such that V(W) ≥ w
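The voting rules can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the names `votes`, `total_votes`, `valid_thresholds`, `is_read_quorum` and `is_write_quorum` are assumptions, and the five one-vote copies simply mirror the five-node example.

```python
# Hypothetical vote assignment: five copies (A..E), one vote each.
votes = {"A": 1, "B": 1, "C": 1, "D": 1, "E": 1}

def total_votes(nodes, votes):
    """V(X): sum of the votes assigned to the copies in the set X."""
    return sum(votes[n] for n in nodes)

def valid_thresholds(r, w, votes):
    """Check the two constraints r + w > V and w > V/2."""
    V = total_votes(votes, votes)  # iterating the dict yields all nodes, i.e. S
    return r + w > V and 2 * w > V

def is_read_quorum(nodes, r, votes):
    return total_votes(nodes, votes) >= r

def is_write_quorum(nodes, w, votes):
    return total_votes(nodes, votes) >= w

r, w = 2, 4
print(valid_thresholds(r, w, votes))                # True
print(is_read_quorum({"A", "B"}, r, votes))         # True
print(is_write_quorum({"A", "B", "C"}, w, votes))   # False
```

With unequal votes the same functions apply unchanged, since only the vote totals matter.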
Procedure Justification
♦A set R such that V(R) ≥ r is called a read quorum
♦A set W such that V(W) ≥ w is called a write quorum
♦For any sets R and W such that V(R) ≥ r and V(W) ≥ w, R ∩ W ≠ ∅ (since r + w > V)
♦Any read operation is guaranteed to read the value of at least one copy which has been updated by the latest write
♦Furthermore - for any two sets $W_1, W_2$ such that $V(W_1), V(W_2) \ge w$, $W_1 \cap W_2 \ne \emptyset$ (since w > V/2)
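Both intersection claims follow from a short counting argument. The sketch below spells it out using the symbols already defined; the proof-by-contradiction framing is added here for illustration and assumes all votes are non-negative.

```latex
\begin{align*}
&\text{Suppose } R \cap W = \emptyset. \text{ Then } V(R) + V(W) = V(R \cup W) \le V(S) = V. \\
&\text{But } V(R) + V(W) \ge r + w > V, \text{ a contradiction, so } R \cap W \ne \emptyset. \\
&\text{Likewise, } W_1 \cap W_2 = \emptyset \text{ would give } V \ge V(W_1) + V(W_2) \ge 2w > V, \\
&\text{so any two write quorums overlap and two conflicting writes cannot both complete.}
\end{align*}
```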
Performance vs. Availability
♦Selected values of r and w affect the system performance and availability
♦If there are many more reads than writes - we choose a low r to speed up the read operations
♦r=1 requires w=5 - write cannot be done if even one node is disconnected
♦Selecting r=2 allows w=4 and write can still be done if four out of the five nodes are connected
♦Trade-off between performance and availability
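The trade-off can be tabulated with a short script. This is a sketch assuming five nodes with one vote each (so V = 5), which is what the r=1, w=5 figures imply; `tradeoff_table` is a hypothetical helper name. For each read threshold r it picks the smallest valid w and reports how many disconnected nodes reads and writes can tolerate.

```python
def tradeoff_table(V):
    """For each r, the smallest w with r + w > V and w > V/2 (one vote per node)."""
    rows = []
    for r in range(1, V + 1):
        w = max(V - r + 1, V // 2 + 1)
        # With one vote per node, a threshold of k tolerates V - k disconnected nodes.
        rows.append((r, w, V - r, V - w))
    return rows

for r, w, read_tol, write_tol in tradeoff_table(5):
    print(f"r={r} w={w}: reads tolerate {read_tol} failures, writes tolerate {write_tol}")
```

Lowering r makes reads cheaper and more available, while the matching w rises and writes become less available.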
Reliability and Availability Markov Models
♦Assumptions:
∗ Failures occur at each node according to a Poisson process with rate λ (links do not fail)
∗ When a node fails, it is repaired and up-to-date data is loaded
∗ Repair time is an exponentially distributed random variable with mean 1/µ
♦Example: (r,w) = (3,3) - both read and write operations can take place if at least three of the five nodes are up
♦To compute reliability and availability, we use Markov Chain models
Reliability and Long-Term Availability for (r,w)=(3,3)
♦Markov chain for reliability:
♦State - number of nodes down; F – the failure state
♦Reliability at time t - $R(t) = 1 - P_F(t)$
♦Markov chain for availability:
♦State - number of nodes down
♦Long-Term Availability = P0+P1+P2
♦Complete analysis in Exercises
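The long-term availability P0+P1+P2 can be evaluated numerically. The chain diagrams are not reproduced in this transcript, so the sketch below assumes a birth-death chain in which, with k nodes down, failures occur at rate (n-k)λ and repairs at rate kµ (each failed node repaired independently). The function name and the sample rates are hypothetical.

```python
def long_term_availability(n, max_down, lam, mu):
    """Steady-state probability that at most max_down of n nodes are down.

    Assumed chain: state k = number of nodes down,
    failure rate (n - k) * lam, repair rate k * mu.
    """
    weights = [1.0]  # unnormalized steady-state probabilities, state 0 first
    for k in range(n):
        # Balance equation: P_{k+1} * (k + 1) * mu = P_k * (n - k) * lam
        weights.append(weights[-1] * (n - k) * lam / ((k + 1) * mu))
    return sum(weights[: max_down + 1]) / sum(weights)

# (r,w) = (3,3) on five nodes: the system is up while at most two nodes are down.
# Hypothetical rates: one failure per 1000 hours, mean repair time 10 hours.
print(long_term_availability(n=5, max_down=2, lam=0.001, mu=0.1))
```

A different repair policy (for example a single repair facility) would change the repair rates and therefore the result.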
Vote Assignment for Maximizing Availability
♦In general, nodes have different reliabilities and availabilities
♦Links can fail as well
♦Point Availability - the probability that at time t the system is up - read and write quorums exist
♦Problem: Assigning votes to nodes to maximize point availability
♦Optimal assignment difficult - heuristics necessary
♦Notations: For some fixed point in time t (t is omitted for simplicity)
∗Point Availability of node i - an(i)
∗Point Availability of link j - al(j)
∗L(i) - set of links incident on node i
Heuristics for Vote Assignment
♦Heuristic 1: set $v(i) = an(i) \sum_{j \in L(i)} al(j)$ (rounded to the nearest integer)
♦If sum of votes is even, give an extra vote to one of the nodes with the maximum number of votes
♦Heuristic 2:
♦k(i, j) - node connected to node i by link j; set $v(i) = an(i) + \sum_{j \in L(i)} al(j) \cdot an(k(i,j))$ (rounded to the nearest integer)
♦If sum is even - an extra vote is added as above
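Both heuristics can be sketched directly from the formulas. The network below is an assumption, because the example figure is not reproduced in this transcript: the node availabilities and the A-B, B-C and B-D link values are inferred from the products quoted in the availability calculation on a later slide, and the C-D link with availability 0.7 is a guess. All function names are hypothetical.

```python
import math

# Assumed four-node network (see the caveats in the text above).
an = {"A": 0.7, "B": 0.8, "C": 0.9, "D": 0.7}
al = {("A", "B"): 0.7, ("B", "C"): 0.9, ("B", "D"): 0.2, ("C", "D"): 0.7}

def incident(i):
    """For each link j in L(i): (al(j), k(i, j)), the link availability and the far node."""
    return [(a, v if u == i else u) for (u, v), a in al.items() if i in (u, v)]

def nearest_int(x):
    return math.floor(x + 0.5)

def make_odd(votes):
    """If the vote total is even, give one extra vote to a node with the most votes."""
    if sum(votes.values()) % 2 == 0:
        top = max(votes, key=votes.get)  # ties go to the first node in dict order
        votes[top] += 1
    return votes

def heuristic1():
    return make_odd({i: nearest_int(an[i] * sum(a for a, _ in incident(i))) for i in an})

def heuristic2():
    return make_odd({i: nearest_int(an[i] + sum(a * an[k] for a, k in incident(i))) for i in an})

print(heuristic1())  # {'A': 0, 'B': 1, 'C': 1, 'D': 1}
print(heuristic2())  # {'A': 1, 'B': 3, 'C': 2, 'D': 1}
```

Under these assumed values the results agree with the vote assignments quoted in the two heuristic examples; the tie between B and C for the extra vote is broken here by dictionary order.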
Heuristic 1 - Example
♦Node A is unreliable compared to the rest - gets no votes
♦Sum of votes is 3 - quorums must satisfy r + w > 3 ; w > 3/2 ⇒ w = 2 or 3
♦If w=2 - r=2 is smallest read quorum
♦Possible read (or write) quorums - BC, CD, BD
♦If w=3 - r=1 is smallest read quorum
♦Possible read quorums - B, C, D
♦One write quorum: BCD
Heuristic 2 - Example
♦Sum of votes even - B gets an extra vote
♦Final vote assignment - v(A)=1, v(B)=3, v(C)=2, v(D)=1
♦Sum of votes is 7 - read and write quorums must satisfy:
♦r + w > 7
♦w > 7/2
♦w = 4 or 5 or 6 or 7
Quorums for Heuristic 2
♦Possible quorums for r+w=8
♦Every (r,w) pair has an availability associated with it - the probability that at least one read and one write quorum exist despite node and/or link failures
♦(r,w)=(4,4) - identical lists of read and write quorums
♦Other (r,w) - lists are different
v(A)=1, v(B)=3, v(C)=2, v(D)=1
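The quorum lists for this vote assignment can be generated by brute force. The sketch below enumerates minimal quorums only (sets that contain no smaller quorum); whether the original table listed minimal or all quorums is not visible in this transcript, and `minimal_quorums` is a hypothetical name.

```python
from itertools import combinations

votes = {"A": 1, "B": 3, "C": 2, "D": 1}

def minimal_quorums(votes, threshold):
    """Node sets with at least `threshold` votes that contain no smaller quorum."""
    found = []
    nodes = sorted(votes)
    for size in range(1, len(nodes) + 1):      # smaller sets first
        for group in combinations(nodes, size):
            enough = sum(votes[n] for n in group) >= threshold
            minimal = not any(set(q) <= set(group) for q in found)
            if enough and minimal:
                found.append(group)
    return ["".join(g) for g in found]

V = sum(votes.values())
for w in range(V // 2 + 1, V + 1):
    r = V + 1 - w                               # smallest r with r + w > V
    print((r, w), minimal_quorums(votes, r), minimal_quorums(votes, w))
```

For (r,w)=(4,4) both lists come out as AB, BC, BD, ACD, which are the quorums used in the availability calculation.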
Calculating Availability for (r,w)=(4,4)
♦Availability A is the probability that at least one of the quorums AB, BC, BD, ACD can be used
♦Denote events: E1, E2, E3, E4 - AB, BC, BD, ACD are up, respectively
♦Events are not mutually exclusive
♦A=P(E1∪E2∪E3∪E4)
$A = \sum_{i} P(E_i) - \sum_{i<j} P(E_i \cap E_j) + \sum_{i<j<k} P(E_i \cap E_j \cap E_k) - P(E_1 \cap E_2 \cap E_3 \cap E_4)$
♦Some calculations:
P(E1)=P(AB is up)=0.7×0.7×0.8=0.392
P(E2∩E3)=P(BC and BD are up)=0.8×0.9×0.9×0.2×0.7=0.091
♦Exercise: Complete the calculation of the availability A for (r,w)=(4,4)
Different Method for Availability Calculation
♦System has 8 modules (4 nodes and 4 links) - each can be up or down
♦Total of $2^8 = 256$ mutually exclusive states
♦Probability of each state is a product of 8 terms, either an(i) or 1-an(i) or al(i) or 1-al(i)
♦Methodical (but long) way of computing availability - list all states and add up the probabilities of those where a quorum exists
♦For any other value of (r,w) - read and write quorums are different
♦Availability – sum of probabilities of states in which both read and write quorums exist
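The state-enumeration method is easy to mechanize. The sketch below rests on several assumptions that are not in the slides: a quorum counts as usable when its nodes are up and connected to each other through up links and up nodes (the sample products quoted earlier multiply only direct node and link availabilities, so the slides may use a narrower definition), and the four-node network with a C-D link of availability 0.7 is a guess because the figure is not reproduced. All names are hypothetical.

```python
from itertools import product

def best_component_votes(node_up, link_up, votes):
    """Largest vote total over connected groups of up nodes joined by up links."""
    seen, best = set(), 0
    for start in node_up:
        if not node_up[start] or start in seen:
            continue
        stack, group_votes = [start], 0
        seen.add(start)
        while stack:                                  # depth-first search
            n = stack.pop()
            group_votes += votes[n]
            for (u, v), up in link_up.items():
                if up and n in (u, v):
                    other = v if u == n else u
                    if node_up[other] and other not in seen:
                        seen.add(other)
                        stack.append(other)
        best = max(best, group_votes)
    return best

def availability(an, al, votes, r, w):
    """Sum the probabilities of all up/down states in which quorums exist."""
    nodes, links = list(an), list(al)
    # Since r + w > V, a read quorum and a write quorum cannot sit in two
    # disjoint components, so both exist iff one component has max(r, w) votes.
    need = max(r, w)
    total = 0.0
    for state in product([True, False], repeat=len(nodes) + len(links)):
        node_up = dict(zip(nodes, state[:len(nodes)]))
        link_up = dict(zip(links, state[len(nodes):]))
        prob = 1.0
        for n in nodes:
            prob *= an[n] if node_up[n] else 1 - an[n]
        for l in links:
            prob *= al[l] if link_up[l] else 1 - al[l]
        if best_component_votes(node_up, link_up, votes) >= need:
            total += prob
    return total

# Assumed network and the Heuristic 2 votes; 2**8 = 256 states are visited.
an = {"A": 0.7, "B": 0.8, "C": 0.9, "D": 0.7}
al = {("A", "B"): 0.7, ("B", "C"): 0.9, ("B", "D"): 0.2, ("C", "D"): 0.7}
votes = {"A": 1, "B": 3, "C": 2, "D": 1}
print(availability(an, al, votes, r=4, w=4))
```

Running the same function for each valid (r,w) pair gives a direct way to compare their availabilities under the stated assumptions.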
Dynamic Vote Assignment
♦If repair is not fast enough - system can degrade
♦If system degrades enough - no connected cluster with a majority of total votes exists
♦Solution - adjustable quorums instead of static ones
♦Assumption - each node has exactly one vote
♦For each data item, a version number is maintained - incremented with every update
♦An update can only be executed if a write quorum can be gathered
Dynamic Vote Assignment - Notations
♦$VN_i$ - version number of data at node i
♦$SC_i$ - update sites cardinality at node i - number of nodes which participated in the $VN_i$-th update of this data
♦When system starts operation, $SC_i$ is initialized to the total number of nodes in the system
♦$S_i$ - set of nodes with which node i can communicate
♦M - maximum version number in $S_i$
♦I - partial set of $S_i$ with nodes whose version number is M
♦N - maximum update sites cardinality ($SC_i$) of nodes in I
Dynamic Vote Assignment Algorithm
Dynamic Vote Assignment - Example
♦Seven nodes - same data - state at time $t_0$
♦Failure disconnects the system into two components - {A,B,C,D} and {E,F,G}
♦E receives an update request
∗ $SC_E = 7$ - E must find more than 7/2 nodes (including itself)
∗ can find only 3
∗ update request is rejected
♦A receives an update request
∗ can be accepted
∗ A,B,C,D are updated
♦New state
Example - Cont.
♦Another failure - components become {A,B,C}, {D}, {E,F,G}
♦An update request arrives at C
∗write quorum at C is 3
∗update successful
♦New state
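The two-step example can be replayed in code. The algorithm slide itself is not reproduced in this transcript, so the update rule below is a reconstruction from the notation (M, I, N) and from the example, not the authors' exact procedure. In particular, bringing every reachable node up to date (not only those in I) and the starting version number 5 are assumptions; `try_update` is a hypothetical name.

```python
def try_update(state, reachable):
    """Attempt an update at a node whose reachable set S_i (itself included) is given.

    state maps node -> (VN, SC). Returns True if the update was accepted.
    """
    M = max(state[n][0] for n in reachable)           # latest version in S_i
    I = [n for n in reachable if state[n][0] == M]    # nodes holding version M
    N = max(state[n][1] for n in I)                   # their update sites cardinality
    if 2 * len(I) <= N:                               # need strictly more than N/2
        return False
    for n in reachable:                               # assumption: stale nodes are caught up too
        state[n] = (M + 1, len(reachable))
    return True

# Hypothetical starting state: all seven nodes at version 5 with SC = 7.
state = {n: (5, 7) for n in "ABCDEFG"}
print(try_update(state, set("EFG")))    # False: 3 nodes is not more than 7/2
print(try_update(state, set("ABCD")))   # True: 4 > 7/2, SC becomes 4
print(try_update(state, set("ABC")))    # True: 3 > 4/2, SC becomes 3
print(state)
```

The last call shows the point of dynamic voting: three nodes are not a majority of the original seven, yet they are a majority of the four that took part in the most recent update.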
Voting - Hierarchical Organization
♦If V is large, r+w is large - data operations take a long time
♦Possible solution - hierarchical voting scheme
♦Construct an m-level tree
♦All the nodes holding copies of the data are leaves at level m-1
♦Add virtual nodes at the higher levels up to the root at level 0 - added nodes are virtual groupings of the real nodes
♦Each node at level i will have exactly $L_{i+1}$ children
Example - a Tree for Hierarchical Quorum Generation
$m = 3$, $L_1 = L_2 = 3$
Quorum Generation Algorithm
♦Assign one vote to each node in the tree
♦Set read quorum and write quorum sizes at level i, $r_i$ and $w_i$, so that: $r_i + w_i > L_i$ ; $w_i > L_i/2$
♦Following algorithm is used recursively:
♦Read-mark the root at level 0
♦At level 1 - read-mark $r_1$ nodes
♦Proceeding from level i to level i+1 - read-mark $r_{i+1}$ children of each of the nodes read-marked at level i
♦You cannot read-mark a node which does not have at least $r_{i+1}$ non-faulty children
♦Proceed until i = m-1
♦The read-marked leaves form a read quorum
♦Forming a write-quorum is similar
Algorithm - Example
♦$w_i = 2$, $r_i = L_i - w_i + 1 = 2$ for i=1,2
♦Starting at the root - read-mark two of its children - say X and Y
♦Read-mark two children for X and Y - say A, B for X, and D, E for Y
♦Read quorum is the set of read-marked leaves - A, B, D, E
Example - cont.
♦Suppose D is faulty - cannot be part of the read quorum
♦We have to pick another child of Y - say F - to be in the read quorum
♦If two of Y's children are faulty - we cannot read-mark Y - we have to backtrack and try read-marking Z instead
♦Exercise: List read quorums generated by $r_1 = 1$, $w_1 = 3$, $r_2 = 2$, $w_2 = 2$
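The read-marking procedure, including the backtracking step, can be sketched recursively. The tree figure is not reproduced in this transcript, so the layout below is partly assumed: X, Y, Z and the leaves A, B, D, E, F come from the example, while the leaf names C, G, H, I and the left-to-right search order are guesses. `read_mark` and `need` are hypothetical names.

```python
# Assumed example tree with m = 3 levels and L1 = L2 = 3.
tree = {
    "root": ["X", "Y", "Z"],
    "X": ["A", "B", "C"],
    "Y": ["D", "E", "F"],
    "Z": ["G", "H", "I"],
}

def read_mark(node, level, need, faulty):
    """Return the quorum leaves below `node`, or None if `node` cannot be marked."""
    children = tree.get(node)
    if children is None:                        # a leaf holds a real copy
        return None if node in faulty else [node]
    leaves, marked = [], 0
    for child in children:                      # try children left to right
        below = read_mark(child, level + 1, need, faulty)
        if below is not None:
            leaves += below
            marked += 1
            if marked == need[level + 1]:       # r_{level+1} children marked
                return leaves
    return None                                 # too few usable children: caller backtracks

need = {1: 2, 2: 2}                             # r_1 = r_2 = 2
print(read_mark("root", 0, need, faulty=set()))        # ['A', 'B', 'D', 'E']
print(read_mark("root", 0, need, faulty={"D"}))        # ['A', 'B', 'E', 'F']
print(read_mark("root", 0, need, faulty={"D", "E"}))   # ['A', 'B', 'G', 'H']
```

Passing the write thresholds in `need` instead of the read thresholds yields a write quorum in the same way.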
Hierarchical vs. Non-Hierarchical Approach
♦Read quorum consists of just 4 copies
♦Similarly, we can have a write quorum with 4 copies
♦For the non-hierarchical approach with one vote per node, r + w > 9 ; w > 9/2
♦w is at least 5, compared to 4 in the tree approach
♦To prove that the hierarchical approach works, we show that every possible read quorum has to intersect every possible write quorum in at least one node
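For the small example tree the intersection property can also be checked exhaustively. This is a brute-force sanity check, not the proof the slide refers to; it assumes a three-level tree with three children per node, labels each leaf by a (group, position) pair, and uses the hypothetical helper `quorums`.

```python
from itertools import combinations, product

def quorums(children_per_node, pick):
    """All leaf sets from marking pick[0] level-1 nodes and pick[1] leaves under each."""
    result = []
    for groups in combinations(range(children_per_node), pick[0]):
        per_group = [
            [frozenset((g, leaf) for leaf in leaves)
             for leaves in combinations(range(children_per_node), pick[1])]
            for g in groups
        ]
        for choice in product(*per_group):      # one leaf selection per marked group
            result.append(frozenset().union(*choice))
    return result

reads = quorums(3, (2, 2))     # r_1 = r_2 = 2
writes = quorums(3, (2, 2))    # w_1 = w_2 = 2
print(len(reads), {len(q) for q in reads})            # 27 {4}
print(all(r & w for r in reads for w in writes))      # True
```

Every quorum has only four of the nine copies, yet any read quorum shares at least one leaf with any write quorum, because the marked sets overlap at level 1 and again below the shared level-1 node.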
Primary Backup Approach
♦A node is designated as the primary - all accesses are through that node
♦Other nodes are designated as backups
♦Under normal operation - all writes to the primary are also copied to the functional backups
♦When the primary fails - one of the backup nodes is chosen to take its place
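The primary-backup idea can be illustrated with a toy in-memory model. This is a sketch under simplifying assumptions that are not in the slides: writes reach all functional backups synchronously, failures are announced through an explicit `fail` call, and the replacement primary is simply the first functional node by name. The class and method names are hypothetical.

```python
class ReplicatedStore:
    """Toy primary-backup group holding one dictionary per node."""

    def __init__(self, nodes):
        self.data = {n: {} for n in nodes}      # each node's copy of the data
        self.up = set(nodes)                    # functional nodes
        self.primary = nodes[0]

    def write(self, key, value):
        # All accesses go through the primary; functional backups get a copy.
        for n in self.up:
            self.data[n][key] = value

    def read(self, key):
        return self.data[self.primary].get(key)

    def fail(self, node):
        self.up.discard(node)
        if node == self.primary:
            # Assumption: any functional backup may take over; pick the first by name.
            # (min raises ValueError if no functional node is left.)
            self.primary = min(self.up)

store = ReplicatedStore(["P", "B1", "B2"])
store.write("x", 1)
store.fail("P")
print(store.primary, store.read("x"))   # B1 1
```

A real system would also need failure detection and agreement on which backup takes over, which this model leaves out.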