Distributed systems - University of Cambridge

Distributed systems Lecture 5: Consistent cuts, process groups, and mutual exclusion
Dr Robert N. M. Watson


Post on 08-May-2018


Last time
• Saw physical time can’t be kept exactly in sync; instead use logical clocks to track ordering between events:
  – Defined a → b to mean ‘a happens-before b’
  – Easy inside single process, & use causal ordering (send → receive) to extend relation across processes
  – if send_i(m1) → send_j(m2) then deliver_k(m1) → deliver_k(m2)
• Lamport clocks, L(e): an integer
  – Increment to (max of (sender, receiver)) + 1 on receipt
  – But given L(a) < L(b), cannot infer that a → b (a and b may be concurrent)
• Vector clocks: list of Lamport clocks, one per process
  – Element Vi[j] captures # events at Pj observed by Pi
  – Crucially: if Vi(a) < Vj(b), can infer that a → b, and if Vi(a) ~ Vj(b), can infer that a ~ b


Vector clocks: example
• When P2 receives m1, it merges the entries from P1’s clock
  – choose the maximum value in each position
• Similarly when P3 receives m2, it merges in P2’s clock
  – this incorporates the changes from P1 that P2 already saw
• Vector clocks explicitly track the transitive causal order: f’s timestamp captures the history of a, b, c & d

[Figure: space-time diagram of P1, P2 and P3 against physical time. P1: a (1,0,0), b (2,0,0). P2: c (2,1,0), d (2,2,0). P3: e (0,0,1), f (2,2,2). Message m1 goes from P1 to P2 and m2 from P2 to P3. Legend: send event, receive event.]
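The merge rule can be sketched in a few lines of Python. This is a minimal sketch, not the lecture's code: the class `VectorClock` and the helpers `happened_before` and `concurrent` are names invented here, and it assumes every event (local, send or receive) increments the process's own entry, which is what the timestamps in the example suggest.

```python
from typing import List


class VectorClock:
    """One process's vector clock (hypothetical helper); `me` indexes its own entry."""

    def __init__(self, n: int, me: int) -> None:
        self.v: List[int] = [0] * n
        self.me = me

    def tick(self) -> List[int]:
        """Local or send event: increment own entry, return a copy as the timestamp."""
        self.v[self.me] += 1
        return list(self.v)

    def receive(self, msg_ts: List[int]) -> List[int]:
        """Receive event: element-wise maximum, then count the receive event itself."""
        self.v = [max(a, b) for a, b in zip(self.v, msg_ts)]
        return self.tick()


def happened_before(x: List[int], y: List[int]) -> bool:
    """x < y: every entry of x is <= the matching entry of y, and x != y."""
    return all(a <= b for a, b in zip(x, y)) and x != y


def concurrent(x: List[int], y: List[int]) -> bool:
    """x ~ y: distinct timestamps, neither ordered before the other."""
    return x != y and not happened_before(x, y) and not happened_before(y, x)


# Replay the events from the example (P1, P2, P3 are indices 0, 1, 2).
p1, p2, p3 = VectorClock(3, 0), VectorClock(3, 1), VectorClock(3, 2)
a = p1.tick()        # (1,0,0)
b = p1.tick()        # (2,0,0), sends m1
c = p2.receive(b)    # (2,1,0)
d = p2.tick()        # (2,2,0), sends m2
e = p3.tick()        # (0,0,1)
f = p3.receive(d)    # (2,2,2)
```

Here `happened_before(a, f)` holds even though P1 never talked to P3 directly, while `concurrent(a, e)` holds because neither timestamp dominates the other.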


Consistent global state
• We have the notion of “a happens-before b” (a → b) or “a is concurrent with b” (a ~ b)
• What about ‘instantaneous’ system-wide state?
  – distributed debugging, GC, deadlock detection, ...
• Chandy/Lamport introduced consistent cuts:
  – draw a (possibly wiggly) line across all processes
  – this is a consistent cut if the set of events (on the lhs) is closed under the happens-before relationship
  – i.e. if the cut includes event x, then it also includes all events e which happened before x
• In practical terms, this means every delivered message included in the cut was also sent within the cut
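The ‘practical terms’ test is easy to state as code. The sketch below is illustrative only: it assumes a cut is given as a per-process count of included events (so each process contributes a prefix of its history), and the names `in_cut`, `is_consistent` and the two sample messages are invented for this example.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, int]            # (process name, 1-based position on that process)
Message = Tuple[Event, Event]      # (send event, receive event)


def in_cut(cut: Dict[str, int], e: Event) -> bool:
    """An event is inside the cut if it falls within its process's included prefix."""
    proc, idx = e
    return idx <= cut.get(proc, 0)


def is_consistent(cut: Dict[str, int], messages: List[Message]) -> bool:
    """Consistent iff every receive inside the cut has its send inside the cut."""
    return all(in_cut(cut, send) for send, recv in messages if in_cut(cut, recv))


# Hypothetical history: m1 sent at P1's 2nd event and received at P2's 1st event;
# m2 sent at P2's 2nd event and received at P3's 2nd event.
msgs = [(("P1", 2), ("P2", 1)), (("P2", 2), ("P3", 2))]

good = is_consistent({"P1": 2, "P2": 1, "P3": 1}, msgs)   # m1 sent and received inside
bad = is_consistent({"P1": 1, "P2": 1, "P3": 0}, msgs)    # m1 received but not yet sent
```

Because each process contributes a prefix of its own events, checking the message pairs is enough: closure under happens-before within a process holds automatically.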


Consistent cuts: example
• Vertical cuts are always consistent (due to the way we draw these diagrams), but some curves are ok too:
  – providing we don’t include any receive events without their corresponding send events
• Intuition is that a consistent cut could have occurred during execution (depending on scheduling etc)

[Figure: space-time diagram of P1, P2 and P3 against physical time, with events a to l and cuts drawn across the three processes.]


Observing consistent cuts
• Chandy/Lamport Snapshot Algorithm (1985)
• Distributed algorithm to generate a snapshot of relevant system-wide state (e.g. all memory, locks held, …)
• Flood a special marker message M to all processes; causal order of flood defines the cut
• If Pi receives M from Pj and it has yet to snapshot:
  – It pauses all communication, takes local snapshot & sets Cij to {}
  – Then sends M to all other processes Pk and starts recording Cik = {set of all post local snapshot messages received from Pk}
• If Pi receives M from some Pk after taking snapshot
  – Stops recording Cik, and saves alongside local snapshot
• Global snapshot comprises all local snapshots & Cij
• Assumes reliable, in-order messages, & no failures

Fear not! This is not examinable.
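For the curious, the marker rules can be exercised in a small single-threaded simulation. Everything here is an assumption made for illustration: the `Process` class, the bank-balance application state, and the `deliver_all` helper that stands in for a reliable in-order network. The first process to snapshot simply starts the flood itself.

```python
from collections import deque

MARKER = "M"


class Process:
    """Toy process whose application state is a balance (hypothetical example)."""

    def __init__(self, pid, peers, net, balance):
        self.pid, self.peers, self.net = pid, peers, net
        self.state = balance
        self.snapshot = None       # local snapshot, once taken
        self.channel = {}          # Cik: messages recorded per incoming channel
        self.recording = set()     # incoming channels still being recorded

    def transfer(self, dst, amount):
        self.state -= amount
        self.net[(self.pid, dst)].append(amount)

    def take_snapshot(self, marker_from=None):
        self.snapshot = self.state
        self.channel = {p: [] for p in self.peers}          # Cij starts as {}
        self.recording = {p for p in self.peers if p != marker_from}
        for p in self.peers:                                # flood the marker
            self.net[(self.pid, p)].append(MARKER)

    def on_receive(self, src, msg):
        if msg == MARKER:
            if self.snapshot is None:
                self.take_snapshot(marker_from=src)
            else:
                self.recording.discard(src)                 # stop recording Cik
        else:
            if src in self.recording:
                self.channel[src].append(msg)               # post-snapshot message
            self.state += msg


def deliver_all(net, procs):
    """Drain every FIFO channel until the network is quiet."""
    progressed = True
    while progressed:
        progressed = False
        for (src, dst), q in net.items():
            if q:
                procs[dst].on_receive(src, q.popleft())
                progressed = True


net = {(1, 2): deque(), (2, 1): deque()}
procs = {1: Process(1, [2], net, 100), 2: Process(2, [1], net, 50)}
procs[1].transfer(2, 10)       # in flight when the snapshot starts
procs[2].transfer(1, 5)        # in flight when the snapshot starts
procs[1].take_snapshot()
deliver_all(net, procs)

in_flight = sum(sum(c) for p in procs.values() for c in p.channel.values())
total = sum(p.snapshot for p in procs.values()) + in_flight
```

The recorded local states plus the recorded channel contents add up to the 150 units that exist in the system, which is the sense in which the snapshot is a consistent cut.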


Process groups
• It is useful to build distributed systems with process groups
  – Set of processes on some number of machines
  – Possible to multicast messages to all members
  – Allows fault-tolerant systems even if some processes fail
• Membership can be fixed or dynamic
  – if dynamic, have explicit join() and leave() primitives
• Groups can be open or closed:
  – Closed groups only allow messages from members
• Internally can be structured (e.g. coordinator and set of slaves), or symmetric (peer-to-peer)
  – Coordinator makes e.g. concurrent join/leave easier…
  – …but may require extra work to elect coordinator

When we use multicast in distributed systems, we mean something stronger than conventional network multicasting using datagrams – do not confuse them.


Group communication: assumptions
• Assume we have ability to send a message to multiple (or all) members of a group
  – Don’t care if ‘true’ multicast (single packet sent, received by multiple recipients) or “netcast” (send set of messages, one to each recipient)
• Assume also that message delivery is reliable, and that messages arrive in bounded time
  – But may take different amounts of time to reach different recipients
• Assume (for now) that processes don’t crash
• What delivery orderings can we enforce?


FIFO ordering

[Figure: space-time diagram of P1, P2, P3 and P4 against physical time, showing multicast messages m1, m2, m3 and m4; one delivery is marked ‘?’.]

• With FIFO ordering, messages from a particular process Pi must be received at all other processes Pj in the order they were sent
  – e.g. in the above, everyone must see m1 before m3
  – (ordering of m2 and m4 is not constrained)
• Seems easy but not trivial in case of delays / retransmissions
  – e.g. what if message m1 to P2 takes a loooong time?
• Hence receivers may need to buffer messages to ensure order


Receiving versus delivering
• Group communication middleware provides extra features above ‘basic’ communication
  – e.g. providing reliability and/or ordering guarantees on top of IP multicast or netcast
• Assume that OS provides receive() primitive:
  – returns with a packet when one arrives on wire
• Received messages either delivered or held back:
  – Delivered means inserted into delivery queue
  – Held back means inserted into hold-back queue
  – held-back messages are delivered later as the result of the receipt of another message…


Implementing FIFO ordering
• Each process Pi maintains a message sequence number (SeqNo) Si
• Every message sent by Pi includes Si, incremented after each send
  – not including retransmissions!
• Pj maintains Sji: the SeqNo of the last delivered message from Pi
  – If receive message from Pi with SeqNo ≠ (Sji+1), hold back
  – When receive message with SeqNo = (Sji+1), deliver it… and also deliver any consecutive messages in hold back queue… and update Sji

    receive(M from Pi) {
        s = SeqNo(M);
        if (s == (Sji+1)) {
            deliver(M);        // add M to delivery Q
            s = flush(hbq);    // held back message delivered
            Sji = s;
        } else holdback(M);
    }

[Figure: received messages go either to the delivery queue or to the hold-back queue; messages in the delivery queue are consumed by the application.]
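A runnable version of the receive algorithm might look like the following. It is a sketch under assumptions not stated on the slide: sequence numbers start at 1, a message whose SeqNo is at or below Sji is treated as a duplicate and dropped, and the names `FifoReceiver`, `last`, `held` and `delivered` are invented here.

```python
from typing import Any, Dict, List, Tuple


class FifoReceiver:
    """Per-receiver state for FIFO-ordered delivery (hypothetical helper)."""

    def __init__(self) -> None:
        self.last: Dict[str, int] = {}               # Sji for each sender Pi
        self.held: Dict[str, Dict[int, Any]] = {}    # hold-back queue per sender
        self.delivered: List[Tuple[str, Any]] = []   # delivery queue

    def receive(self, sender: str, seq: int, payload: Any) -> None:
        last = self.last.get(sender, 0)
        if seq <= last:
            return                                   # duplicate, e.g. a retransmission
        hbq = self.held.setdefault(sender, {})
        hbq[seq] = payload                           # hold back until it is next in line
        while last + 1 in hbq:                       # deliver any consecutive run
            last += 1
            self.delivered.append((sender, hbq.pop(last)))
        self.last[sender] = last                     # update Sji


r = FifoReceiver()
r.receive("P1", 2, "m3")           # arrives early: held back
early = list(r.delivered)          # nothing delivered yet
r.receive("P1", 1, "m1")           # fills the gap: m1 then m3 are delivered
```

Keeping one hold-back map per sender means a slow sender never delays messages from anyone else, which is why FIFO ordering tolerates so much asynchrony.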


Stronger orderings
• Can also implement FIFO ordering by just using a reliable FIFO transport like TCP/IP
• But the general ‘receive versus deliver’ model also allows us to provide stronger orderings:
  – Causal ordering: if event multicast(g, m1) → multicast(g, m2), then all processes will see m1 before m2
  – Total ordering: if any process delivers a message m1 before m2, then all processes will deliver m1 before m2
• Causal ordering implies FIFO ordering, since any two multicasts by the same process are related by →
• Total ordering (as defined) does not imply FIFO (or causal) ordering, just says that all processes must agree
  – Often want FIFO-total ordering (combines the two)


Causal ordering

[Figure: space-time diagram of P1, P2, P3 and P4 against physical time, showing multicast messages m1, m2, m3 and m4.]

• Same example as previously, but now causal ordering means that (a) everyone must see m1 before m3 (as with FIFO), and (b) everyone must see m1 before m2 (due to happens-before)
• Is this ok?
  – No! m1 → m2, but P2 sees m2 before m1
  – To be correct, must hold back (delay) delivery of m2 at P2
  – But how do we know this?


Implementing causal ordering
• Turns out this is pretty easy!
  – Start with receive algorithm for FIFO multicast…
  – and replace sequence numbers with vector clocks
• Some care needed with dynamic groups

[Figure: space-time diagram of P1, P2 and P3 with messages m1 and m2, annotated with vector timestamps (1,0,0), (1,0,1), (1,0,2), (2,0,2) and (1,1,0). Annotations: “Have (0,0,0) != (1,0,2), so must hold back m2 until missing events seen”; “Once m1 received, can deliver m1 and then m2”.]
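One common way to make this concrete is sketched below. Note the swapped-in convention: here each vector entry counts multicasts delivered from that sender, not all events, so the timestamps differ from those in the figure. The delivery test and the names `CausalReceiver`, `seen` and `held` are assumptions made for this sketch.

```python
from typing import Any, List, Tuple


class CausalReceiver:
    """Hold-back logic for causally ordered multicast (hypothetical helper)."""

    def __init__(self, n: int) -> None:
        self.seen = [0] * n                                 # multicasts delivered per sender
        self.held: List[Tuple[int, List[int], Any]] = []    # hold-back queue
        self.delivered: List[Any] = []                      # delivery queue

    def _deliverable(self, sender: int, ts: List[int]) -> bool:
        # Next message from this sender, and nothing it depends on is still missing.
        return ts[sender] == self.seen[sender] + 1 and all(
            ts[k] <= self.seen[k] for k in range(len(ts)) if k != sender
        )

    def receive(self, sender: int, ts: List[int], payload: Any) -> None:
        self.held.append((sender, ts, payload))
        progress = True
        while progress:                                     # one delivery may unblock others
            progress = False
            for item in list(self.held):
                s, t, p = item
                if self._deliverable(s, t):
                    self.held.remove(item)
                    self.seen[s] += 1
                    self.delivered.append(p)
                    progress = True


# P2's view (indices: P1 = 0, P2 = 1, P3 = 2). P1 multicasts m1 with [1,0,0];
# P3 delivers m1 and then multicasts m2 with [1,0,1].
p2 = CausalReceiver(3)
p2.receive(2, [1, 0, 1], "m2")     # depends on an unseen message from P1: held back
before = list(p2.delivered)
p2.receive(0, [1, 0, 0], "m1")     # m1 is delivered, which releases m2
```

The structure is the FIFO receiver with the single-number comparison replaced by a comparison over the whole vector.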


Total ordering
• Sometimes we want all processes to see exactly the same, FIFO, sequence of messages
  – particularly for state machine replication (see later)
• One way is to have a ‘can send’ token:
  – Token passed round-robin between processes
  – Only process with token can send (if he wants)
• Or use a dedicated sequencer process
  – Other processes ask for global sequence no. (GSN), and then send with this in packet
  – Use FIFO ordering algorithm, but on GSNs
• Can also build non-FIFO total-order multicast by having processes generate GSNs themselves and resolving ties
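The sequencer approach can be sketched as follows. This is an illustration only: the `Sequencer` and `TotalOrderReceiver` classes are invented names, the request/response exchange with the sequencer is collapsed into a method call, and GSNs are assumed to start at 1.

```python
import itertools
from typing import Any, Dict, List


class Sequencer:
    """Dedicated process handing out global sequence numbers (hypothetical helper)."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def next_gsn(self) -> int:
        return next(self._counter)


class TotalOrderReceiver:
    """The FIFO hold-back algorithm, keyed on the GSN instead of a per-sender SeqNo."""

    def __init__(self) -> None:
        self.last_gsn = 0
        self.held: Dict[int, Any] = {}
        self.delivered: List[Any] = []

    def receive(self, gsn: int, payload: Any) -> None:
        self.held[gsn] = payload
        while self.last_gsn + 1 in self.held:
            self.last_gsn += 1
            self.delivered.append(self.held.pop(self.last_gsn))


seq = Sequencer()
x = (seq.next_gsn(), "x from P1")     # P1 asks for a GSN, then multicasts
y = (seq.next_gsn(), "y from P2")     # P2 does the same

r1, r2 = TotalOrderReceiver(), TotalOrderReceiver()
r1.receive(*x)
r1.receive(*y)
r2.receive(*y)                        # arrives first at r2 but is held back
r2.receive(*x)
```

Both receivers end up with the same delivery sequence regardless of arrival order; the price is a round trip to the sequencer before every send.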


Ordering and asynchrony
• FIFO ordering allows quite a lot of asynchrony
  – E.g. any process can delay sending a message until it has a batch (to improve performance)
  – Or can just tolerate variable and/or long delays
• Causal ordering also allows some asynchrony
  – But must be careful queues don’t grow too large!
• Traditional total order multicast not so good:
  – Since every message delivery transitively depends on every other one, delays hold up the entire system
  – Instead tend to an (almost) synchronous model, but this performs poorly, particularly over the wide area ;-)
  – Some clever work on virtual synchrony (for the interested)


Distributed mutual exclusion
• In first part of course, saw need to coordinate concurrent processes / threads
  – In particular considered how to ensure mutual exclusion: allow only 1 thread in a critical section
• A variety of schemes possible:
  – test-and-set locks; semaphores; monitors; active objects
• But most of these ultimately rely on hardware support (atomic operations, or disabling interrupts…)
  – not available across an entire distributed system
• Assuming we have some shared distributed resources, how can we provide mutual exclusion in this case?


Solution #1: central lock server
• Nominate one process C as coordinator
  – If Pi wants to enter critical section, simply sends lock message to C, and waits for a reply
  – If resource free, C replies to Pi with a grant message; otherwise C adds Pi to a wait queue
  – When finished, Pi sends unlock message to C
  – C sends grant message to first process in wait queue

[Figure: space-time diagram of P1, P2 and coordinator C against physical time; the lock holder is annotated “... execute critical section”.]
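The coordinator's logic amounts to a holder and a queue. The sketch below models only C's state machine; the class name `LockServer` and the convention that each handler returns the process to send a grant to (or `None`) are assumptions made for illustration, and message transport is left out.

```python
from collections import deque
from typing import Deque, Optional


class LockServer:
    """State kept by coordinator C (hypothetical helper)."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None
        self.waiting: Deque[str] = deque()    # FIFO wait queue, which makes it fair

    def on_lock(self, pid: str) -> Optional[str]:
        """Handle a lock message; return the process to grant to, if any."""
        if self.holder is None:
            self.holder = pid
            return pid                        # resource free: grant immediately
        self.waiting.append(pid)              # otherwise queue the requester
        return None

    def on_unlock(self, pid: str) -> Optional[str]:
        """Handle an unlock message; return the next process to grant to, if any."""
        if pid != self.holder:
            raise ValueError("unlock from a process that does not hold the lock")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder


c = LockServer()
g1 = c.on_lock("P1")       # granted straight away
g2 = c.on_lock("P2")       # queued behind P1
g3 = c.on_unlock("P1")     # grant passes to P2
g4 = c.on_unlock("P2")     # nobody waiting
```

Swapping the deque for a priority queue would give the priority support mentioned among the scheme's advantages.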


Central lock server: pros and cons
• Central lock server has some good properties:
  – Simple to understand and verify
  – Live (providing delays are bounded, and no failure)
  – Fair (if queue is fair, e.g. FIFO), and easily supports priorities if we want them
  – Decent performance: lock acquire takes one round-trip, and release is ‘free’ with asynchronous messages
• But C can become a performance bottleneck…
• …and can’t distinguish crash of C from long wait
  – can add additional messages, at some cost


Solution #2: token passing
• Avoid central bottleneck
• Arrange processes in a logical ring
  – Each process knows its predecessor & successor
  – Single token passes continuously around ring
  – Can only enter critical section when possess token; pass token on when finished (or if don’t need to enter CS)

[Figure: ring of processes P0 to P5. Annotations: “Initial token generated by P0”; “Passes clockwise around ‘ring’”; “If e.g. P4 wants to enter CS, holds onto token for duration”.]
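A tiny simulation shows who enters the critical section and after how many token passes. It is a sketch under simplifying assumptions: all requests are already pending when the token starts moving, no process fails, and the function name `circulate` is made up for this example.

```python
from typing import Iterable, List, Tuple


def circulate(n: int, start: int, wants: Iterable[int]) -> List[Tuple[int, int]]:
    """Pass the token around a ring of n processes, starting at `start`.

    Returns (process, token passes so far) for each critical-section entry,
    in the order the entries happen.
    """
    pending = {p for p in wants if 0 <= p < n}
    entries: List[Tuple[int, int]] = []
    holder, passes = start, 0
    while pending:
        if holder in pending:
            entries.append((holder, passes))   # holds the token while in the CS
            pending.remove(holder)
        holder = (holder + 1) % n              # pass the token to the successor
        passes += 1
    return entries


# Six processes, token generated by P0; P4 asked before P1 did.
order = circulate(6, 0, [4, 1])
```

P1 enters before P4 even though P4 is listed first, because entry order follows ring position rather than request time.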


Token passing: pros and cons
• Several advantages:
  – Simple to understand: only 1 process ever has token => mutual exclusion guaranteed by construction
  – No central server bottleneck
  – Liveness guaranteed (in the absence of failure)
  – So-so performance (between 0 and N messages until a waiting process enters, 1 message to leave)
• But:
  – Doesn’t guarantee fairness (FIFO order)
  – If a process crashes must repair ring (route around)
  – And worse: may need to regenerate token – tricky!
• And constant network traffic: an advantage???


Solution #3: totally ordered multicast
• Scheme due to Ricart & Agrawala (1981)
• Consider N processes, where each process maintains local variable state which is one of {FREE, WANT, HELD}
• To obtain lock, a process Pi sets state := WANT, and then multicasts lock request to all other processes
• When a process Pj receives a request from Pi:
  – If Pj’s local state is FREE, then Pj replies immediately with OK
  – If Pj’s local state is HELD, Pj queues the request to reply later
• A requesting process Pi waits for OK from N-1 processes
  – Once received, sets state := HELD, and enters critical section
  – Once done, sets state := FREE, & replies to any queued requests
• What about concurrent requests?

By concurrent we mean: Pj is already in the WANT state when it receives a request from Pi


Handling concurrent requests
• Need to decide upon a total order:
  – Each process maintains a Lamport timestamp, Ti
  – Processes put current Ti into request message
  – Insufficient on its own (recall that Lamport timestamps can be identical) => use process id (or similar) to break ties
• Hence if a process Pj receives a request from Pi and Pj has an outstanding request (i.e. Pj’s local state is WANT)
  – If (Tj, Pj) < (Ti, Pi) then queue request from Pi
  – Otherwise, reply with OK, and continue waiting
• Note that using the total order ensures correctness, but not fairness (i.e. no FIFO ordering)
  – Q: can we fix this by using vector clocks?


Totally ordered multicast: example
• Imagine P1 and P2 simultaneously try to acquire lock…
  – Both set state to WANT, and both send multicast message
  – Assume that timestamps are 17 (for P1) and 9 (for P2)
• P3 has no interest (state is FREE), so replies Ok to both
• Since 9 < 17, P1 replies Ok; P2 stays quiet & queues P1’s request
• P2 enters the critical section and executes…
• …and when done, replies to P1 (who can now enter critical section)

[Figure: three panels of P1, P2 and P3, showing the multicast requests (timestamps 17 and 9) and the subsequent OK replies.]
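The example can be replayed with a small model of the per-process rules. This is a sketch, not the authors' code: the `Node` class and its method names are invented, message passing is replaced by direct method calls, and the Lamport timestamp is passed in by hand rather than maintained by a clock.

```python
FREE, WANT, HELD = "FREE", "WANT", "HELD"


class Node:
    """Per-process state for the Ricart & Agrawala scheme (hypothetical helper)."""

    def __init__(self, pid: int, n: int) -> None:
        self.pid, self.n = pid, n
        self.state = FREE
        self.request = None        # (Ti, Pi) of our own outstanding request
        self.oks = 0
        self.deferred = []         # process ids whose requests we have queued

    def want(self, timestamp: int):
        self.state, self.request, self.oks = WANT, (timestamp, self.pid), 0
        return self.request        # the request to multicast

    def on_request(self, req) -> bool:
        """Return True to reply OK immediately, False if the request is queued."""
        if self.state == HELD or (self.state == WANT and self.request < req):
            self.deferred.append(req[1])
            return False
        return True

    def on_ok(self) -> None:
        self.oks += 1
        if self.state == WANT and self.oks == self.n - 1:
            self.state = HELD      # all N-1 replies in: enter the critical section

    def release(self):
        self.state, self.request = FREE, None
        queued, self.deferred = self.deferred, []
        return queued              # processes that are now owed an OK


p1, p2, p3 = Node(1, 3), Node(2, 3), Node(3, 3)
r1, r2 = p1.want(17), p2.want(9)
if p3.on_request(r1): p1.on_ok()       # P3 is FREE: OK to both
if p3.on_request(r2): p2.on_ok()
if p1.on_request(r2): p2.on_ok()       # (17, 1) is not < (9, 2): P1 replies OK
if p2.on_request(r1): p1.on_ok()       # P2 queues P1's request instead
mid_states = (p1.state, p2.state)      # P2 holds the lock, P1 still waits
for pid in p2.release():               # P2 finishes and answers queued requests
    {1: p1, 3: p3}[pid].on_ok()
```

Comparing `(timestamp, pid)` tuples gives exactly the tie-break from the previous slide: timestamps first, process id second.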


Additional details
• Completely unstructured decentralized solution... but:
  – Lots of messages (1 multicast + N-1 unicast)
  – Ok for most recent holder to re-enter CS without any messages
• Variant scheme (Lamport) - multicast for total ordering
  – To enter, process Pi multicasts request(Pi, Ti) [same as before]
  – On receipt of a message, Pj replies with an ack(Pj, Tj)
  – Processes keep all requests and acks in ordered queue
  – If process Pi sees his request is earliest, can enter CS… and when done, multicasts a release(Pi, Ti) message
  – When Pj receives release, removes Pi’s request from queue
  – If Pj’s request is now earliest in queue, can enter CS…
• Both Ricart & Agrawala and Lamport’s scheme have N points of failure: doomed if any process dies :-(
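Lamport’s variant can be sketched in the same style. This simplified model makes several assumptions beyond the slide: acks are tracked as a set of sender ids rather than stored in the queue, channels are reliable and in-order, timestamps are supplied by hand, and the class `LamportMutex` and its methods are invented names.

```python
import heapq


class LamportMutex:
    """Per-process state for Lamport's queue-based scheme (hypothetical helper)."""

    def __init__(self, pid: int, n: int) -> None:
        self.pid, self.n = pid, n
        self.queue = []            # heap of (Ti, Pi) requests, earliest first
        self.acks = set()          # processes that have acked our current request
        self.mine = None

    def request(self, timestamp: int):
        self.mine, self.acks = (timestamp, self.pid), set()
        heapq.heappush(self.queue, self.mine)
        return self.mine           # the request to multicast

    def on_request(self, req) -> int:
        heapq.heappush(self.queue, req)
        return self.pid            # stands in for sending an ack to the requester

    def on_ack(self, sender: int) -> None:
        self.acks.add(sender)

    def can_enter(self) -> bool:
        """Our request is earliest in the queue and everyone else has acked it."""
        return (self.mine is not None
                and self.queue[0] == self.mine
                and len(self.acks) == self.n - 1)

    def release(self):
        req, self.mine = self.mine, None
        self.on_release(req)
        return req                 # the release to multicast

    def on_release(self, req) -> None:
        self.queue.remove(req)
        heapq.heapify(self.queue)


a, b = LamportMutex(1, 2), LamportMutex(2, 2)
ra, rb = a.request(17), b.request(9)
b.on_ack(a.on_request(rb))         # a queues b's request and acks it
a.on_ack(b.on_request(ra))         # b queues a's request and acks it
first = (a.can_enter(), b.can_enter())
a.on_release(b.release())          # b leaves the CS and multicasts release
second = a.can_enter()
```

Unlike Ricart & Agrawala, nobody defers a reply here; ordering comes entirely from every process holding the same sorted queue.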


Summary + next time
• (More) vector clocks
• Consistent global state + consistent cuts
• Process groups and reliable multicast
• Implementing order
• Distributed mutual exclusion

• Leader elections and distributed consensus
• Distributed transactions and commit protocols
• Replication and consistency
