Post on 19-Jul-2020


1

A Low-latency Consensus Algorithm for Geographically Replicated Sites

Master of Science Thesis Defense

Balaji Arun

Committee: Dr. Binoy Ravindran (chair), Dr. Haibo Zeng, Dr. Robert Broadwater

2

Agenda
• Introduction
• Motivation
• Thesis Contribution: CAESAR
• Evaluation
• Conclusion

3

Building online services today…

4

Desired Properties
• Availability
  • Fault Tolerance
• Low-latency
• High-throughput
• Strong Consistency

5

Desired Properties
• Availability under faults
• Low-latency
• High-throughput
• Strong Consistency

Replication

Distributed System

6

Replicating Stateful Systems
• Stateful systems
  • e.g. databases, in-memory caches
• Requires heavy coordination among nodes
  • to maintain consistency
• Expensive in wide area
  • due to high latency links

7

CAP Theorem [Brewer 1997]

Availability

Partition Tolerance

Consistency

11

Thesis Contribution

[Figure: two CAP triangles. First: Availability, Partition Tolerance, Consistency. Second: Availability under faults*, Partition Tolerance, Consistency]

*unavailable only when a majority of nodes in the system fail

12

State Machine Replication [F. B. Schneider 1990]
• Execute commands in the same order in all replicas.

[Figure: commands C1, C2, C3]

13

State Machine Replication
• Execute commands in the same order in all replicas

[Figure: commands C1, C2, C3 at each replica; label: Consensus]
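To make the ordering requirement concrete, here is a minimal sketch, assuming a toy key-value state machine. The class `KeyValueStateMachine`, the helper `replay`, and the tuple layout of the commands are illustrative names, not taken from the thesis. It shows that replicas applying the same log reach the same state, and that swapping two commands on the same key makes them diverge.

```python
class KeyValueStateMachine:
    """Deterministic state machine: the state depends only on the command order."""

    def __init__(self):
        self.state = {}

    def apply(self, command):
        op, key, value = command
        if op == "put":
            self.state[key] = value
        elif op == "delete":
            self.state.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")


def replay(log):
    """Apply every command of a log to a fresh state machine and return its state."""
    machine = KeyValueStateMachine()
    for command in log:
        machine.apply(command)
    return machine.state


# Hypothetical commands standing in for C1, C2, C3 on the slide.
c1 = ("put", "x", 1)
c2 = ("put", "x", 2)
c3 = ("delete", "y", None)

# Three replicas that agree on the order end in the same state.
agreed_log = [c1, c2, c3]
replicas = [replay(agreed_log) for _ in range(3)]
same_order_converges = all(state == replicas[0] for state in replicas)

# A replica that swaps C1 and C2 ends with a different value for "x".
diverged = replay([c1, c2, c3]) != replay([c2, c1, c3])
```

Agreeing on `agreed_log` is the job of the consensus protocol; the state machine itself only needs to be deterministic.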

14

Paxos [Lamport 2001]

• Agreement protocol
• Choose command per slot
• 2 RTTs
  • One for slot ownership
  • One for deciding command
• Survives F crashes in 2F+1 nodes

[Figure: log slots 1, 2, 3, 4, 5, …; C1, C2 and C3 each send "Let me own slot 1"]
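The two round trips can be sketched for a single slot as follows. This is a simplified, in-memory illustration, assuming synchronous calls and no message loss. The names `Acceptor`, `prepare`, `accept` and `propose` are chosen for the example and do not come from the slides.

```python
class Acceptor:
    def __init__(self):
        self.promised = -1    # highest ballot this acceptor has promised
        self.accepted = None  # (ballot, command) it last voted for, if any

    def prepare(self, ballot):
        """Round trip 1: a proposer asks to own the slot under this ballot."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot, command):
        """Round trip 2: the slot owner asks for a vote on a command."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, command)
            return True
        return False


def propose(acceptors, ballot, command):
    """Run both round trips; return the decided command or None on failure."""
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(ballot) for a in acceptors]
    promises = [accepted for ok, accepted in replies if ok]
    if len(promises) < majority:
        return None
    # If some acceptor already voted, the proposer must adopt that command.
    previous = [p for p in promises if p is not None]
    if previous:
        command = max(previous, key=lambda p: p[0])[1]
    votes = sum(a.accept(ballot, command) for a in acceptors)
    return command if votes >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, ballot=1, command="C1")
second = propose(acceptors, ballot=2, command="C2")
stale = propose(acceptors, ballot=1, command="C3")
```

In this run `first` decides `"C1"`, `second` wins ownership with a higher ballot but has to re-propose `"C1"`, and `stale` fails because its ballot is older than the one already promised.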

15

Paxos

[Figure: log slots 1, 2, 3, 4, 5, …; C1, C2 and C3; votes: Vote Orange, Vote Blue, Vote Blue]

16

Paxos

[Figure: log slots 1, 2, 3, 4, 5, …; commands C1, C3 and C2]

17

Paxos variants
• One replica decides:
  • Multi-Paxos, Fast Paxos
• Round-robin approach:
  • Mencius [Mao ’08]

[Figure: two command logs over slots 1, 2, 3, 4, 5, …; one holds C1, C2, C3, the other holds C1, C2, C3, C4, C5]
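The round-robin approach can be illustrated with a small sketch, assuming slots are numbered from 1 as on the slide and pre-assigned to replicas in rotation. The function name `slot_owner` and the replica names are hypothetical.

```python
def slot_owner(slot, replicas):
    """Return the replica that owns a 1-based slot under round-robin assignment."""
    return replicas[(slot - 1) % len(replicas)]


replicas = ["R1", "R2", "R3"]

# Ownership of the first five slots: no ownership round trip is needed,
# because every replica can compute the owner locally.
owners = {slot: slot_owner(slot, replicas) for slot in range(1, 6)}
```

With three replicas this yields `{1: "R1", 2: "R2", 3: "R3", 4: "R1", 5: "R2"}`, so each replica can propose in its own slots without contending for them.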

18

Generalized Consensus [Lamport 2005]

• Order only non-commutative commands
  • e.g. same key in Key-Value store
• EPaxos [Moraru ’13], M2Paxos [Peluso ’16]
  • Best performance under no conflicts (Fast Decision, 1 RTT)
  • Performance degrades under conflicts (Slow Decision, 1+ RTT)
• Thesis contribution: CAESAR
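As a rough illustration of which commands need ordering, the sketch below uses the slide's key-value example: two commands conflict when they touch the same key. The `Command` type and the helper names are assumptions made for this example.

```python
from itertools import combinations
from typing import NamedTuple


class Command(NamedTuple):
    name: str
    key: str


def conflicts(a, b):
    """Two commands are treated as non-commutative when they access the same key."""
    return a.key == b.key


def pairs_needing_order(commands):
    """List the command pairs that every replica must execute in the same order."""
    return [(a.name, b.name) for a, b in combinations(commands, 2) if conflicts(a, b)]


commands = [Command("C1", "x"), Command("C2", "y"), Command("C3", "x")]
ordered_pairs = pairs_needing_order(commands)
```

Here only `("C1", "C3")` needs an agreed order; `C2` commutes with both and can be executed without waiting for them.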

19

✓ Introduction
✓ Motivation
• CAESAR
• Evaluation
• Conclusion

20

Overview
• Implements Generalized Consensus
• Uses logical timestamping to order commands (like Mencius)
• Exploits quorums by gathering dependencies for commands (like EPaxos)
• Deploys a novel wait condition to boost fast decisions under conflicts

21

System model
• Nodes communicate through message passing
• Uses two quorum types
  • Classic Quorum: $\lfloor N/2 \rfloor + 1$
  • Fast Quorum: $\lceil 3N/4 \rceil$
• Four phases
  • Fast Propose
  • Slow Propose
  • Retry
  • Stable
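The two quorum sizes can be worked out for concrete cluster sizes. This sketch assumes the classic quorum is ⌊N/2⌋ + 1 and the fast quorum is ⌈3N/4⌉; the floor and ceiling are an assumption, since the symbols on the slide are garbled. The function names are illustrative.

```python
import math


def classic_quorum(n):
    """Majority quorum: floor(n / 2) + 1."""
    return n // 2 + 1


def fast_quorum(n):
    """Fast quorum: ceil(3n / 4)."""
    return math.ceil(3 * n / 4)


def tolerated_crashes(n):
    """Largest F such that 2F + 1 <= n."""
    return (n - 1) // 2


# (classic quorum, fast quorum, tolerated crashes) for a few cluster sizes
sizes = {n: (classic_quorum(n), fast_quorum(n), tolerated_crashes(n)) for n in (3, 5, 7)}
```

For the five-site deployment used in the evaluation (N = 5), this gives a classic quorum of 3 and a fast quorum of 4, with up to 2 crashes tolerated.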

22

Uniform Reliable Broadcast

[Figure: message diagram over nodes p0 to p4 for two commands. Messages: PROPOSE, OK, STABLE]

23

CAESAR: Basic Protocol

[Figure: message diagram over nodes p0 to p4 for two commands. Messages: PROPOSE: C|0; PROPOSE: |4; OK: C|{}; OK: |{C}; OK: |{}; STABLE: C|0|{}; STABLE: |4|{C}]
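One way to read the message labels on this slide (command | timestamp | dependency set) is sketched below. It is an interpretation under stated assumptions, not the thesis implementation: a replica answers a proposal with the conflicting commands it has already seen with a lower timestamp, and the proposer merges the replies into the STABLE message. The second command is called `D` here because its label is missing from the transcript. The `Replica` class and `stable_message` are hypothetical names. The case where a higher-timestamp conflicting command arrives first is left to the wait condition on the following slides.

```python
class Replica:
    def __init__(self):
        self.seen = {}  # command name -> (key, timestamp)

    def on_propose(self, name, key, timestamp):
        """Reply OK with the conflicting commands already seen at a lower timestamp."""
        deps = {
            other
            for other, (other_key, other_ts) in self.seen.items()
            if other_key == key and other_ts < timestamp
        }
        self.seen[name] = (key, timestamp)
        return ("OK", name, deps)


def stable_message(name, timestamp, replies):
    """Merge the dependency sets from a quorum of OK replies."""
    deps = set().union(*(reply_deps for _, _, reply_deps in replies))
    return ("STABLE", name, timestamp, deps)


p1, p3 = Replica(), Replica()

# p1 sees C (timestamp 0) and then D (timestamp 4); p3 sees only D.
ok_c = p1.on_propose("C", key="k", timestamp=0)
ok_d_p1 = p1.on_propose("D", key="k", timestamp=4)
ok_d_p3 = p3.on_propose("D", key="k", timestamp=4)

stable_c = stable_message("C", 0, [ok_c])
stable_d = stable_message("D", 4, [ok_d_p1, ok_d_p3])
```

The resulting `stable_c` and `stable_d` have the same shape as the slide's `STABLE: C|0|{}` and `STABLE: |4|{C}` labels.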

24

CAESAR: Wait Condition (fast path)

[Figure: message diagram over nodes p0 to p4 with a WAIT step. Messages: PROPOSE: C|0; PROPOSE: |4; OK: C|{}; OK: |{C}; OK: |{}; STABLE: |4|{C}; STABLE: C|0|{}]

25

CAESAR: Wait Condition (slow path)

[Figure: message diagram over nodes p0 to p4 with a WAIT step. Messages: PROPOSE: C|0; PROPOSE: |4; OK: |{}; STABLE: |4|{}; NACK: C|{}; OK: C|{}; RETRY: C|5|{}; STABLE: C|5|{}]
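The two wait-condition slides can be summarized by the decision a replica makes when it receives a proposal for a command while it already knows a conflicting command with a higher timestamp. The sketch below is a hedged reading of the diagrams, not the thesis code. The function `fast_propose_reply`, the dictionary layout of `known`, the name `D` for the second command, and the `retry_timestamp` rule (one more than the timestamp that caused the rejection) are all assumptions made for illustration.

```python
def fast_propose_reply(name, key, timestamp, known):
    """Decide the reply to a proposal given the locally known commands.

    known maps a command name to a dict with keys "key", "ts", "stable", "deps".
    """
    newer = [
        info
        for info in known.values()
        if info["key"] == key and info["ts"] > timestamp
    ]
    # A newer conflicting command is still undecided: hold the reply.
    if any(not info["stable"] for info in newer):
        return "WAIT"
    # A newer conflicting command became stable without depending on this one.
    if any(name not in info["deps"] for info in newer):
        return "NACK"
    return "OK"


def retry_timestamp(rejecting_timestamp):
    """Assumed retry rule: pick a timestamp above the one that caused the NACK."""
    return rejecting_timestamp + 1


# The replica received D (timestamp 4) before C (timestamp 0).
pending = {"D": {"key": "k", "ts": 4, "stable": False, "deps": set()}}
while_waiting = fast_propose_reply("C", "k", 0, pending)

# Fast path: D becomes stable with C among its dependencies.
fast = {"D": {"key": "k", "ts": 4, "stable": True, "deps": {"C"}}}
fast_reply = fast_propose_reply("C", "k", 0, fast)

# Slow path: D becomes stable without C, so C is rejected and retried.
slow = {"D": {"key": "k", "ts": 4, "stable": True, "deps": set()}}
slow_reply = fast_propose_reply("C", "k", 0, slow)
new_ts = retry_timestamp(4)
retry_reply = fast_propose_reply("C", "k", new_ts, slow)
```

In this reading, waiting lets the replica answer OK whenever the newer command turns out to depend on the older one, which is how the wait condition keeps more decisions on the fast path under conflicts.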

26

✓ Introduction
✓ Motivation
✓ The Contribution: Caesar
• Evaluation
• Conclusion

27

Implementation
• CAESAR
  • Implemented in Java 8
  • Adopted and modified JPaxos
    • used network, messaging and state machine abstractions
• Competitors
  • EPaxos, M2Paxos, Mencius, Multi-Paxos (publicly available)
• Benchmark
  • Fully replicated Key-Value store benchmark

28

Deployment
• Amazon EC2
  • m4.2xlarge instances (8 vCPUs, 32 GB memory)
• Sites: Ireland, Frankfurt, Mumbai, Ohio, Virginia

29

Client-perceived performance

30

Latency

[Figure: latency (msec, axis from 50 to 250) versus conflict % (0%, 2%, 10%, 30%, 50%, 100%) for Caesar, EPaxos and M2Paxos; one panel per site: Virginia, Ohio, Frankfurt, Ireland, Mumbai]

• Closed-loop request injection
• 10 clients per node

31

System performance

32

Throughput

[Figure: throughput (1000x tps) versus conflict % (0%, 2%, 10%, 30%, 50%, 100%) for EPaxos, Caesar, M2Paxos, Multi-Paxos-IR, Multi-Paxos-IN and Mencius; two panels, Network Batching Enabled and Network Batching Disabled, with axes from 0 to 50 and from 0 to 400; annotated values: 17%, 24%, 45%]

• Open-loop request injection

33

Slow Paths vs EPaxos

[Figure: % of slow paths versus conflict % (0%, 2%, 10%, 30%, 50%, 100%) for EPaxos and Caesar; data labels: 1.64%, 0.4%, 44.5%, 14.7%, 6.7%, 2.0%, 29.83%, 50.0%, 9.84%, 100%]

34

✓ Introduction
✓ Motivation
✓ The Contribution: Caesar
✓ Evaluation
• Conclusion

35

Conclusion
• CAESAR provides all the desired properties for building today’s services.
  • High availability under faults, strong consistency: State Machine Replication and Consensus
  • Low latency: fast paths
  • High throughput: generalized consensus and simple execution phase

36

Future Work
• CAESAR can exhibit a latency distribution with a large standard deviation
  • due to multiple phases and wait condition
• This is a challenge for meeting Service Level Agreements (SLAs)
  • Future work should minimize large tail latencies

37

Submission

Submitted to DSN 2017

38

Source code
• Open source @ https://www.github.com/ibalajiarun/caesar

39

References
• L. Lamport, “Paxos made simple,” ACM SIGACT News, 2001.
• I. Moraru, D. G. Andersen, and M. Kaminsky, “There is More Consensus in Egalitarian Parliaments,” in Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, ser. SOSP ’13. ACM, 2013, pp. 358–372.
• Y. Mao, F. P. Junqueira, and K. Marzullo, “Mencius: Building Efficient Replicated State Machines for WANs,” in Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, ser. OSDI ’08. USENIX Association, 2008, pp. 369–384.
• L. Lamport, “Generalized Consensus and Paxos,” Microsoft Research, Tech. Rep. MSR-TR-2005-33, March 2005.
• S. Peluso, A. Turcu, R. Palmieri, G. Losa, and B. Ravindran, “Making fast consensus generally faster,” in DSN, 2016.
• F. B. Schneider, “Implementing fault-tolerant services using the state machine approach: A tutorial,” ACM Comput. Surv., vol. 22, no. 4, pp. 299–319, Dec. 1990. [Online]. Available: http://doi.acm.org/10.1145/98163.98167

40

Thanks to: My advisor, Dr. Binoy Ravindran,

Roberto, Sebastiano

Dr. Broadwater, Dr. Zeng

and

All my fellow SSRGians (present and past)
