Information-Agnostic Flow Scheduling for Commodity Data Centers

Kai Chen, SING Group, CSE Department, HKUST
May 16, 2016 @ Stanford University


SING Testbed Cluster

• Optical Circuit Switch (x5)
• Electrical Packet Switch, 10G (x3)
• Electrical Packet Switch, 1G (x10)
• Commodity rack servers, Dell PowerEdge R320 (x100)

SING’s Research: A Bottom-up Approach

• Network Structures
– OSA, the first-ever all-optical data center architecture (NSDI’12, TON’14)
– MegaSwitch, non-blocking optical data center fabric

• Protocols/Algorithms
– XPath, intra-DC routing control (NSDI’15, TON’15)
– Amoeba, inter-DC data transfers (EuroSys’15)
– PIAS, information-agnostic flow scheduling (HotNets’14, NSDI’15)
– Karuna, mix-flow scheduling (SIGCOMM’16)
– CODA, coflow scheduling in the dark (SIGCOMM’16)
– MQ-ECN, congestion control for multi-queue DC (NSDI’16)
– Cutting Tail Latency
– Shallow-buffer Management
– ......

• Platform: Distributed Computing Engine for Big Data Apps
– network-enabled large scale distributed, parallel computing platforms for big data analytics and machine learning

Goal: simple, effective, implementable networked systems!

PIAS: Practical Information-Agnostic Flow Scheduling [USENIX NSDI’15]

• Cloud applications
– Desire low latency for short messages

• Goal: Minimize flow completion time (FCT)
– Many flow scheduling proposals…

The State-of-the-art Solutions

• PDQ [SIGCOMM’12]
• pFabric [SIGCOMM’13]
• PASE [SIGCOMM’14]
• …

All assume prior knowledge of flow size information to approximate ideal preemptive Shortest Job First (SJF) with customized network elements

• Not feasible for some applications
• Hard to deploy in practice

Question

Without prior knowledge of flow size information, how can we minimize FCT in commodity data centers?

Design Goal 1

Information-agnostic: do not assume a priori knowledge of flow size information is available from the applications

Design Goal 2

FCT minimization: minimize average/tail FCTs of short flows without adversely affecting FCTs of large flows

Design Goal 3

Readily-deployable: work with existing commodity switches & be compatible with legacy network stacks

Our answer: PIAS

PIAS’S DESIGN

Design Rationale

• PIAS performs Multi-Level Feedback Queue (MLFQ) to emulate Shortest Job First

[Figure: K priority queues, from Priority 1 (High), Priority 2, ……, down to Priority K (Low)]

Simple Example Illustrating PIAS

[Figure: four priority queues, Priority 1 to Priority 4, at a congested link]

• Flow 1 with 10 packets and flow 2 with 2 packets arrive
• Flows 1 and 2 transmit simultaneously
• Flow 2 finishes while flow 1 is demoted to priority 2
• Flow 3 with 2 packets arrives
• Flows 3 and 1 transmit simultaneously
• Flow 3 finishes while flow 1 is demoted to priority 3
• Flow 4 with 2 packets arrives
• Flows 4 and 1 transmit simultaneously
• Flow 4 finishes while flow 1 is demoted to priority 4
• Eventually, flow 1 finishes in priority 4

With MLFQ, PIAS can emulate Shortest Job First without prior knowledge of flow size information
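The walkthrough above can be replayed with a small simulation. This is only a sketch under stated assumptions, not the PIAS implementation: it models a single bottleneck link that sends one packet per time slot (so flows are serialized rather than transmitting simultaneously), uses demotion thresholds of 2, 4 and 6 packets inferred from the example, and picks arrival slots 0, 0, 4 and 8 for flows 1 to 4. The names `priority` and `simulate` are hypothetical.

```python
from bisect import bisect_right


def priority(sent, thresholds):
    """Priority (1 = highest) of a flow that has already sent `sent` packets."""
    return bisect_right(thresholds, sent) + 1


def simulate(flows, thresholds):
    """Replay MLFQ over one link that sends one packet per time slot.

    flows: {flow_id: (arrival_slot, size_in_packets)}
    Returns {flow_id: (fct_in_slots, priority_of_last_packet)}.
    """
    sent = {f: 0 for f in flows}
    result = {}
    t = 0
    while len(result) < len(flows):
        active = [f for f, (arrival, size) in flows.items()
                  if arrival <= t and sent[f] < size]
        if active:
            # Strict priority: serve the flow in the highest-priority queue.
            # Ties are broken by arrival slot, then by flow id (an assumption).
            f = min(active,
                    key=lambda x: (priority(sent[x], thresholds), flows[x][0], x))
            used = priority(sent[f], thresholds)
            sent[f] += 1
            if sent[f] == flows[f][1]:
                result[f] = (t + 1 - flows[f][0], used)
        t += 1
    return result


# Flow sizes follow the slides; arrival slots and thresholds are assumptions.
flows = {1: (0, 10), 2: (0, 2), 3: (4, 2), 4: (8, 2)}
result = simulate(flows, thresholds=[2, 4, 6])
for f in sorted(result):
    print(f"flow {f}: FCT = {result[f][0]} slots, last priority = {result[f][1]}")
```

Under these assumptions the three 2-packet flows complete entirely in priority 1, while flow 1 is demoted step by step and finishes in priority 4, which is the behaviour the slides describe.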

How to implement?

[Figure: switch queues Priority 1, Priority 2, ……, Priority K]

• Strict priority queueing on switches
• Packet tagging as a shim layer at end hosts
• $K$ priorities: $P_i$ ($1 \le i \le K$)
• $K-1$ demotion thresholds: $\alpha_j$ ($1 \le j \le K-1$)
• The threshold to demote priority from $P_{j-1}$ to $P_j$ is $\alpha_{j-1}$
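The end-host tagging step can be sketched as follows. This is a minimal illustration, assuming the shim keeps a per-flow byte counter and maps it to a priority by comparing against the sorted demotion thresholds. The class name `FlowTagger`, the flow identifiers, and the threshold values (20KB, 1MB) are hypothetical; how the priority is carried in the packet header is not shown.

```python
from bisect import bisect_right

KB = 1024
MB = 1024 * KB


class FlowTagger:
    """Per-flow state kept at the end host (hypothetical sketch)."""

    def __init__(self, thresholds):
        # thresholds holds the K-1 demotion thresholds in bytes, ascending.
        if list(thresholds) != sorted(thresholds):
            raise ValueError("demotion thresholds must be in ascending order")
        self.thresholds = list(thresholds)
        self.bytes_sent = {}

    def tag(self, flow_id, payload_len):
        """Return the priority (1 = highest) for the next packet of a flow."""
        sent = self.bytes_sent.get(flow_id, 0)
        # A flow that has sent at least alpha_{j-1} bytes is demoted to P_j.
        prio = bisect_right(self.thresholds, sent) + 1
        self.bytes_sent[flow_id] = sent + payload_len
        return prio


tagger = FlowTagger([20 * KB, 1 * MB])   # K = 3 priorities
first = tagger.tag("flow-a", 20 * KB)    # nothing sent yet
second = tagger.tag("flow-a", 1 * MB)    # 20KB already sent
third = tagger.tag("flow-a", 1460)       # more than 1MB already sent
other = tagger.tag("flow-b", 1460)       # a new flow starts at the top
print(first, second, third, other)
```

Because the priority depends only on bytes already sent, no flow size needs to be known in advance, and the switch only has to honour the tag with strict priority queueing.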

Determine Thresholds

• Thresholds depend on:
– Flow size distribution
– Traffic load

• Solution:
– Solve a FCT minimization problem to calculate demotion thresholds

• Problem:
– Traffic is highly dynamic

Traffic variations -> Mismatched thresholds

Impact of Mismatches

[Figure: two queues (High, Low) shared by two 10MB flows and one 20KB flow]

• When the threshold is perfect (20KB)
• When the threshold is too small (10KB)
– Increased latency for short flows
• When the threshold is too large (1MB)
– Increased latency for short flows
– Leverage ECN to keep low buffer occupancy
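A rough way to see why both kinds of mismatch hurt the short flow is to count how many bytes of each flow land in each of the two queues. The sketch below uses the flow sizes from the slide (two 10MB flows, one 20KB flow) and the three thresholds shown. The helper names and the decimal units are assumptions, and the byte counts are only a proxy for queueing delay, not a latency model.

```python
KB = 1000
MB = 1000 * KB

# Flow sizes from the slide; the names are made up for the example.
flows = {"large A": 10 * MB, "large B": 10 * MB, "short": 20 * KB}


def split_by_threshold(flow_size, threshold):
    """Return (bytes sent at high priority, bytes sent at low priority)."""
    high = min(flow_size, threshold)
    return high, flow_size - high


def summarize(threshold):
    """Return (short-flow bytes in the low queue, large-flow bytes in the high queue)."""
    split = {name: split_by_threshold(size, threshold)
             for name, size in flows.items()}
    short_low = split["short"][1]
    large_high = split["large A"][0] + split["large B"][0]
    return short_low, large_high


for label, threshold in [("perfect", 20 * KB),
                         ("too small", 10 * KB),
                         ("too large", 1 * MB)]:
    short_low, large_high = summarize(threshold)
    print(f"{label:9s} threshold={threshold:>9d}B  "
          f"short bytes in low queue={short_low:>6d}  "
          f"large bytes in high queue={large_high:>8d}")
```

With a 10KB threshold, half of the short flow is demoted and competes with the large flows in the low-priority queue. With a 1MB threshold, the large flows place 2MB in the high-priority queue ahead of the short flow.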

Handle Mismatches

If we enable ECN:

• When the threshold is too small (10KB)
– ECN can keep low latency
• When the threshold is too large (1MB)
– ECN can keep low latency
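To make the role of ECN concrete, here is a sketch of the sender-side window update used by DCTCP, the ECN-based transport the talk names as an example. It covers only the congestion estimate and the window reduction; additive increase, per-window timing and the switch marking threshold are omitted. The function name `dctcp_update` is hypothetical, and g = 1/16 is the gain suggested in the DCTCP paper.

```python
def dctcp_update(alpha, cwnd, marked, total, g=1 / 16):
    """One per-window DCTCP update (sketch, reduction side only).

    alpha  : running estimate of the fraction of ECN-marked packets
    cwnd   : congestion window in packets
    marked : ACKs in the last window that carried an ECN echo
    total  : ACKs received in the last window
    """
    frac = marked / total if total else 0.0
    # Exponentially weighted moving average of the marked fraction.
    alpha = (1 - g) * alpha + g * frac
    if marked:
        # Cut the window in proportion to the extent of congestion,
        # instead of always halving it as standard TCP does.
        cwnd = max(1.0, cwnd * (1 - alpha / 2))
    return alpha, cwnd


alpha, cwnd = 0.0, 100.0
for marked in (0, 2, 5, 10):  # marks seen in four consecutive 10-ACK windows
    alpha, cwnd = dctcp_update(alpha, cwnd, marked, total=10)
    print(f"marked={marked:2d}/10  alpha={alpha:.4f}  cwnd={cwnd:.2f}")
```

Because the window shrinks as soon as queues start to build, large flows that sit in a high-priority queue under a mismatched threshold keep buffer occupancy low, which limits the extra delay seen by short flows.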

PIAS in 1 Slide

• PIAS packet tagging
– End hosts maintain flow states and mark packets with priorities

• PIAS prioritization
– Switches enforce strict priority queueing

• PIAS rate control
– Employ ECN-based transport (e.g., DCTCP) to handle mismatch

Testbed Experiments

• PIAS prototype
– http://sing.cse.ust.hk/projects/PIAS

• Testbed Setup
– A Gigabit Pronto-3295 switch
– 16 Dell servers

• Benchmarks
– Web search (DCTCP paper)
– Data mining (VL2 paper)
– Memcached

Small Flows (<100KB)

[Figure: FCT of small flows in the testbed; panels: Web Search, Data Mining]

Compared to DCTCP, PIAS reduces the average FCT of small flows by up to 47% (web search) and 45% (data mining)

NS2 Simulation Setup

• Topology
– 144-host leaf-spine fabric with 10G/40G links

• Workloads
– Web search (DCTCP paper)
– Data mining (VL2 paper)

• Schemes
– Information-agnostic: PIAS, DCTCP and L2DCT
– Information-aware: pFabric

Overall Performance

[Figure: overall FCT; panels: Web Search, Data Mining]

PIAS has an obvious advantage over DCTCP and L2DCT in both workloads.

Small Flows (<100KB)

[Figure: FCT of small flows in simulation; panels: Web Search, Data Mining; annotation: 40% - 50% improvement]

Simulations confirm testbed experiment results

Comparison with pFabric

[Figure: PIAS vs. pFabric; panels: Web Search, Data Mining]

PIAS has only a 4.9% performance gap to pFabric for small flows in the data mining workload

Conclusion

• PIAS: simple, practical and effective
– Information-agnostic: not assume flow information from applications
– FCT minimization: enforce Multi-Level Feedback Queue scheduling
– Readily deployable: use commodity switches & legacy network stacks

Thanks!
