information-agnostic flow scheduling for commodity...
TRANSCRIPT
KaiChenSINGGroup,CSEDepartment,HKUSTMay16,2016@StanfordUniversity
Information-AgnosticFlowSchedulingforCommodityDataCenters
1
SINGTestbed Cluster
2
OpticalCircuitSwitch(x5)
ElectricalPacketSwitch,10G(x3)
ElectricalPacketSwitch,1G(x10)
Commodity rackservers,DellPowerEdgeR320(x100)
• NetworkStructures– OSA,thefirst-everallopticaldatacenterarchitecture(NSDI’12,TON’14)– MegaSwitch,non-blockingopticaldatacenterfabric
• Protocols/Algorithms– XPath,intra-DCroutingcontrol(NSDI’15,TON’15)– Amoeba,inter-DCdatatransfers(EuroSys’15)– PIAS,information-agnosticflowscheduling(HotNets’14,NSDI’15)– Karuna,mix-flowscheduling(SIGCOMM’16)– CODA,coflow schedulinginthedark(SIGCOMM’16)– MQ-ECN,congestioncontrolformulti-queueDC(NSDI’16)– CuttingTailLatency– Shallow-bufferManagement– ......
• Platform:DistributedComputingEngineforBigDataApps– network-enabledlargescaledistributed,parallelcomputingplatformsfor
bigdataanalyticsandmachinelearning
3
SING’sResearch:ABottom-upApproach
Goal:simple,effective,implementablenetworkedsystems!
PIAS:PracticalInformation-AgnosticFlowScheduling[USENIXNSDI’15]
• Cloudapplications– Desirelowlatencyforshortmessages
• Goal:Minimizeflowcompletiontime(FCT)– Manyflowschedulingproposals…
4
TheState-of-the-artSolutions
• PDQ[SIGCOMM’12]
• pFabric [SIGCOMM’13]
• PASE[SIGCOMM’14]• …
All assume prior knowledge of flow size information toapproximate ideal preemptive Shortest Job First (SJF)with customized network elements
5
TheState-of-the-artSolutions
• PDQ[SIGCOMM’12]
• pFabric [SIGCOMM’13]
• PASE[SIGCOMM’14]• …
All assume prior knowledge of flow size information toapproximate ideal preemptive Shortest Job First (SJF)with customized network elements
Notfeasibleforsomeapplications6
TheState-of-the-artSolutions
• PDQ[SIGCOMM’12]
• pFabric [SIGCOMM’13]
• PASE[SIGCOMM’14]• …
All assume prior knowledge of flow size information toapproximate ideal preemptive Shortest Job First (SJF)with customized network elements
Hardtodeployinpractice7
Question
Without prior knowledge of flow size information, howto minimize FCT in commodity data centers?
8
DesignGoal1
Without prior knowledge of flow size information, howto minimize FCT in commodity data centers?
Information-agnostic: not assume a priori knowledge offlow size information available from the applications
9
DesignGoal2
Without prior knowledge of flow size information, howto minimize FCT in commodity data centers?
FCT minimization: minimize average/tail FCTs of shortflows & not adversely affect FCTs of large flows
10
DesignGoal3
Without prior knowledge of flow size information, howto minimize FCT in commodity data centers?
Readily-deployable: work with existing commodityswitches & be compatible with legacy network stacks
11
Question
Without prior knowledge of flow size information, howto minimize FCT in commodity data centers?
Ouranswer:PIAS
12
DesignRationale
• PIASperformsMulti-LevelFeedbackQueue(MLFQ)toemulateShortestJobFirst
Priority1
Priority2
PriorityK
……
High
Low14
DesignRationale
• PIASperformsMulti-LevelFeedbackQueue(MLFQ)toemulateShortestJobFirst
Priority1
Priority2
PriorityK
……
15
SimpleExampleIllustratingPIAS
Priority1
Priority2
Priority4
Flow1with10packetsandflow2with2packetsarrive
Priority3
17
SimpleExampleIllustratingPIAS
Priority1
Priority2
Priority4
Flow1and2transmitsimultaneously
Priority3
18
SimpleExampleIllustratingPIAS
Priority1
Priority2
Priority4
Flow2finisheswhileflow1isdemotedtopriority2
Priority3
19
SimpleExampleIllustratingPIAS
Priority1
Priority2
Priority4
Flow3and1transmitsimultaneously
Priority3
21
SimpleExampleIllustratingPIAS
Priority1
Priority2
Priority4
Flow3finisheswhileflow1isdemotedtopriority3
Priority3
22
SimpleExampleIllustratingPIAS
Priority1
Priority2
Priority4
Flow4 and1transmitsimultaneously
Priority3
24
SimpleExampleIllustratingPIAS
Priority1
Priority2
Priority4
Priority3
Flow4 finisheswhileflow1isdemotedtopriority4
25
SimpleExampleIllustratingPIAS
Priority1
Priority2
Priority4
Priority3
Eventually,flow1finishesinpriority4
With MLFQ, PIAS can emulate Shortest Job First withoutprior knowledge of flow size information
26
Howtoimplement?
Priority1
Priority2
PriorityK
……
• Strictpriorityqueueingonswitches• Packettaggingasashimlayeratendhosts
• 𝐾priorities:𝑃$ 1 ≤ 𝑖 ≤ 𝐾
• 𝐾 − 1 demotionthresholds:𝛼*(1 ≤ 𝑗 ≤ 𝐾 − 1)
• Thethresholdtodemotepriorityfrom𝑃*./ to𝑃*is𝛼*./
27
Howtoimplement?
Priority1
Priority2
PriorityK
……
• Strictpriorityqueueingonswitches• Packettaggingasashimlayeratendhosts
1
28
Howtoimplement?
Priority1
Priority2
PriorityK
……
• Strictpriorityqueueingonswitches• Packettaggingasashimlayeratendhosts
29
Howtoimplement?
Priority1
Priority2
PriorityK
……
• Strictpriorityqueueingonswitches• Packettaggingasashimlayeratendhosts
2
30
• Thresholdsdependon:– Flowsizedistribution– Trafficload
• Solution:– SolveaFCTminimizationproblemtocalculatedemotionthresholds
• Problem:– Trafficishighlydynamic
DetermineThresholds
Trafficvariations->Mismatchedthresholds
31
• Whenthethresholdistoolarge(1MB)
ImpactofMismatches
Increasedlatencyforshortflows
LeverageECNtokeeplowbufferoccupancy
35
PIASin1Slide
• PIASpackettagging– Endhostsmaintainflowstatesandmarkpacketswithpriorities
• PIASprioritization– Switchesenforcestrictpriorityqueueing
• PIASratecontrol– EmployECN-basedtransport(e.g.,DCTCP)tohandlemismatch
38
TestbedExperiments
• PIASprototype– http://sing.cse.ust.hk/projects/PIAS
• TestbedSetup– AGigabitPronto-3295switch– 16Dellservers
• Benchmarks– Websearch(DCTCPpaper)– Datamining(VL2paper)– Memcached
39
SmallFlows(<100KB)
WebSearch DataMining
Compared to DCTCP, PIAS reduces averageFCT of small flows by up to 47% and 45%
47%45%
40
NS2SimulationSetup
• Topology– 144-hostleaf-spinefabricwith10G/40Glinks
• Workloads– Websearch(DCTCPpaper)– Datamining(VL2paper)
• Schemes– Information-agnostic:PIAS,DCTCPandL2DCT– Information-aware:pFabric
41
OverallPerformance
WebSearch DataMining
PIAS has an obvious advantage over DCTCP andL2DCT in both workloads.
42
SmallFlows(<100KB)
Simulations confirm testbed experiment results
WebSearch DataMining
40%- 50%improvement
43
ComparisonwithpFabric
PIAS only has 4.9% performance gap to pFabricfor small flows in data mining workload
WebSearch DataMining
44
Conclusion
• PIAS:simple,practicalandeffective– Notassumeflowinformationfromapplications
– EnforceMulti-LevelFeedbackQueuescheduling
– Usecommodityswitches&legacynetworkstacks
Information-agnostic
FCTminimization
Readilydeployable
45