![Page 1: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/1.jpg)
Azure
Cluster Scheduling at Microsoft Scale
MIT, 10/31/2019
Computer Networks (6.829)
Konstantinos Karanasos
![Page 2: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/2.jpg)
GSL (CISL)
Mission: applied research lab working on systems for big data, cloud, and machine learning
Office of the CTO for Azure Data
~15 researchers/engineers in the Bay Area, Redmond, and Madison
![Page 3: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/3.jpg)
The three hats we wear
Applied research group
Collaborating with database, big data, and AI infra groups at MS
Open-sourcing our code: Apache Hadoop, ONNX Runtime, MLflow, REEF, Heron
![Page 4: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/4.jpg)
Current Focus Areas
Resource management
Systems for ML
ML for Systems
Query Optimization
Provenance
![Page 5: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/5.jpg)
What is a Resource Manager?
Node Manager
Node Manager
Node Manager
1. Request
2. Allocation
3. Start container
Acquire cluster resources in the form of containers
Do we really need a Resource Manager?
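The request / allocation / start-container exchange on this slide can be sketched as a toy model. This is a minimal illustration, not YARN's API: the class names `ResourceManagerSketch` and `NodeManagerSketch`, the slot-based resource model, and the first-fit allocation order are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManagerSketch:
    """Hypothetical node manager: tracks free slots and running containers."""
    name: str
    free_slots: int
    running: list = field(default_factory=list)

    def start_container(self, container_id, task):
        # Step 3: the application starts its task inside a granted container.
        self.running.append((container_id, task))


class ResourceManagerSketch:
    """Hypothetical resource manager that hands out containers first-fit."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.next_id = 0

    def request(self, num_containers):
        # Step 1 (request) and step 2 (allocation): reserve a slot on a node
        # for each container that can be granted right now.
        allocations = []
        for node in self.nodes:
            while node.free_slots > 0 and len(allocations) < num_containers:
                node.free_slots -= 1
                allocations.append((self.next_id, node))
                self.next_id += 1
        return allocations


nodes = [NodeManagerSketch("nm1", 1), NodeManagerSketch("nm2", 2), NodeManagerSketch("nm3", 1)]
rm = ResourceManagerSketch(nodes)
granted = rm.request(3)
for container_id, node in granted:
    node.start_container(container_id, f"task-{container_id}")
```

The application never picks nodes itself here; it only states how many containers it needs, which is the separation the slide's question is probing.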
![Page 6: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/6.jpg)
Lessons learned: Abstracting out the RM layer
[Figure: Hadoop 1 World vs. Hadoop 2 World]
Layers, top to bottom: Users; Application Frameworks; Programming Model(s); Cluster OS (Resource Management); File System; Hardware
Hadoop 1 World (monolithic): Hive / Pig, ad-hoc apps; MR v1, Hadoop 1.x (MapReduce); HDFS 1
Hadoop 2 World (layering abstractions): Hive / Pig, ad-hoc apps; MR v2, Tez, Giraph, Storm, Dryad, REEF, Scope on YARN, Spark, Heron, ...; YARN; HDFS 2
Reuse of RM component
![Page 7: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/7.jpg)
Cosmos: Microsoft’s Analytics Stack
Hydra
![Page 8: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/8.jpg)
The Scale and Utilization Challenge
Cluster(s): >50K nodes
Job(s): >2M tasks, >5PB input
Tasks: 10 sec 50th %ile
Utilization: ~60% avg CPU util
Scheduler: >70K QPS
![Page 9: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/9.jpg)
Scheduling in Analytics Clusters: a Journey…
[Figure: timeline with markers at 2003, 2008, 2013, 2019]
Hadoop MR, centralized sched.
YARN (MR, Tez, Spark): +multi-framework, +security, +scheduler expressivity [socc13]
Scope, centralized sched. [eurosys07, vldb08]
Scope, distributed sched. (Scope1, Scope2, Scope3): +tooling/optimizer, +scale, +high utilization [osdi14]
Hydra
![Page 10: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/10.jpg)
Teaser…
>99% tenants migrated; >250K servers; >500K daily jobs; >1 zettabyte of data processed; >1 trillion tasks scheduled
![Page 11: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/11.jpg)
Agenda
Overview of legacy systems (Apollo, YARN)
Scale: Federated YARN
Resource utilization: Opportunistic Containers
Production Experience
![Page 12: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/12.jpg)
Distributed Scheduling: Legacy Cosmos (Apollo)
![Page 13: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/13.jpg)
Legacy Cosmos (Apollo)
Script: T = SELECT * FROM blah; SELECT T.blah FROM T;
Cosmos Job Service: Compile/Optimize
Execution plan
Admission Control (gang-queue)
Job Managers: JM1, JM2, JM3
Distributed scheduling + node-level queuing
Distributed scheduling got us scale
* JM: Job Manager
![Page 14: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/14.jpg)
Legacy Cosmos
Problem: (required) static resource allocation yields low utilization
Resource fragmentation within and across queues
Solution: opportunistically share resources
“Guaranteed” containers
“Opportunistic” containers
Opportunistic scheduling got us utilization
![Page 15: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/15.jpg)
Legacy Cosmos: Pros/Cons
+ Great scalability
+ Good resource utilization
- No support for multiple frameworks
- Limited control on scheduling (e.g., fairness, load-balancing, locality, special HW)
![Page 16: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/16.jpg)
Centralized Scheduling: Apache Hadoop YARN
![Page 17: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/17.jpg)
Legacy YARN
[Diagram: RM, AM, and three NMs]
1) Submit job
2) Admission control (fairness, quotas, constraints, SLOs, …)
3) Schedule Application Master (AM)
4) Start AM container
![Page 18: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/18.jpg)
Legacy YARN
[Diagram: RM, AM, two NMs, and a Task]
4) AM requests more containers on heartbeat
5) RM grants “token”
6) Start container
7) AM-Task communication
![Page 19: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/19.jpg)
Legacy YARN: Pros/Cons
+ Support for arbitrary frameworks
+ Rich scheduling invariants
+ Non-tech advantage: OSS/mind-share
- Scale limits (~5K nodes in 2013)
- Utilization: heartbeats lead to idle resources
![Page 20: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/20.jpg)
Main challenges
High resource utilization: Mercury [atc15, eurosys16]
Scalability: Federation [nsdi19]
Rich placement constraints: Medea [eurosys18]
Production jobs and predictability: Morpheus [osdi16]
OSS + production!
OSS
![Page 21: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/21.jpg)
Improving YARN’s Scalability: Federated Architecture
![Page 22: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/22.jpg)
Need scale and utilization? Go distributed!
Need scheduling control and multi-framework? Go centralized!
Want it all?
![Page 23: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/23.jpg)
Hydra Architecture
[Diagram: Routers; RM (sub-cluster 1); RM (sub-cluster 2); StateStoreProxy (x3); StateStore]
![Page 24: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/24.jpg)
Hydra Architecture
[Diagram: Routers; RM (sub-cluster 1); RM (sub-cluster 2); StateStoreProxy (x3); StateStore; NM]
![Page 25: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/25.jpg)
Hydra Architecture
[Diagram: AM; AMRMProxy; Routers; RM (sub-cluster 1); RM (sub-cluster 2); StateStoreProxy (x3); StateStore; two NMs; Task]
![Page 26: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/26.jpg)
Hydra Architecture
[Diagram: AM; AMRMProxy; Routers; RM (sub-cluster 1); RM (sub-cluster 2); StateStoreProxy (x3); StateStore; Global Policy Generator; NM]
![Page 27: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/27.jpg)
Scheduling Desiderata: Global Goals
High utilization
Scheduling invariants (e.g., fairness)
Locality (e.g., machine preferences)
![Page 28: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/28.jpg)
[Figure: queues QA and QB under root, 50% each; demands shown: 12 and 4; 12 and 12]
✓ Locality, Fairness; ✗ Utilization
✓ Utilization, Fairness; ✗ Locality
✓ Utilization, Locality; ✗ Fairness
✓ Utilization ✓ Fairness ✓ Locality
Policies
AMRMProxy routing of requests: enforce locality?
Per-cluster RM scheduling decisions: enforce quotas?
![Page 29: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/29.jpg)
Key Idea
Decouple:
Share determination: how many resources should a queue get?
Placement: on which machines should each task run?
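A minimal sketch of the decoupling, under stated assumptions: share determination is modeled here as weighted max-min fairness capped by demand, and placement as a greedy fill that tries a queue's preferred sub-cluster first. Neither is taken from the slides or the paper; the function names, the capacity of 16, and the sub-cluster names `sc1`/`sc2` are hypothetical. Only the queue names, the 50/50 weights, and the demands of 4 and 12 echo the earlier example.

```python
def determine_shares(capacity, weights, demand):
    """Share determination: weighted max-min fair shares, capped by demand."""
    shares = {q: 0.0 for q in weights}
    active = {q for q in weights if demand[q] > 0}
    remaining = float(capacity)
    while active and remaining > 1e-9:
        total_w = sum(weights[q] for q in active)
        # Queues whose demand fits inside their weighted slice are fully served.
        satisfied = {q for q in active
                     if demand[q] - shares[q] <= remaining * weights[q] / total_w}
        if not satisfied:
            # Everyone wants more than their slice: split what is left by weight.
            for q in active:
                shares[q] += remaining * weights[q] / total_w
            break
        for q in satisfied:
            remaining -= demand[q] - shares[q]
            shares[q] = float(demand[q])
        active -= satisfied
    return shares


def place(shares, preferred, free):
    """Placement: spend each queue's share, preferred sub-cluster first."""
    free = dict(free)  # do not mutate the caller's view of free capacity
    placement = {}
    for q, share in shares.items():
        left = int(share)
        order = [preferred[q]] + [c for c in free if c != preferred[q]]
        for c in order:
            take = min(left, free[c])
            if take:
                placement[(q, c)] = take
                free[c] -= take
                left -= take
    return placement


weights = {"QA": 0.5, "QB": 0.5}
shares = determine_shares(16, weights, {"QA": 4, "QB": 12})
placement = place(shares, {"QA": "sc1", "QB": "sc1"}, {"sc1": 8, "sc2": 8})
```

Because the two steps are separate functions, the share computation can see global demand while placement stays a local decision, which is the point of the decoupling.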
![Page 30: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/30.jpg)
Proposed Solution*
[Diagram: Global Policy Generator; StateStore; StateStoreProxy (x3); RM (sub-cluster 1); RM (sub-cluster 2); steps numbered 1 to 3]
* More advanced than what is in prod. (details in paper)
![Page 31: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/31.jpg)
Handling GPG (Global Policy Generator) downtime
If the GPG is down, we fall back to local decisions
Problematic if they “diverge” too much from the global one
Leverage LP-based “tuning” of local queue allocation
Historical demand as a predictor of future demand
![Page 32: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/32.jpg)
Improving YARN’s Utilization: Opportunistic Containers
![Page 33: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/33.jpg)
Suboptimal Utilization in Scope-on-YARN
[Diagram: RM with jobs j1, j2; NM1, NM2]
Due to: gang scheduling, feedback delays
Workload: 5 sec | 10 sec | 50 sec | Mixed-5-50
Utilization: 60.59% | 78.35% | 92.38% | 78.54%
![Page 34: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/34.jpg)
Opportunistic Containers in YARN
[Diagram: RM, AM, two NMs, and a Task]
Mask feedback delays
Only OPP containers can be queued
Promotion/demotion of containers
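The rule that only opportunistic (OPP) containers can be queued at a node can be sketched as follows. This is a toy model, not the YARN NodeManager: the class `NodeSketch`, the slot-based capacity, and the choice to put a preempted opportunistic task back at the head of the queue are assumptions for illustration. Promotion and demotion are not modeled.

```python
from collections import deque

GUARANTEED = "GUARANTEED"
OPPORTUNISTIC = "OPPORTUNISTIC"


class NodeSketch:
    """Hypothetical node with a fixed number of slots and an OPP-only queue."""

    def __init__(self, slots):
        self.slots = slots
        self.running = []     # (task, kind) pairs currently holding a slot
        self.queue = deque()  # queued OPPORTUNISTIC tasks only

    def submit(self, task, kind):
        if len(self.running) < self.slots:
            self.running.append((task, kind))
            return "started"
        if kind == OPPORTUNISTIC:
            # Queuing at the node hides the RM feedback delay: the next task
            # is already local when a slot frees up.
            self.queue.append(task)
            return "queued"
        # A guaranteed task never waits; it displaces an opportunistic one.
        for i, (victim, victim_kind) in enumerate(self.running):
            if victim_kind == OPPORTUNISTIC:
                self.running[i] = (task, kind)
                self.queue.appendleft(victim)  # assumption: re-queue the victim
                return "started"
        return "rejected"

    def finish(self, task):
        self.running = [(t, k) for t, k in self.running if t != task]
        # Freed slots are filled from the local queue without asking the RM.
        while self.queue and len(self.running) < self.slots:
            self.running.append((self.queue.popleft(), OPPORTUNISTIC))


node = NodeSketch(slots=1)
first = node.submit("t1", GUARANTEED)
second = node.submit("t2", OPPORTUNISTIC)
node.finish("t1")
```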
![Page 35: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/35.jpg)
Utilization Gains with Node-Side Queuing
But are long queues all we need?
![Page 36: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/36.jpg)
Job Completion Times with Node-Side Queuing
Naïve node-side queuing can be detrimental for job completion times
Proper queue management techniques are required
![Page 37: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/37.jpg)
Problems with Node-Side Queuing
Load imbalance across nodes: suboptimal task placement
Head-of-line blocking: especially for heterogeneous tasks
Early binding of tasks to nodes
[Diagram: nodes N1, N2, N3]
![Page 38: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/38.jpg)
Queue management techniques
Place tasks to node queues
Prioritize task execution (queue reordering)
Bound queue lengths
![Page 39: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/39.jpg)
Placement of Tasks to Queues
Placement based on queue length: agnostic of task characteristics; suboptimal placement for heterogeneous workloads
Placement based on queue wait time: better for heterogeneous workloads; requires task duration estimates
[Diagram: RM placing tasks into the queues of nodes N1, N2, N3; numbers shown: 24 100 / 253]
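The difference between the two placement policies can be seen in a small sketch. The function names and the per-queue task durations below are hypothetical example values, chosen so that one short queue holds a single long task; real systems would use duration estimates rather than known durations.

```python
def pick_by_queue_length(queues):
    """Choose the node with the fewest queued tasks (ignores task durations)."""
    return min(queues, key=lambda node: len(queues[node]))


def pick_by_wait_time(queues):
    """Choose the node with the smallest estimated wait (sum of durations)."""
    return min(queues, key=lambda node: sum(queues[node]))


# Estimated durations, in seconds, of the tasks already queued on each node.
queues = {"N1": [2, 4], "N2": [100], "N3": [2, 5, 3]}

by_length = pick_by_queue_length(queues)
by_wait = pick_by_wait_time(queues)
```

With these values the length-based policy sends the new task behind the single 100-second task, while the wait-time policy picks the node whose queue drains first.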
![Page 40: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/40.jpg)
Task Prioritization
Queue reordering strategies: Shortest Remaining Job First (SRJF), Least Remaining Tasks First (LRTF), Shortest Task First (STF), Earliest Job First (EJF)
SRJF and LRTF are job-aware: dynamically reorder tasks based on job progress
Starvation freedom: give priority to tasks waiting more than X secs
[Diagram: RM and nodes N1, N2, N3; j1: 21 tasks, j2: 5 tasks, j3: 9 tasks]
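As an illustration of job-aware reordering with starvation freedom, here is a minimal LRTF sketch. The `QueuedTask` type, the `reorder_lrtf` name, and the choice to serve starving tasks oldest-first are assumptions; `max_wait` plays the role of the "X secs" threshold, and the remaining-task counts reuse the j1/j2/j3 example.

```python
from dataclasses import dataclass


@dataclass
class QueuedTask:
    job: str
    enqueued_at: float  # time (seconds) at which the task entered the queue


def reorder_lrtf(queue, remaining_tasks, now, max_wait):
    """Least Remaining Tasks First, with priority for long-waiting tasks."""
    starving = [t for t in queue if now - t.enqueued_at > max_wait]
    others = [t for t in queue if now - t.enqueued_at <= max_wait]
    # Starving tasks go first, oldest first, so no task waits forever.
    starving.sort(key=lambda t: t.enqueued_at)
    # The rest are ordered by how few tasks their job still has to run.
    others.sort(key=lambda t: remaining_tasks[t.job])
    return starving + others


remaining = {"j1": 21, "j2": 5, "j3": 9}
queue = [QueuedTask("j1", 0.0), QueuedTask("j3", 8.0), QueuedTask("j2", 9.0)]

plain = [t.job for t in reorder_lrtf(queue, remaining, now=10.0, max_wait=60.0)]
with_starvation = [t.job for t in reorder_lrtf(queue, remaining, now=10.0, max_wait=5.0)]
```

Re-running the reordering as `remaining_tasks` changes is what makes the policy dynamic: a job close to completion keeps moving toward the head of every queue it appears in.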
![Page 41: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/41.jpg)
Bounding Queue Lengths
Determine max number of tasks at a queue: trade-off between short and long queues
Short queues: resource idling → lower throughput
Long queues: high queuing delays, early binding of tasks to queues → longer job completion times
Static and dynamic queue bounding
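One way to picture dynamic bounding is to size the queue so that the expected wait stays under a target. The formula below is an assumption made for illustration, not the rule used in production or in the paper, and the names `dynamic_bound` and `try_enqueue` are hypothetical. A static bound is simply a fixed integer passed to `try_enqueue`.

```python
def dynamic_bound(target_wait_s, avg_task_s, slots):
    """Longest queue whose expected wait stays under target_wait_s (assumed rule)."""
    # With `slots` tasks running in parallel, a queue of length L drains in
    # roughly L * avg_task_s / slots seconds.
    return max(1, int(target_wait_s * slots / avg_task_s))


def try_enqueue(queue, task, bound):
    """Admit the task only if the queue is below its bound."""
    if len(queue) >= bound:
        return False  # caller should try another node or retry later
    queue.append(task)
    return True


short_tasks_bound = dynamic_bound(target_wait_s=30, avg_task_s=10, slots=2)
long_tasks_bound = dynamic_bound(target_wait_s=30, avg_task_s=100, slots=1)

node_queue = []
accepted = [try_enqueue(node_queue, f"t{i}", bound=2) for i in range(3)]
```

Under this rule, nodes running short tasks get longer queues (to avoid idling) and nodes running long tasks get shorter ones (to avoid early binding), which matches the trade-off stated above.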
![Page 42: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/42.jpg)
Queuing Techniques: Evaluation
1.7x improvement in median JCT over YARN
Both bounding and reordering are crucial
![Page 43: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/43.jpg)
Production Experience
![Page 44: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/44.jpg)
Workload
![Page 45: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/45.jpg)
Scale
High scheduling rate at low allocation latency
![Page 46: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/46.jpg)
Utilization
Federated design improves load balancing, while retaining utilization
![Page 47: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/47.jpg)
Task Performance (comparison)
YARN
![Page 48: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/48.jpg)
Performance
Jobs perform just as well (and tasks are as efficient)
![Page 49: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/49.jpg)
Qualitative Experience
operability
![Page 50: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/50.jpg)
Recap
Overview of legacy systems (Apollo, YARN)
Scale: Federated YARN
Resource utilization: Opportunistic Containers
Production Experience
![Page 51: Cluster Scheduling at Microsoft Scaleweb.mit.edu/6.829/www/currentsemester/materials/2019-10-31_RM-MIT_share.pdfOct 31, 2019 · Scheduling in Analytics Clusters: a Journey… Hadoop](https://reader034.vdocument.in/reader034/viewer/2022042223/5ec99ca37bca3b712d021e20/html5/thumbnails/51.jpg)
Conclusion
Research in Industry is a lot of fun
Fewer, bigger projects, with massive wins
Big force multipliers
Failure is expected (otherwise we are not trying hard enough)
Modus Operandi
Be picky in choosing problems
Engage early, engage deep
Be inclusive with prod/OSS counterparts