tarcil: reconciling scheduling speed and quality in large...
TRANSCRIPT
![Page 1: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and](https://reader034.vdocument.in/reader034/viewer/2022042403/5f1764cc6e7d4d56a6163cb4/html5/thumbnails/1.jpg)
Christina Delimitrou1, Daniel Sanchez2 and Christos Kozyrakis1
1Stanford University, 2MIT
SOCC – August 27th 2015
Tarcil: Reconciling Scheduling Speed and Quality in Large Shared Clusters
![Page 2: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and](https://reader034.vdocument.in/reader034/viewer/2022042403/5f1764cc6e7d4d56a6163cb4/html5/thumbnails/2.jpg)
2
¨ Goals of cluster scheduling
¤ High decision quality ¤ High scheduling speed
¨ Problem: Disparity in scheduling designs ¤ Centralized schedulers à High quality, low speed ¤ Sampling-based schedulers à High speed, low quality
¨ Tarcil: Key scheduling techniques to bridge the gap ¤ Account for resource preferences à High decision quality
¤ Analytical framework for sampling à Predictable performance ¤ Admission control àHigh quality & speed
¤ Distributed design à High scheduling speed
Executive Summary
High performance High cluster utilization
![Page 3: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and](https://reader034.vdocument.in/reader034/viewer/2022042403/5f1764cc6e7d4d56a6163cb4/html5/thumbnails/3.jpg)
3
Motivation
¨ Optimize scheduling speed (sampling-based, distributed)
¨ Optimize scheduling quality (centralized, greedy)
Good: Short jobs
Good: Long jobs
Bad: Long jobs
Bad: Short jobs
Short: 100msec, Medium: 1-10sec, Long: 10sec-10min
![Page 4: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and](https://reader034.vdocument.in/reader034/viewer/2022042403/5f1764cc6e7d4d56a6163cb4/html5/thumbnails/4.jpg)
4
Motivation
¨ Optimize scheduling speed (sampling-based, distributed)
¨ Optimize scheduling quality (centralized, greedy)
Good: Short jobs
Good: Long jobs
Bad: Long jobs
Bad: Short jobs
Short: 100msec, Medium: 1-10sec, Long: 10sec-10min
![Page 5: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and](https://reader034.vdocument.in/reader034/viewer/2022042403/5f1764cc6e7d4d56a6163cb4/html5/thumbnails/5.jpg)
5
Key Scheduling Techniques at Scale
![Page 6: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and](https://reader034.vdocument.in/reader034/viewer/2022042403/5f1764cc6e7d4d56a6163cb4/html5/thumbnails/6.jpg)
6
1. Determine Resource Preferences
¨ Scheduling quality depends on: interference, heterogeneity, scale up/out, … ¤ Exhaustive exploration à infeasible ¤ Practical data mining framework1
¤ Measure impact of a couple of allocations à estimate for large space
1C. Delimitrou and C. Kozyrakis. Quasar: Resource-Efficient and QoS-Aware Cluster Management. In ASPLOS 2014.
![Page 7: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and](https://reader034.vdocument.in/reader034/viewer/2022042403/5f1764cc6e7d4d56a6163cb4/html5/thumbnails/7.jpg)
7
Example: Quantifying Interference
¨ Interference: set of microbenchmarks of tunable intensity (iBench)
1C. Delimitrou and C. Kozyrakis. Quasar: Resource-Efficient and QoS-Aware Cluster Management. In ASPLOS 2014.
Measure tolerated & generated interference
QoS
68%
…
Resource Quality Q
QoS
7% …
Data mining: Recover missing resources
![Page 8: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and](https://reader034.vdocument.in/reader034/viewer/2022042403/5f1764cc6e7d4d56a6163cb4/html5/thumbnails/8.jpg)
8
2. Analytical Sampling Framework
¨ Sample w.r.t. required resource quality
![Page 9: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and](https://reader034.vdocument.in/reader034/viewer/2022042403/5f1764cc6e7d4d56a6163cb4/html5/thumbnails/9.jpg)
9
2. Analytical Sampling Framework
¨ Fine-grain allocations: partition servers in Resource Units (RU) à minimum allocation unit
RU
Single-threaded apps Reclaim unused resources
![Page 10: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and](https://reader034.vdocument.in/reader034/viewer/2022042403/5f1764cc6e7d4d56a6163cb4/html5/thumbnails/10.jpg)
10
2. Analytical Sampling Framework
¨ Match a new job with required quality Q to appropriate RUs
QR1
QR2 QR3 QR4 QR5 QR6 QR7 QR8 QR9
QR10
QR20
QR30
QR42
QR54
QR11
QR21
QR31
QR43
QR55
QR61
QR67
QR60
QR66
QR74
![Page 11: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and](https://reader034.vdocument.in/reader034/viewer/2022042403/5f1764cc6e7d4d56a6163cb4/html5/thumbnails/11.jpg)
11
2. Analytical Sampling Framework
¨ Rank resources by quality
![Page 12: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and](https://reader034.vdocument.in/reader034/viewer/2022042403/5f1764cc6e7d4d56a6163cb4/html5/thumbnails/12.jpg)
12
2. Analytical Sampling Framework
¨ Break ties with a fair coin à uniform distribution
CD
F
Resource Quality Q
![Page 13: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and](https://reader034.vdocument.in/reader034/viewer/2022042403/5f1764cc6e7d4d56a6163cb4/html5/thumbnails/13.jpg)
13
2. Analytical Sampling Framework
¨ Break ties with a fair coin à uniform distribution
CD
F
Resource Quality Q
Better resources
Worse resources
![Page 14: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and](https://reader034.vdocument.in/reader034/viewer/2022042403/5f1764cc6e7d4d56a6163cb4/html5/thumbnails/14.jpg)
14
2. Analytical Sampling Framework
¨ Sample on uniform distribution à guarantees on resource allocation quality
Pr[Q≤x] = xR
Pr[Q<0.8]=10-3
![Page 15: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and](https://reader034.vdocument.in/reader034/viewer/2022042403/5f1764cc6e7d4d56a6163cb4/html5/thumbnails/15.jpg)
15
Validation
¨ 100 server EC2 cluster ¨ Short Spark tasks ¨ Deviation between analytical and empirical is minimal
![Page 16: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and](https://reader034.vdocument.in/reader034/viewer/2022042403/5f1764cc6e7d4d56a6163cb4/html5/thumbnails/16.jpg)
16
Sampling at High Load
¨ Performance degrades (with small sample size) ¨ Or sample size needs to increase
![Page 17: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and](https://reader034.vdocument.in/reader034/viewer/2022042403/5f1764cc6e7d4d56a6163cb4/html5/thumbnails/17.jpg)
17
3. Admission Control
¨ Queue jobs based on required resource quality ¨ Resource quality vs. waiting time à set max waiting time limit
…
Q
![Page 18: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and](https://reader034.vdocument.in/reader034/viewer/2022042403/5f1764cc6e7d4d56a6163cb4/html5/thumbnails/18.jpg)
18
Tarcil Implementation
¨ 4,000 loc in C/C++ and Python
¨ Supports apps in various frameworks (Hadoop, Spark, key-value stores)
¨ Distributed design: Concurrent scheduling agents (sim. Omega2) ¤ Each agent has local copy of state, one resilient master copy
¤ Lock-free optimistic concurrency for conflict resolution (rare) à Abort and retry
¤ 30:1 worker to scheduling agent ratio
2M. Schwarzkopf, A. Konwinski, et al. Omega: flexible, scalable schedulers for large compute clusters. In EuroSys 2013.
![Page 19: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and](https://reader034.vdocument.in/reader034/viewer/2022042403/5f1764cc6e7d4d56a6163cb4/html5/thumbnails/19.jpg)
19
Evaluation Methodology
1. TPC-H Workload ¤ ~40k queries of different types ¤ Compare with a centralized scheduler (Quasar) and a distributed
scheduler based on random sampling (Sparrow) ¤ 110-server EC2 cluster (100 workers, 10 scheduling agents)
n Homogeneous cluster, no interference n Homogeneous cluster, with interference n Heterogeneous cluster, with interference
¨ Metrics: ¤ Task performance ¤ Performance predictability ¤ Scheduling latency
![Page 20: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and](https://reader034.vdocument.in/reader034/viewer/2022042403/5f1764cc6e7d4d56a6163cb4/html5/thumbnails/20.jpg)
20
Evaluation
Centralized: high overheads
Sparrow and Tarcil: similar
![Page 21: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and](https://reader034.vdocument.in/reader034/viewer/2022042403/5f1764cc6e7d4d56a6163cb4/html5/thumbnails/21.jpg)
21
Evaluation
Centralized: high overheads
Sparrow and Tarcil: similar
Centralized and Sparrow: comparable performance Tarcil: 24% lower completion time
![Page 22: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and](https://reader034.vdocument.in/reader034/viewer/2022042403/5f1764cc6e7d4d56a6163cb4/html5/thumbnails/22.jpg)
22
Evaluation
Centralized: high overheads
Sparrow and Tarcil: similar
Centralized and Sparrow: comparable performance Tarcil: 24% lower completion time
Centralized outperforms Sparrow Tarcil: 41% lower completion time & less jitter
![Page 23: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and](https://reader034.vdocument.in/reader034/viewer/2022042403/5f1764cc6e7d4d56a6163cb4/html5/thumbnails/23.jpg)
23
Scheduling Overheads
Heterogeneous, with interference
¨ Centralized: Two orders of magnitude slower than the distributed, sampling-based schedulers
¨ Sparrow and Tarcil: Comparable scheduling overheads
![Page 24: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and](https://reader034.vdocument.in/reader034/viewer/2022042403/5f1764cc6e7d4d56a6163cb4/html5/thumbnails/24.jpg)
24
Resident Load
¨ Tarcil and Centralized account for cross-job interference à preserve memcached’s QoS
¨ Sparrow causes QoS violations for memcached
memcached
![Page 25: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and](https://reader034.vdocument.in/reader034/viewer/2022042403/5f1764cc6e7d4d56a6163cb4/html5/thumbnails/25.jpg)
25
Motivation Revisited
Distributed, sampling-based
Centralized, greedy
Tarcil
Short: 100msec Medium: 1-10sec Long:10sec-10min
![Page 26: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and](https://reader034.vdocument.in/reader034/viewer/2022042403/5f1764cc6e7d4d56a6163cb4/html5/thumbnails/26.jpg)
26
More details in the paper…
¨ Sensitivity on parameters such as: ¤ Cluster load ¤ Number of scheduling agents ¤ Sample size ¤ Task duration, etc.
¨ Job priorities
¨ Large allocations
¨ Generic application scenario (batch and latency-critical) on 200 EC2 servers
![Page 27: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and](https://reader034.vdocument.in/reader034/viewer/2022042403/5f1764cc6e7d4d56a6163cb4/html5/thumbnails/27.jpg)
27
Conclusions
¨ Tarcil: Reconciles high quality and high speed scheduling ¤ Account for resource preferences ¤ Analytical sampling framework to improve predictability ¤ Admission control to maintain high scheduling quality at high load ¤ Distributed design to improve scheduling speed
¨ Results: ¤ 41% better performance than random sampling-based schedulers ¤ 100x better scheduling latency than centralized schedulers ¤ Predictable allocation quality & performance
![Page 28: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and](https://reader034.vdocument.in/reader034/viewer/2022042403/5f1764cc6e7d4d56a6163cb4/html5/thumbnails/28.jpg)
28
Questions?
¨ Tarcil: Reconciles high quality and high speed schedulers ¤ Account for resource preferences ¤ Analytical sampling framework to improve predictability ¤ Admission control to maintain high scheduling quality at high load ¤ Distributed design to improve scheduling speed
¨ Results: ¤ 41% better performance than random sampling-based schedulers ¤ 100x better scheduling latency than centralized schedulers ¤ Predictable allocation quality & performance
![Page 29: Tarcil: Reconciling Scheduling Speed and Quality in Large ...csl.stanford.edu/~christos/publications/2015.tarcil.socc.slides.pdf · 18 Tarcil Implementation ! 4,000 loc in C/C++ and](https://reader034.vdocument.in/reader034/viewer/2022042403/5f1764cc6e7d4d56a6163cb4/html5/thumbnails/29.jpg)
29
Thank you
Questions??