Sparrow Distributed Low-Latency
Scheduling
Kay Ousterhout, Patrick Wendell, Matei
Zaharia, Ion Stoica
Sparrow schedules tasks in clusters
using a decentralized, randomized approach
support constraints and fair sharing
Scheduling Setting
…
…
Map
Reduce/Spark/Dr
yad Job
…
Tas
k
Tas
k
Tas
k
Map
Reduce/Spark/Dr
yad Job Tas
k
Tas
k
Job Latencies Rapidly Decreasing
10
min.
10 sec. 100
ms
1 ms
2004:
MapReduce
batch job
2009:
Hive
query
2010:
Dremel
Query
2012:
Impala
query 2010:
In-
memory
Spark
query
2013:
Spark
streamin
g
10
min.
10 sec. 100
ms
1 ms
2004:
MapReduce
batch job
2009:
Hive
query
2010:
Dremel
Query
2012:
Impala
query 2010:
In-
memory
Spark
query
2013:
Spark
streamin
g
1000 16-core machines
26 decisio
ns/seco
nd
Scheduler
throughput
1.6K
decisio
ns/seco
nd
160K
decisio
ns/seco
nd
16M
decisio
ns/seco
nd
Millisecond Latency
Quality Placement
Fault Tolerant
High Throughput
Today:
Completely Centralized
Less centralization
Sparrow: Completely
Decentralized
✗
✗
✓
✗ ✓
✓
✓
? ✓
Sparrow Decentralized approach Existing randomized approaches Batch Sampling Late Binding Analytical performance evaluation
Handling constraints
Fairness and policy enforcement
Within 12% of ideal on 100 machines
Scheduling with Sparrow
Worke
r Worke
r Worke
r Worke
r Worke
r
Schedu
ler
Schedu
ler
Schedu
ler Schedu
ler Job
Worke
r
Worke
r Worke
r Worke
r Worke
r Worke
r
Schedu
ler
Schedu
ler
Schedu
ler Schedu
ler Job
Worke
r
Random
Simulated Results
100-task jobs in 10,000-node cluster, exp. task durations
Omniscient: infinitely fast centralized scheduler
Per-task sampling
Worke
r Worke
r Worke
r Worke
r Worke
r
Schedu
ler
Schedu
ler
Schedu
ler Schedu
ler Job
Worke
r
Power of Two Choices
Per-task sampling
Worke
r Worke
r Worke
r Worke
r Worke
r
Schedu
ler
Schedu
ler
Schedu
ler Schedu
ler Job
Worke
r
Power of Two Choices
Per-Task Sampling
Worke
r Worke
r Worke
r Worke
r Worke
r
Schedu
ler
Schedu
ler
Schedu
ler Schedu
ler Job
Worke
r
✓
✓
Task
1
Task
2
Per-ta
sk
Per-task Sampling
Worke
r Worke
r Worke
r Worke
r Worke
r
Schedu
ler
Schedu
ler
Schedu
ler Schedu
ler Job
Worke
r
Place m tasks on the least loaded of
dm slaves
Per-ta
sk ✓
✓
4
probes
(d = 2)
Batch
Queue length poor predictor of
wait time Work
er Work
er
80
ms 155
ms
530
ms
Poor performance on heterogeneous workloads
Late Binding
Worker
Worker
Worker
Worker
Worker
Scheduler
Scheduler
Scheduler
Scheduler Job
Worker
Place m tasks on the least loaded of
dm slaves
4
probes
(d = 2)
Late Binding
Scheduler
Scheduler
Scheduler
Scheduler Job
Place m tasks on the least loaded of
dm slaves
4
probes
(d = 2)
Worker
Worker
Worker
Worker
Worker
Worker
Late Binding
Scheduler
Scheduler
Scheduler
Scheduler Job
Place m tasks on the least loaded of
dm slaves
Worke
r
reques
ts task
Worker
Worker
Worker
Worker
Worker
Worker
Job Constraints
Scheduler
Scheduler
Scheduler
Scheduler Job
Worker
Worker
Worker
Worker
Worker
Worker
Restrict probed machines to those that
satisfy the constraint
Per-Task Constraints
Scheduler
Scheduler
Scheduler
Scheduler Job
Worker
Worker
Worker
Worker
Worker
Worker
Probe separately for each task
Technique Recap
Scheduler
Scheduler
Scheduler
Scheduler Batch sampling
+ Late binding
+
Constraint handling
Work
er Work
er Work
er Work
er Work
er
Work
er
Spark on Sparrow
Worke
r Worke
r Worke
r Worke
r Worke
r
Worke
r
Query: DAG of
Stages
Sparro
w
Sched
uler Job
Spark on Sparrow
Worke
r Worke
r Worke
r Worke
r Worke
r
Worke
r
Query: DAG of
Stages
Sparro
w
Sched
uler Job
Spark on Sparrow
Worke
r Worke
r Worke
r Worke
r Worke
r
Worke
r
Query: DAG of
Stages
Sparro
w
Sched
uler
Job
How does Sparrow compare to
Spark’s native scheduler?
100 16-core EC2 nodes, 10 tasks/job, 10 schedulers, 80% load
TPC-H Queries: Background
TPC-H: Common benchmark for analytics
workloads
Sparrow
Spark: Distributed in-memory
analytics framework
Shark: SQL execution engine
TPC-H Queries
100 16-core EC2 nodes, 10 schedulers, 80% load
Within 12% of ideal Median queuing
delay of 9ms
Fault Tolerance
Schedul
er 1
Schedul
er 2
Spark
Client 1 ✗ Spark
Client 2
Timeout: 100ms
Failover: 5ms
Re-launch queries: 15ms
Related Work
Centralized task schedulers: e.g., Quincy
Two level schedulers: e.g., YARN, Mesos
Coarse-grained cluster schedulers: e.g.,
Omega
Load balancing: single task
Batch sampling +
Late binding +
Constraint
handling
www.github.com/radlab/sparrow
Sparrows provides near-ideal job response times without global visibility
Scheduler
Scheduler
Scheduler
Scheduler
Work
er Work
er Work
er Work
er Work
er
Work
er