recent advances and what’s next? - mosharaf chowdhury · recent advances and what’s next?...
![Page 1](https://reader030.vdocument.in/reader030/viewer/2022021420/5ad17a3a7f8b9abd6c8bbe2e/html5/thumbnails/1.jpg)
Coflow: Recent Advances and What’s Next?
Mosharaf Chowdhury
University of Michigan
![Page 2](https://reader030.vdocument.in/reader030/viewer/2022021420/5ad17a3a7f8b9abd6c8bbe2e/html5/thumbnails/2.jpg)
Datacenter-Scale Computing
Geo-Distributed Computing
Fast Analytics Over the WAN
Rack-Scale Computing
Proactive Analytics Before You Think!
Coflow Networking (Open Source)
Apache Spark (Open Source)
Cluster File System (Facebook)
Resource Allocation (Microsoft)
DAG Scheduling (Apache YARN)
Cluster Caching (Alluxio)
![Page 3](https://reader030.vdocument.in/reader030/viewer/2022021420/5ad17a3a7f8b9abd6c8bbe2e/html5/thumbnails/3.jpg)
Network latency by scale:
Rack-Scale Computing: < 0.01 ms
Datacenter-Scale Computing: ~1 ms
Geo-Distributed Computing: > 100 ms
![Page 4](https://reader030.vdocument.in/reader030/viewer/2022021420/5ad17a3a7f8b9abd6c8bbe2e/html5/thumbnails/4.jpg)
Big Data
• The volume of data businesses want to make sense of is increasing
• Increasing variety of sources: web, mobile, wearables, vehicles, scientific, …
• Cheaper disks, SSDs, and memory
• Stalling processor speeds
![Page 5](https://reader030.vdocument.in/reader030/viewer/2022021420/5ad17a3a7f8b9abd6c8bbe2e/html5/thumbnails/5.jpg)
Big Datacenters for Massive Parallelism
(Figure: timeline of data-parallel frameworks, 2005 to 2015)
MapReduce, Hadoop, Spark¹, Hive, Dryad, DryadLINQ, Spark-Streaming, GraphX, GraphLab, Pregel, Storm, Dremel, BlinkDB
1. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, NSDI’2012.
![Page 6](https://reader030.vdocument.in/reader030/viewer/2022021420/5ad17a3a7f8b9abd6c8bbe2e/html5/thumbnails/6.jpg)
Distributed Data-Parallel Applications
• Multi-stage dataflow: computation interleaved with communication
• Computation stage (e.g., Map, Reduce): distributed across many machines; tasks run in parallel
• Communication stage (e.g., Shuffle): between successive computation stages
(Figure: a Map stage feeding a Reduce stage through a shuffle)
A communication stage cannot complete until all the data have been transferred.
![Page 7](https://reader030.vdocument.in/reader030/viewer/2022021420/5ad17a3a7f8b9abd6c8bbe2e/html5/thumbnails/7.jpg)
Communication is Crucial
Performance
• Facebook jobs spend ~25% of their runtime, on average, in intermediate communication.¹
• As SSD-based and in-memory systems proliferate, the network is likely to become the primary bottleneck.
1. Based on a month-long trace with 320,000 jobs and 150 million tasks, collected from a 3000-machine Facebook production MapReduce cluster.
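Amdahl's law gives a sense of scale for the ~25% figure. It bounds how much faster a job can get when only its communication speeds up. The sketch below assumes a single job whose communication share is exactly p = 0.25 and whose computation time does not change. The trace reports an average across jobs, so individual jobs will differ. `job_speedup` is a hypothetical helper.

```python
def job_speedup(comm_fraction: float, comm_speedup: float) -> float:
    """Amdahl's law: only the communication share of runtime gets faster."""
    return 1.0 / ((1.0 - comm_fraction) + comm_fraction / comm_speedup)


if __name__ == "__main__":
    p = 0.25  # assumed communication share of one job's runtime
    for s in (2, 10, 100):
        print(f"communication {s}x faster -> job {job_speedup(p, s):.2f}x faster")
    print(f"upper bound as s grows: {1 / (1 - p):.2f}x")
```

For such a job, even infinitely fast communication caps the gain at 1/(1 - p) ≈ 1.33x. The bound loosens as faster storage and in-memory processing shrink computation and push p up, which is the trend this slide points to.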
![Page 8](https://reader030.vdocument.in/reader030/viewer/2022021420/5ad17a3a7f8b9abd6c8bbe2e/html5/thumbnails/8.jpg)
Faster Communication Stages: Traditional Networking Approach
Flow: transfers data from a source to a destination
• Independent unit of allocation, sharing, load balancing, and/or prioritization
![Page 9](https://reader030.vdocument.in/reader030/viewer/2022021420/5ad17a3a7f8b9abd6c8bbe2e/html5/thumbnails/9.jpg)
Existing Solutions
(Figure: timeline from the 1980s to 2015, grouping solutions by objective: per-flow fairness and flow completion time)
GPS, WFQ, RED, CSFQ, ECN, XCP, RCP, DCTCP, D2TCP, D3, PDQ, FCP, DeTail, pFabric
Independent flows cannot capture the collective communication behavior common in data-parallel applications.
![Page 10](https://reader030.vdocument.in/reader030/viewer/2022021420/5ad17a3a7f8b9abd6c8bbe2e/html5/thumbnails/10.jpg)
Why Do They Fall Short?
(Figure: senders s1, s2, s3 and receivers r1, r2 attached to the datacenter fabric's input links 1 to 3 and output links 1 to 3)
![Page 11](https://reader030.vdocument.in/reader030/viewer/2022021420/5ad17a3a7f8b9abd6c8bbe2e/html5/thumbnails/11.jpg)
Why Do They Fall Short?
(Figure: the shuffle from s1, s2, s3 to r1, r2, with each endpoint placed on its link of the datacenter fabric)
![Page 12](https://reader030.vdocument.in/reader030/viewer/2022021420/5ad17a3a7f8b9abd6c8bbe2e/html5/thumbnails/12.jpg)
Why Do They Fall Short?
(Figure: timelines of the links to r1 and r2 under per-flow fair sharing; on each link, two flows finish at t = 3 and one at t = 5)
Per-flow fair sharing: shuffle completion time = 5; average flow completion time = 3.66
Solutions focusing on flow completion time cannot further decrease the shuffle completion time.
![Page 13](https://reader030.vdocument.in/reader030/viewer/2022021420/5ad17a3a7f8b9abd6c8bbe2e/html5/thumbnails/13.jpg)
Improve Application-Level Performance¹
Slow down faster flows to accelerate slower flows
(Figure: timelines of the links to r1 and r2 for the same shuffle under two allocations)
Per-flow fair sharing: shuffle completion time = 5; average flow completion time = 3.66
Data-proportional allocation: shuffle completion time = 4; average flow completion time = 4 (every flow finishes at t = 4)
1. Managing Data Transfers in Computer Clusters with Orchestra, SIGCOMM’2011.
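A small simulation reproduces the slide's numbers. It rests on several assumptions:

- The flow sizes are our reconstruction of the figure: s1 and s2 each send 1 unit to each receiver, and s3 sends 2 units to each.
- Every sender uplink and receiver downlink carries 1 unit per time step.
- Per-flow fair sharing is modeled as max-min fairness via progressive filling.
- Data-proportional allocation gives each flow a rate proportional to its size, so all flows finish together.

The function names are illustrative and are not Orchestra's implementation.

```python
from fractions import Fraction

# Reconstructed example (assumption): sizes chosen to match the slide's numbers.
SIZES = {"s1->r1": 1, "s1->r2": 1, "s2->r1": 1, "s2->r2": 1, "s3->r1": 2, "s3->r2": 2}
# Each flow uses its sender's uplink and its receiver's downlink.
PATHS = {f: {"up:" + f[:2], "down:" + f[-2:]} for f in SIZES}
CAP = {link: 1 for path in PATHS.values() for link in path}  # unit capacity


def max_min_rates(paths, capacity):
    """Per-flow fair sharing via progressive filling (max-min fairness)."""
    rates = {f: Fraction(0) for f in paths}
    residual = {link: Fraction(c) for link, c in capacity.items()}
    active = set(paths)
    while active:
        users = {l: sum(1 for f in active if l in paths[f]) for l in residual}
        # Raise all active flows equally until the tightest link saturates.
        step = min(residual[l] / n for l, n in users.items() if n)
        for f in active:
            rates[f] += step
        for l, n in users.items():
            residual[l] -= step * n
        full = {l for l, c in residual.items() if c == 0}
        active = {f for f in active if not paths[f] & full}  # freeze bottlenecked flows
    return rates


def fair_sharing(sizes, paths, capacity):
    """Event-driven simulation: recompute fair rates whenever a flow finishes."""
    left = {f: Fraction(s) for f, s in sizes.items()}
    now, done = Fraction(0), {}
    while left:
        rates = max_min_rates({f: paths[f] for f in left}, capacity)
        dt = min(left[f] / rates[f] for f in left)
        now += dt
        for f in list(left):
            left[f] -= rates[f] * dt
            if left[f] == 0:
                done[f] = now
                del left[f]
    return done


def data_proportional(sizes, paths, capacity):
    """Rate = size / T, where T is the drain time of the most loaded link."""
    load = {l: Fraction(0) for l in capacity}
    for f, s in sizes.items():
        for l in paths[f]:
            load[l] += s
    t = max(load[l] / capacity[l] for l in capacity)
    return {f: t for f in sizes}


if __name__ == "__main__":
    for name, policy in [("fair", fair_sharing), ("proportional", data_proportional)]:
        fct = policy(SIZES, PATHS, CAP)
        avg = sum(fct.values()) / len(fct)
        print(f"{name}: shuffle={max(fct.values())}, avg FCT={float(avg):.2f}")
```

The exact fair-sharing average is 11/3 ≈ 3.67, which the slide truncates to 3.66. Here is why proportional allocation helps. Under fair sharing, the shared receiver links squeeze s3's two flows to 1/3 each, so s3's uplink sits partly idle until the small flows finish. Pacing every flow to finish at t = 4 keeps s3's uplink fully busy from the start.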
![Page 14](https://reader030.vdocument.in/reader030/viewer/2022021420/5ad17a3a7f8b9abd6c8bbe2e/html5/thumbnails/14.jpg)
Coflow
A communication abstraction for data-parallel applications to express their performance goals
1. Size of each flow
2. Total number of flows
3. Endpoints of individual flows
4. Dependencies between coflows
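The sketch below shows one way to represent these four pieces of information, using Python dataclasses. The class, field, and method names are hypothetical and are not the Varys or Aalo API. A dependency is read as "may start only after its parents finish", which is also an assumption. The completion time encodes the earlier point that a communication stage finishes only when its last flow does.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Flow:
    src: str   # sending endpoint, e.g. a mapper's machine
    dst: str   # receiving endpoint, e.g. a reducer's machine
    size: int  # bytes to transfer (known only in the clairvoyant setting)


@dataclass
class Coflow:
    coflow_id: str
    flows: list[Flow]                                   # sizes and endpoints
    depends_on: set[str] = field(default_factory=set)   # parent coflow ids

    @property
    def num_flows(self) -> int:
        return len(self.flows)

    def ready(self, finished: set[str]) -> bool:
        """Assumed semantics: may start only after all parent coflows finish."""
        return self.depends_on <= finished

    def completion_time(self, start: float, finish: dict[Flow, float]) -> float:
        """Coflow completion time (CCT): set by the last flow to finish."""
        return max(finish[f] for f in self.flows) - start


if __name__ == "__main__":
    a, b = Flow("s1", "r1", 100), Flow("s2", "r1", 300)
    shuffle = Coflow("shuffle-1", [a, b])
    output = Coflow("output-1", [Flow("r1", "sink", 50)], {"shuffle-1"})
    print(shuffle.num_flows, shuffle.completion_time(0.0, {a: 1.0, b: 3.0}))
    print(output.ready(set()), output.ready({"shuffle-1"}))
```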
![Page 15](https://reader030.vdocument.in/reader030/viewer/2022021420/5ad17a3a7f8b9abd6c8bbe2e/html5/thumbnails/15.jpg)
(Figure: example coflows)
Single Flow, Parallel Flows, Broadcast, Aggregation, Shuffle, All-to-All
![Page 16](https://reader030.vdocument.in/reader030/viewer/2022021420/5ad17a3a7f8b9abd6c8bbe2e/html5/thumbnails/16.jpg)
How to schedule coflows online …
#1 … for faster completion of coflows?
#2 … to meet more deadlines?
#3 … for fair allocation of the network?
(Figure: the datacenter fabric with input ports 1 … N and output ports 1 … N)
![Page 17](https://reader030.vdocument.in/reader030/viewer/2022021420/5ad17a3a7f8b9abd6c8bbe2e/html5/thumbnails/17.jpg)
Varys¹, Aalo² & HUG³
1. Coflow Scheduler: faster, application-aware data transfers throughout the network
2. Global Coordination: consistent calculation and enforcement of scheduler decisions
3. The Coflow API: decouples network optimizations from applications, relieving developers and end users
1. Efficient Coflow Scheduling with Varys, SIGCOMM’2014.
2. Efficient Coflow Scheduling Without Prior Knowledge, SIGCOMM’2015.
3. HUG: Multi-Resource Fairness for Correlated and Elastic Demands, NSDI’2016.
![Page 18](https://reader030.vdocument.in/reader030/viewer/2022021420/5ad17a3a7f8b9abd6c8bbe2e/html5/thumbnails/18.jpg)
Benefits of Inter-Coflow Scheduling
Example: Coflow 1 sends 3 units on Link 1; Coflow 2 sends 2 units on Link 1 and 6 units on Link 2.
(Figure: per-link timelines (L1, L2) under each policy)
Fair sharing: Coflow 1 completion time = 5; Coflow 2 completion time = 6
Smallest-flow first¹,²: Coflow 1 completion time = 5; Coflow 2 completion time = 6
The optimal: Coflow 1 completion time = 3; Coflow 2 completion time = 6
1. Finishing Flows Quickly with Preemptive Scheduling, SIGCOMM’2012.
2. pFabric: Minimal Near-Optimal Datacenter Transport, SIGCOMM’2013.
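A small sketch recomputes the three timelines. It makes these assumptions:

- The flow sizes are our reconstruction of the figure: Coflow 1 has 3 units on Link 1, and Coflow 2 has 2 units on Link 1 and 6 on Link 2.
- Links are independent and have unit capacity.
- The coflow-aware policy orders whole coflows by the load on their most-loaded link. This is a simplified stand-in for Varys's smallest-effective-bottleneck-first heuristic, not its actual implementation.

On this instance, that ordering matches the optimal timeline.

```python
from collections import defaultdict

# Reconstructed example (assumption): (coflow, link, size in units); 1 unit per time step.
FLOWS = [("C1", "L1", 3), ("C2", "L1", 2), ("C2", "L2", 6)]


def per_link(flows):
    links = defaultdict(list)
    for coflow, link, size in flows:
        links[link].append((coflow, size))
    return links


def ccts(finishes):
    """A coflow completes when its last flow completes."""
    out = {}
    for coflow, t in finishes:
        out[coflow] = max(out.get(coflow, 0), t)
    return out


def fair_sharing(flows):
    """Processor sharing on each link: active flows split the link equally."""
    finishes = []
    for link_flows in per_link(flows).values():
        ordered = sorted(link_flows, key=lambda cs: cs[1])
        t, prev, n = 0, 0, len(ordered)
        for k, (coflow, size) in enumerate(ordered):
            t += (size - prev) * (n - k)  # n - k flows were still sharing the link
            prev = size
            finishes.append((coflow, t))
    return ccts(finishes)


def serve_in_order(flows, key):
    """Each link transmits its flows one at a time, in the order given by key."""
    finishes = []
    for link_flows in per_link(flows).values():
        t = 0
        for coflow, size in sorted(link_flows, key=key):
            t += size
            finishes.append((coflow, t))
    return ccts(finishes)


def smallest_bottleneck_first(flows):
    """Order whole coflows by their most-loaded link (simplified SEBF-style)."""
    load = defaultdict(int)
    for coflow, link, size in flows:
        load[coflow, link] += size
    bottleneck = defaultdict(int)
    for (coflow, _), v in load.items():
        bottleneck[coflow] = max(bottleneck[coflow], v)
    return serve_in_order(flows, key=lambda cs: bottleneck[cs[0]])


if __name__ == "__main__":
    print("fair sharing:       ", fair_sharing(FLOWS))
    print("smallest flow first:", serve_in_order(FLOWS, key=lambda cs: cs[1]))
    print("coflow ordering:    ", smallest_bottleneck_first(FLOWS))
```

Smallest-flow first sends Coflow 2's 2-unit flow first on Link 1. That delays Coflow 1 without helping Coflow 2, whose completion is pinned by its 6-unit flow on Link 2.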
![Page 19](https://reader030.vdocument.in/reader030/viewer/2022021420/5ad17a3a7f8b9abd6c8bbe2e/html5/thumbnails/19.jpg)
Inter-Coflow Scheduling is NP-Hard
(Figure: the two-coflow example (3, 6, and 2 units on Links 1 and 2) mapped onto the datacenter's input and output links)
Concurrent open shop scheduling with coupled resources
• Examples include job scheduling and caching blocks
• Solutions use an ordering heuristic
• Must also consider matching constraints
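The formulation below is one way to state the offline problem behind this slide. It is a sketch in our own notation, and none of these symbols come from the talk:

- Coflow c arrives at time a_c.
- Each flow f of c carries s_f units from input port i(f) to output port o(f).
- Flow f is served at rate r_f(t) and finishes at time T_f.
- Every port has unit capacity.

Minimizing the sum of CCTs is equivalent to minimizing their average.

```latex
\begin{aligned}
\min_{r}\;& \sum_{c} \mathrm{CCT}_c,
\qquad \mathrm{CCT}_c = \max_{f \in c} T_f - a_c \\
\text{s.t.}\;& \int_{a_c}^{T_f} r_f(t)\,dt = s_f && \forall c,\; \forall f \in c \\
& \sum_{f:\, i(f) = p} r_f(t) \le 1, \quad \sum_{f:\, o(f) = p} r_f(t) \le 1 && \forall p,\; \forall t \\
& r_f(t) \ge 0 && \forall f,\; \forall t
\end{aligned}
```

Each flow draws on its input port and its output port at the same time, and that coupling is the matching constraint the slide mentions. Roughly, if you keep only one side's port constraints, each coflow looks like a job in concurrent open shop scheduling, with ports playing the role of machines.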
![Page 20](https://reader030.vdocument.in/reader030/viewer/2022021420/5ad17a3a7f8b9abd6c8bbe2e/html5/thumbnails/20.jpg)
Many Problems to Solve
| | Clairvoyant | Objective | Optimal |
|---|---|---|---|
| Varys | Yes | Min CCT | No |
| Aalo | No | Min CCT | No |
| HUG | No | Fair CCT | Yes |
CCT: coflow completion time
![Page 21](https://reader030.vdocument.in/reader030/viewer/2022021420/5ad17a3a7f8b9abd6c8bbe2e/html5/thumbnails/21.jpg)
Coflow-Based Architecture
• Centralized master-slave architecture
• Applications use a client library to communicate with the master
• Actual timing and rates are determined by the coflow scheduler
(Figure: a master/coordinator hosting the coflow scheduler coordinates a local daemon on each machine; each daemon sits between the machine's computation tasks and its network interface)
![Page 22](https://reader030.vdocument.in/reader030/viewer/2022021420/5ad17a3a7f8b9abd6c8bbe2e/html5/thumbnails/22.jpg)
Coflow API
Change the applications
• At the very least, we need to know what a coflow is
• For clairvoyant versions, we need more information
• Changing the framework can enable ALL jobs to take advantage of coflows
DO NOT change the applications¹
• Infer coflows from network traffic patterns
• Design a robust coflow scheduler that can tolerate misestimations
• Our current solution only works for coflows without dependencies; we need DAG support!
1. CODA: Toward Automatically Identifying and Scheduling Coflows in the Dark, SIGCOMM’2016.
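This is not CODA's identification method, which the slides do not describe. It is a toy sketch of what "infer coflows from network traffic patterns" can mean. It groups flows whose start times lie within a hypothetical `gap` threshold of each other, which is single-linkage clustering on one attribute. `infer_coflows` is a hypothetical helper. Real traffic would need more attributes than start time.

```python
def infer_coflows(flows, gap):
    """Group flows into candidate coflows by start-time proximity.

    flows: list of (flow_id, start_time); gap: hypothetical threshold in seconds.
    After sorting by start time, a flow joins the current group unless it
    starts more than `gap` after the previous flow.
    """
    groups = []
    last_start = None
    for flow_id, start in sorted(flows, key=lambda fs: fs[1]):
        if last_start is None or start - last_start > gap:
            groups.append([])  # a large gap starts a new candidate coflow
        groups[-1].append(flow_id)
        last_start = start
    return groups


if __name__ == "__main__":
    observed = [("a", 0.00), ("b", 0.02), ("c", 0.05), ("d", 1.30), ("e", 1.31)]
    print(infer_coflows(observed, gap=0.5))
```

Two unrelated jobs that happen to start together would be merged into one group. Mistakes like this are why the scheduler must tolerate misestimations.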
![Page 23](https://reader030.vdocument.in/reader030/viewer/2022021420/5ad17a3a7f8b9abd6c8bbe2e/html5/thumbnails/23.jpg)
Performance Benefits of Using Coflows
(Figure: overhead over Varys for each scheduler; lower is better. Legend groups: Varys, Aalo, per-flow fairness, per-flow prioritization.)
| Scheduler | Overhead over Varys |
|---|---|
| Varys | 1.00 |
| Fair | 3.21 |
| FIFO | 5.65 |
| Priority | 5.53 |
| FIFO-LM | 22.07 |
| NC | 1.10 |
1. Managing Data Transfers in Computer Clusters with Orchestra, SIGCOMM’2011.
2. Finishing Flows Quickly with Preemptive Scheduling, SIGCOMM’2012.
3. pFabric: Minimal Near-Optimal Datacenter Transport, SIGCOMM’2013.
4. Decentralized Task-Aware Scheduling for Data Center Networks, SIGCOMM’2014.
![Page 24](https://reader030.vdocument.in/reader030/viewer/2022021420/5ad17a3a7f8b9abd6c8bbe2e/html5/thumbnails/24.jpg)
The Need for Coordination
(Figure: average coordination time vs. number of emulated Aalo slaves, log-scale y-axis)
| # (Emulated) Aalo Slaves | Average Coordination Time (ms) |
|---|---|
| 100 | 8 |
| 1,000 | 17 |
| 10,000 | 115 |
| 50,000 | 495 |
| 100,000 | 992 |
Coordination is necessary to determine, in real time:
• Coflow size (sum)
• Coflow rates (max)
• Partial order of coflows (ordering)
It can be a large source of overhead
• It does not matter much for large coflows in slow networks, but …
How to perform decentralized coflow scheduling?
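The sketch below covers the coordinator's per-round work for the "sum" and "ordering" items. It assumes each local daemon reports the bytes it has sent for each coflow. Ordering is by least attained service with coarse priority queues and FIFO within a queue, in the spirit of Aalo's discretized scheme. The thresholds and the `coordinate` function are hypothetical, and rate calculation is omitted.

```python
from collections import defaultdict

MB = 2**20
# Hypothetical queue thresholds on bytes sent so far, exponentially spaced.
THRESHOLDS = [10 * MB, 100 * MB, 1000 * MB]


def coordinate(reports, arrival):
    """reports: one dict per daemon, {coflow_id: bytes sent from that machine}.
    arrival: {coflow_id: arrival order}, the FIFO tiebreak within a queue.
    Returns coflow ids from highest to lowest priority, plus global sizes.
    """
    sent = defaultdict(int)
    for daemon_report in reports:  # coflow size = sum across daemons
        for coflow_id, nbytes in daemon_report.items():
            sent[coflow_id] += nbytes
    # Queue index = thresholds already crossed; fewer bytes sent => higher priority.
    queue = {c: sum(b >= t for t in THRESHOLDS) for c, b in sent.items()}
    order = sorted(sent, key=lambda c: (queue[c], arrival[c]))
    return order, dict(sent)


if __name__ == "__main__":
    reports = [{"A": 8 * MB, "B": 1 * MB, "D": 60 * MB}, {"A": 8 * MB, "C": 200 * MB}]
    order, sizes = coordinate(reports, {"D": 0, "A": 1, "B": 2, "C": 3})
    print(order, {c: s // MB for c, s in sizes.items()})
```

The aggregation is what makes coordination necessary. Each daemon alone sees only 8 MB of coflow A and would keep it in the highest-priority queue. Only the global sum of 16 MB moves it down.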
![Page 25](https://reader030.vdocument.in/reader030/viewer/2022021420/5ad17a3a7f8b9abd6c8bbe2e/html5/thumbnails/25.jpg)
Coflow-Aware Load Balancing
Especially useful in asymmetric topologies
• For example, in the presence of switch or link failures
Provides an additional degree of freedom
• During path selection
• For dynamically determining load-balancing granularity
Increased need for coordination, but at an even higher cost
![Page 26](https://reader030.vdocument.in/reader030/viewer/2022021420/5ad17a3a7f8b9abd6c8bbe2e/html5/thumbnails/26.jpg)
Coflow-Aware Routing
Relevant in topologies without full bisection bandwidth
• When topologies have temporary in-network oversubscription
• In geo-distributed analytics
Scheduling-only solutions do not work well
• Calls for joint routing-scheduling solutions
• Must take network utilization into account
• Must avoid frequent path changes
Increased need for coordination
![Page 27](https://reader030.vdocument.in/reader030/viewer/2022021420/5ad17a3a7f8b9abd6c8bbe2e/html5/thumbnails/27.jpg)
Coflows in Circuit-Switched Networks
Circuit switching is relevant again due to the rise of optical networks
• Provides very high bandwidth
• Expensive to set up new circuits
Co-scheduling applications and coflows
• Schedule tasks so that we can reuse circuits that are already set up
• Perform in-network aggregation using existing circuits instead of waiting for new circuits to be created
![Page 28](https://reader030.vdocument.in/reader030/viewer/2022021420/5ad17a3a7f8b9abd6c8bbe2e/html5/thumbnails/28.jpg)
Extension to Multiple Resources¹
A DAG of coflows is very similar to a job DAG of stages
• Same principle applies, but with new challenges
Consider both fungible (bandwidth) and non-fungible (cores) resources
• Across the entire DAG
1. Altruistic Scheduling in Multi-Resource Clusters, OSDI’2016.
![Page 29](https://reader030.vdocument.in/reader030/viewer/2022021420/5ad17a3a7f8b9abd6c8bbe2e/html5/thumbnails/29.jpg)
Coflow
A communication abstraction for data-parallel applications to express their performance goals
Key open challenges
1. Better theoretical understanding
2. Efficient solutions to deal with decentralization, topologies, multi-resource settings, estimations over DAGs, circuit switching, etc.
More information
1. Papers: http://www.mosharaf.com/publications/
2. Software/simulator/workloads: https://github.com/coflow