![Page 1: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/1.jpg)
Hedera: Dynamic Flow Scheduling for Data Center Networks
Mohammad Al-Fares, Sivasankar Radhakrishnan, Barath Raghavan*, Nelson Huang, Amin Vahdat
UC San Diego * Williams College
- USENIX NSDI 2010 -
![Page 2: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/2.jpg)
Motivation
• Current data center networks support tens of thousands of machines
• Limited port densities at the core routers → horizontal expansion = increasing reliance on multipathing
![Page 3: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/3.jpg)
Motivation
• MapReduce / Hadoop -style workloads have substantial BW requirements
• Shuffle phase stresses network interconnect
• Oversubscription / bad forwarding → jobs are often bottlenecked by the network
[Figure: MapReduce Workflow: Input Dataset → Map → Reduce → Output Dataset]
![Page 4: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/4.jpg)
Contributions
• Integration + working implementation of:
1. Centralized Data Center routing control
2. Flow Demand Estimation
3. Efficient + Fast Scheduling Heuristics
• Enables more efficient utilization of network infrastructure
• Up to 96% of optimal bisection bandwidth, > 2x better than standard techniques
![Page 5: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/5.jpg)
Background
• Current industry standard: Equal-Cost Multi-Path (ECMP)
• Given a packet to a subnet with multiple paths, forward packet based on a hash of packet’s headers
• Originally developed as a wide-area / backbone TE tool
• Implemented in: Cisco / Juniper / HP ... etc.
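The hash-based forwarding described above can be sketched in a few lines. This is only an illustration: the function name `ecmp_path_index`, the use of CRC32, and the 5-tuple layout are assumptions, not what any particular switch vendor implements.

```python
import zlib

def ecmp_path_index(five_tuple, n_paths):
    """Pick one of n_paths equal-cost paths from a hash of the header fields.

    five_tuple is (src_ip, dst_ip, protocol, src_port, dst_port); the choice
    of CRC32 is an assumption made for this sketch.
    """
    key = "|".join(str(field) for field in five_tuple).encode()
    return zlib.crc32(key) % n_paths

# Five hypothetical flows hashed onto four equal-cost paths.
flows = [("10.0.0.%d" % i, "10.1.0.1", 6, 40000 + i, 80) for i in range(5)]
choices = [ecmp_path_index(f, 4) for f in flows]
print(choices)
```

Because the choice depends only on the headers, a flow always stays on one path (no reordering), but two large flows that hash to the same path keep colliding for their whole lifetime. With five flows and four paths, at least one collision is unavoidable.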
[Figure: fat-tree with Core 0 to Core 3 and Agg 0 to Agg 2; Flows A, B, C, D illustrate a local collision and a downstream collision]
![Page 6: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/6.jpg)
Background
• ECMP drawback: Static + Oblivious to link-utilization!
• Causes long-term local/downstream flow collisions
• On a 27K-host fat-tree with a randomized traffic matrix, ECMP wastes an average of 61% of bisection bandwidth!
![Page 7: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/7.jpg)
Problem Statement
Problem:
Given a dynamic traffic matrix of flow demands, how do you find paths that maximize network bisection bandwidth?
Constraint:
Commodity Ethernet switches + No end-host mods
![Page 8: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/8.jpg)
Problem Statement
• Single path forwarding (no flow splitting)
• Expressed as Binary Integer Programming (BIP)
• Combinatorial, NP-complete
• Exact solvers CPLEX/GLPK impractical for realistic networks
MULTI-COMMODITY FLOW problem:
1. Capacity Constraint: $\sum_{i=1}^{k} f_i(u,v) \le c(u,v)$
2. Flow Conservation: $\sum_{w \in V} f_i(u,w) = 0 \quad (u \ne s_i, t_i)$ and $\forall v,u:\ f_i(u,v) = -f_i(v,u)$
3. Demand Satisfaction: $\sum_{w \in V} f_i(s_i,w) = d_i$ and $\sum_{w \in V} f_i(w,t_i) = d_i$
![Page 9: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/9.jpg)
Problem Statement
• Polynomial-time algorithms known for 3-stage Clos Networks (based on bipartite edge-coloring)
• None for 5-stage Clos (3-tier fat-trees)
• Need to target arbitrary/general DC topologies!
1. Capacity Constraint: $\sum_{i=1}^{k} f_i(u,v) \le c(u,v)$
2. Flow Conservation: $\sum_{w \in V} f_i(u,w) = 0 \quad (u \ne s_i, t_i)$ and $\forall v,u:\ f_i(u,v) = -f_i(v,u)$
3. Demand Satisfaction: $\sum_{w \in V} f_i(s_i,w) = d_i$ and $\sum_{w \in V} f_i(w,t_i) = d_i$
![Page 10: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/10.jpg)
Architecture
• Hedera : Dynamic Flow Scheduling
• Optimize achievable bisection bandwidth by assigning flows non-conflicting paths
• Uses flow demand estimation + placement heuristics to find good flow-to-core mappings
[Figure: control loop: 1. Detect Large Flows → 2. Estimate Flow Demands → 3. Schedule Flows]
![Page 11: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/11.jpg)
Architecture
• Scheduler operates a tight control-loop:
1. Detect large flows
2. Estimate their bandwidth demands
3. Compute good paths and insert flow entries into switches
![Page 12: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/12.jpg)
Elephant Detection
![Page 13: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/13.jpg)
Elephant Detection
• Scheduler continually polls edge switches for flow byte-counts
• Flows exceeding a bytes-per-second threshold are “large”
• > 10% of a host’s link capacity in our implementation (i.e., > 100 Mbps)
• What if there are only “small” flows?
• Default ECMP load-balancing is efficient
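The polling-and-threshold step above could look roughly like the following. It is a minimal sketch: the function name `find_elephants`, the dictionary-based counters, and the 1 Gbps link speed (implied by the 100 Mbps figure) are assumptions for illustration, not the scheduler's actual code.

```python
def find_elephants(prev_bytes, curr_bytes, interval_s,
                   link_bps=1_000_000_000, fraction=0.10):
    """Return flows whose rate since the last poll exceeds the threshold.

    prev_bytes / curr_bytes map a flow id to its cumulative byte count as
    read from an edge switch (hypothetical data layout).
    """
    threshold_bps = fraction * link_bps
    elephants = []
    for flow, now in curr_bytes.items():
        rate_bps = 8 * (now - prev_bytes.get(flow, 0)) / interval_s
        if rate_bps > threshold_bps:
            elephants.append(flow)
    return elephants

# Two hypothetical flows polled one second apart.
prev = {"a": 0, "b": 0}
curr = {"a": 20_000_000, "b": 1_000_000}   # 160 Mbps and 8 Mbps
large = find_elephants(prev, curr, interval_s=1.0)
print(large)
```

Flows below the threshold are simply left to ECMP, which matches the point that default load-balancing is efficient when only small flows are present.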
![Page 14: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/14.jpg)
Elephant Detection
• Hedera complements ECMP!
• Default forwarding uses ECMP
• Hedera schedules large flows that cause bisection bandwidth problems
![Page 15: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/15.jpg)
Demand Estimation
![Page 16: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/16.jpg)
Demand Estimation
Motivation:
• Empirical measurements of flow rates are not suitable or sufficient for flow scheduling
• Current TCP flow rates may be constrained by inefficient forwarding
• Need to find the flows’ overall fair bandwidth allocation, to better inform placement algorithms
![Page 17: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/17.jpg)
Demand Estimation
• TCP’s AIMD + Fair Queueing try to achieve max-min fairness in steady state
• When routing is a degree of freedom, establishing max-min fair demands is hard
• Ideal case: find max-min fair bandwidth allocation as if constrained by host-NIC
![Page 18: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/18.jpg)
Demand Estimation
• Given traffic matrix of large flows, modify each flow’s size at Src + Dst iteratively:
1. Sender equally distributes unconverged bandwidth among outgoing flows
2. NIC-limited receivers decrease exceeded capacity equally between incoming flows
3. Repeat until all flows converge
• Guaranteed to converge in O(|F|) time
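The iteration above can be sketched as follows. This is an illustrative reading of the three steps, not the authors' code: the function names, the list-of-pairs input, and the normalization of every NIC to capacity 1 are assumptions. Exact fractions are used so the shares come out as 1/3 and 2/3 rather than rounded floats.

```python
from fractions import Fraction

def _est_src(src, flows, demand, converged):
    # Sender splits whatever its converged flows leave over among the rest.
    mine = [i for i, (s, _) in enumerate(flows) if s == src]
    fixed = sum((demand[i] for i in mine if converged[i]), Fraction(0))
    unconv = [i for i in mine if not converged[i]]
    if not unconv:
        return
    share = (1 - fixed) / len(unconv)
    for i in unconv:
        demand[i] = share

def _est_dst(dst, flows, demand, converged):
    # An oversubscribed receiver caps its receiver-limited flows equally.
    mine = [i for i, (_, d) in enumerate(flows) if d == dst]
    if sum((demand[i] for i in mine), Fraction(0)) <= 1:
        return
    limited = set(mine)
    sender_limited = Fraction(0)
    share = Fraction(1, len(mine))
    while True:
        small = {i for i in limited if demand[i] < share}
        if not small:
            break
        sender_limited += sum(demand[i] for i in small)
        limited -= small
        share = (1 - sender_limited) / len(limited)
    for i in limited:
        demand[i] = share
        converged[i] = True

def estimate_demands(flows):
    """flows: list of (src, dst). Returns each flow's share of NIC capacity."""
    demand = [Fraction(0)] * len(flows)
    converged = [False] * len(flows)
    senders = sorted({s for s, _ in flows})
    receivers = sorted({d for _, d in flows})
    while True:
        before = list(demand)
        for h in senders:
            _est_src(h, flows, demand, converged)
        for h in receivers:
            _est_dst(h, flows, demand, converged)
        if demand == before:
            return demand

flows = [("A", "X"), ("A", "Y"), ("B", "Y"), ("C", "Y")]
estimates = estimate_demands(flows)
print(estimates)
```

For the four flows A → X, A → Y, B → Y, C → Y, receiver Y caps its three incoming flows at 1/3 each, after which sender A gives its remaining 2/3 to A → X.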
![Page 19: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/19.jpg)
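Demand Estimation
[Figure: hosts A, B, C sending to hosts X, Y]

| Flow | Estimate | Conv.? |
|---|---|---|
| A → X | | |
| A → Y | | |
| B → Y | | |
| C → Y | | |

Senders:

| Sender | Available BW | Unconv. Flows | Share |
|---|---|---|---|
| A | 1 | 2 | 1/2 |
| B | 1 | 1 | 1 |
| C | 1 | 1 | 1 |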
![Page 20: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/20.jpg)
Demand Estimation
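[Figure: hosts A, B, C sending to hosts X, Y]

Receivers:

| Recv | RL? | Non-SL Flows | Share |
|---|---|---|---|
| X | No | - | - |
| Y | Yes | 3 | 1/3 |

| Flow | Estimate | Conv.? |
|---|---|---|
| A → X | 1/2 | |
| A → Y | 1/2 | |
| B → Y | 1 | |
| C → Y | 1 | |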
![Page 21: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/21.jpg)
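Demand Estimation
[Figure: hosts A, B, C sending to hosts X, Y]

| Flow | Estimate | Conv.? |
|---|---|---|
| A → X | 1/2 | |
| A → Y | 1/3 | Yes |
| B → Y | 1/3 | Yes |
| C → Y | 1/3 | Yes |

Senders:

| Sender | Available BW | Unconv. Flows | Share |
|---|---|---|---|
| A | 2/3 | 1 | 2/3 |
| B | 0 | 0 | 0 |
| C | 0 | 0 | 0 |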
![Page 22: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/22.jpg)
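Demand Estimation
[Figure: hosts A, B, C sending to hosts X, Y]

| Flow | Estimate | Conv.? |
|---|---|---|
| A → X | 2/3 | Yes |
| A → Y | 1/3 | Yes |
| B → Y | 1/3 | Yes |
| C → Y | 1/3 | Yes |

Receivers:

| Recv | RL? | Non-SL Flows | Share |
|---|---|---|---|
| X | No | - | - |
| Y | No | - | - |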
![Page 23: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/23.jpg)
Placement Heuristics
![Page 24: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/24.jpg)
Global First-Fit
• When a new flow is detected, linearly search all possible paths from source to destination (S → D)
• Place the flow on the first path whose component links can all fit that flow
[Figure: scheduler choosing among core switches 0 to 3 for Flows A, B, C]
![Page 25: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/25.jpg)
Global First-Fit
• Flows are placed upon detection and are not moved afterwards
• Once a flow ends, its entries and reservations time out
[Figure: scheduler with Flows A, B, C placed on core switches 0 to 3]
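A minimal sketch of the first-fit placement described above, under stated assumptions: paths are given as lists of link identifiers, every link has the same normalized capacity, and the function name `global_first_fit` and the `reserved` dictionary are hypothetical names chosen for this example.

```python
def global_first_fit(demand, candidate_paths, reserved, capacity=1.0):
    """Reserve `demand` on the first path whose links all have room.

    candidate_paths: list of paths, each a list of hashable link ids.
    reserved: dict mapping link id -> bandwidth already reserved (mutated).
    Returns the chosen path, or None if no path fits.
    """
    for path in candidate_paths:
        if all(reserved.get(link, 0.0) + demand <= capacity for link in path):
            for link in path:
                reserved[link] = reserved.get(link, 0.0) + demand
            return path
    return None

# Two hypothetical paths between the same pair of pods, via core 0 or core 1.
paths = [["agg0-core0", "core0-agg2"], ["agg0-core1", "core1-agg2"]]
reserved = {}
first = global_first_fit(0.6, paths, reserved)   # fits on the core 0 path
second = global_first_fit(0.6, paths, reserved)  # core 0 is full, uses core 1
third = global_first_fit(0.3, paths, reserved)   # still fits on core 0
print(first, second, third)
```

Releasing a reservation when a flow's entry times out would be the reverse operation (subtracting the demand from each link on its path); it is omitted here to keep the sketch short.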
![Page 26: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/26.jpg)
Simulated Annealing
• Probabilistic search for good flow-to-core mappings
• Goal: Maximize achievable bisection bandwidth
• The current flow-to-core mapping generates a neighbor state
• Calculate the total exceeded bandwidth capacity of that state
• Accept the move to the neighbor state if it yields a bisection BW gain
• A few thousand iterations for each scheduling round
• Avoid local minima: non-zero probability of moving to a worse state
![Page 27: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/27.jpg)
Simulated Annealing
• Implemented several optimizations that reduce the search-space significantly:
• Assign a single core switch to each destination host
• Incremental calculation of exceeded capacity
• .. among others
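The search described above can be sketched as follows. Everything here is a simplification for illustration: the link model (one uplink per source pod and core, one downlink per core and destination pod), the linear cooling schedule, and the names `exceeded_capacity` and `anneal` are assumptions. The sketch does assign a single core per destination host, but it recomputes the exceeded capacity from scratch on each step rather than incrementally.

```python
import math
import random

def exceeded_capacity(flows, core_of, capacity=1.0):
    """Energy: total load above capacity, summed over pod-core links."""
    load = {}
    for src_pod, dst_pod, dst_host, demand in flows:
        core = core_of[dst_host]
        for link in (("up", src_pod, core), ("down", core, dst_pod)):
            load[link] = load.get(link, 0.0) + demand
    return sum(max(0.0, used - capacity) for used in load.values())

def anneal(flows, n_cores, iterations=1000, t0=1.0, seed=0):
    rng = random.Random(seed)
    hosts = sorted({f[2] for f in flows})
    state = {h: 0 for h in hosts}            # start with every host on core 0
    energy = exceeded_capacity(flows, state)
    best, best_energy = dict(state), energy
    for step in range(iterations):
        temp = t0 * (1 - step / iterations)  # linear cooling, always > 0 here
        host = rng.choice(hosts)
        old_core = state[host]
        state[host] = rng.randrange(n_cores)  # neighbor: move one host's core
        new_energy = exceeded_capacity(flows, state)
        # Always accept improvements; accept a worse state with small probability.
        if new_energy <= energy or rng.random() < math.exp((energy - new_energy) / temp):
            energy = new_energy
            if energy < best_energy:
                best, best_energy = dict(state), energy
        else:
            state[host] = old_core
    return best, best_energy

# Four hypothetical unit-demand flows from pod 0 to four hosts in pod 1.
flows = [(0, 1, "h%d" % i, 1.0) for i in range(4)]
start_energy = exceeded_capacity(flows, {"h%d" % i: 0 for i in range(4)})
best, best_energy = anneal(flows, n_cores=4)
print(start_energy, best_energy, best)
```

Keeping the best state seen so far means the published mapping is never worse than the starting one, even though individual steps may move uphill.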
![Page 28: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/28.jpg)
Simulated Annealing
• Example run: 3 flows, 3 iterations
[Figure: scheduler animation over core switches 0 to 3 with Flows A, B, C; successive "Core" assignments shown: 2 1 0, then 2 0 2, then 2 0 3]
![Page 29: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/29.jpg)
Simulated Annealing
• Final state is published to the switches and used as the initial state for next round
[Figure: scheduler with Flows A, B, C over core switches 0 to 3; final "Core" assignment shown: 2 0 3]
![Page 30: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/30.jpg)
Fault-Tolerance
![Page 31: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/31.jpg)
Fault-Tolerance
• Link / Switch failure: Use PortLand’s fault notification protocol
• Hedera routes around failed components
![Page 32: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/32.jpg)
Fault-Tolerance
• Scheduler failure:
• Soft-state, not required for correctness (connectivity)
• Switches fall back to ECMP
![Page 33: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/33.jpg)
Implementation
![Page 34: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/34.jpg)
Implementation
• 16-host testbed
• k=4 fat-tree data-plane
• 20 machines; 4-port NetFPGAs / OpenFlow
• Parallel 48-port non-blocking Quanta switch
• 1 Scheduler machine
• Dynamic traffic monitoring
• OpenFlow routing control
![Page 35: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/35.jpg)
Evaluation - Testbed
[Figure: bar chart of Bisection Bandwidth (Gbps, axis 0 to 15) by Communication Pattern (Stag1(.2,.3), Stag2(.2,.3), Stag1(.5,.3), Stag2(.5,.3), Stride(2), Stride(4), Stride(8)) for ECMP, Global First-Fit, Simulated Annealing, and Non-blocking]
![Page 36: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/36.jpg)
Evaluation - Testbed
[Figure: bar chart of Bisection Bandwidth (Gbps, axis 0 to 15) by Communication Pattern (Rand0, Rand1, RandBij0, RandBij1, RandX2, RandX3, Hotspot) for ECMP, Global First-Fit, Simulated Annealing, and Non-blocking]
![Page 37: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/37.jpg)
Data Shuffle

| | ECMP | GFF | SA | Control |
|---|---|---|---|---|
| Total Shuffle Time (s) | 438.4 | 335.5 | 336.0 | 306.4 |
| Avg. Completion Time (s) | 358.1 | 258.7 | 262.0 | 226.6 |
| Avg. Bisection BW (Gbps) | 2.81 | 3.89 | 3.84 | 4.44 |
| Avg. host goodput (MB/s) | 20.9 | 29.0 | 28.6 | 33.1 |

• 16 hosts: 120 GB all-to-all in-memory shuffle
• Hedera achieves 39% better bisection BW than ECMP, and 88% of an ideal non-blocking switch
![Page 38: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/38.jpg)
Evaluation - Simulator
• For larger topologies:
• Models TCP’s AIMD behavior when constrained by the topology
• Stochastic flow arrival times / Bytes
• Calibrated its performance against testbed
• What about ns2 / OMNeT++ ?
• Packet-level simulators impractical at these network scales
![Page 39: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/39.jpg)
Simulator - 8,192 hosts (k=32)
[Figure: bar chart of Bisection Bandwidth (Gbps, axis 0 to 7000) by Communication Pattern (Stag(.5,0), Stag(.2,.3), Stride(1), Stride(16), Stride(256), RandBij, Rand) for ECMP, Global First-Fit, Simulated Annealing, and Non-blocking]
![Page 40: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/40.jpg)
Reactiveness
• Demand Estimation:
• 27K hosts, 250K flows, converges < 200ms
• Simulated Annealing:
• Asymptotically dependent on # of flows + # iter:
• 50K flows and 10K iter: 11ms
• Most of final bisection BW: first few hundred iter
• Scheduler control loop:
• Polling + Estimation + SA = 145ms for 27K hosts
![Page 41: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/41.jpg)
Limitations
• With dynamic workloads, large-flow turnover can be faster than the control loop
• The scheduler will then be continually chasing the traffic matrix
• Need to include a penalty term for unnecessary SA flow re-assignments
[Figure: chart of Flow Size (x-axis) vs. Matrix Stability (y-axis, Stable / Unstable), marking the regions suited to ECMP and to Hedera]
![Page 42: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/42.jpg)
Future Work
• Improve utility function of Simulated Annealing
• SA movement penalties (TCP)
• Add flow priorities (QoS)
• Incorporate other metrics: e.g. Power
• Release combined system: PortLand + Hedera (6/1)
• Perfect, non-centralized, per-packet Valiant Load Balancing
![Page 43: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/43.jpg)
Conclusions
• Simulated Annealing delivers significant bisection BW gains over standard ECMP
• Hedera complements ECMP
• RPC-like traffic is fine with ECMP
• If you’re running MapReduce/Hadoop jobs on your network, you stand to benefit greatly from Hedera; tiny investment!
![Page 44: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/44.jpg)
Questions?
http://cseweb.ucsd.edu/~malfares/
![Page 45: Hedera: Dynamic Flow Scheduling for Data Center Networks · Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan * Nelson](https://reader031.vdocument.in/reader031/viewer/2022011823/5ed16acc522329668b7d06f5/html5/thumbnails/45.jpg)
Traffic Overhead
• 27K host network:
• Polling: 72 B/flow × 5 flows/host × 27K hosts / 0.1 s = < 100 MB/s for the entire DC
• Could also use data-plane
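The polling estimate above is simple arithmetic and can be checked directly. The function name and parameter names below are made up for this sketch; the second call assumes that "27K hosts" refers to a k=48 fat-tree, which has 48³/4 = 27,648 hosts.

```python
def polling_bytes_per_second(hosts, flows_per_host=5, bytes_per_flow=72,
                             polls_per_second=10):
    """Aggregate polling traffic; a 0.1 s interval means 10 polls per second."""
    return bytes_per_flow * flows_per_host * hosts * polls_per_second

rounded = polling_bytes_per_second(27_000)   # 97,200,000 B/s, about 97.2 MB/s
fat_tree = polling_bytes_per_second(27_648)  # 99,532,800 B/s, about 99.5 MB/s
print(rounded, fat_tree)
```

Either way the total stays under 100 MB/s, spread across the whole data center rather than concentrated on a single link.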