1
Load Balanced Birkhoff-von Neumann Switches
Cheng-Shang Chang, Duan-Shin Lee and Chi-Yao Yue
presented by
Prashanth Pappu
2
High Performance Switches
• Non-blocking crossbar
• Fixed time slot, fixed size cell
• Parallelism, memory speed = line rate.
• Quadratic complexity but concentrated in a single chip set.
• Centralized scheduler
[Figure: crossbar switch in which a central Controller connects the input-side IPPs to the output-side OPPs.]
3
Centralized Schedulers
• VOQs to avoid HOL blocking.
• Equivalent to finding a matching on a bipartite graph (Anderson et al.).
• McKeown et al. – 100% throughput with MWM.
• 10Gb/s line rate implies 40 ns for scheduling.
• Maximal size matching algorithms (PIM, iSLIP).
• More ports and faster line rates make it harder to implement scheduling algorithms.
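As a rough check of the 40 ns figure: with fixed-size cells, the scheduler has one cell time per decision. The cell length is not stated on the slide; the 400-bit (50-byte) value below is only what the quoted numbers imply, with $L_{\text{cell}}$ and $C$ as illustrative symbols for cell length and line rate.

```latex
t_{\text{slot}} = \frac{L_{\text{cell}}}{C}
\quad\Longrightarrow\quad
L_{\text{cell}} = t_{\text{slot}} \cdot C
= 40\ \text{ns} \times 10\ \text{Gb/s}
= 400\ \text{bits} = 50\ \text{bytes}
```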
4
Overview
• New scheduling algorithm (based on Birkhoff-von Neumann decomposition) – “On service guarantees for input buffered crossbar switches: a capacity decomposition approach by Birkhoff and von Neumann”, IEEE IWQoS’99.
• Birkhoff-von Neumann switches are not practical.
• Load balanced Birkhoff-von Neumann switch – “Load Balanced Birkhoff-von Neumann Switches, Part I: One-stage Buffering”, Computer Communications.
• Mis-sequencing problem and solutions – “Load Balanced Birkhoff-von Neumann Switches, Part II: Multi-stage Buffering”, Computer Communications, and I. Keslassy and N. McKeown, “Maintaining Packet Order in Two Stage Switches”, IEEE Infocom, 2002.
• Providing guaranteed rate services (The actual paper!) – “Providing guaranteed rate services in the load balanced Birkhoff-von Neumann Switches”, IEEE Infocom 2003.
• Talk presents only algorithms and results, without proofs.
5
Birkhoff-von Neumann Switch
0 0 0 1
1 0 0 0
0 1 0 0
0 0 1 0
• Crossbar configuration is a permutation matrix, P.
• 4x4 switch, input 1-output 4, input 2-output 1 etc.
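• Input rate matrix $R = (r(i,j))$ is admissible:

$$\sum_{j} r(i,j) \le 1 \ \text{ for all } i, \qquad \sum_{i} r(i,j) \le 1 \ \text{ for all } j.$$

[Figure: 4x4 crossbar with inputs 1–4 and outputs 1–4.]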
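A minimal sketch of the two objects on this slide, assuming matrices are stored as nested Python lists and ports are numbered from 0 in code (the slide numbers them from 1). The helper names `is_admissible` and `perm_to_matrix` are illustrative, not from the paper.

```python
def is_admissible(R, tol=1e-9):
    """Return True if every row sum and every column sum of R is at most 1."""
    n = len(R)
    rows_ok = all(sum(R[i][j] for j in range(n)) <= 1 + tol for i in range(n))
    cols_ok = all(sum(R[i][j] for i in range(n)) <= 1 + tol for j in range(n))
    return rows_ok and cols_ok


def perm_to_matrix(perm):
    """Build the permutation matrix where perm[i] is the 0-based output of input i."""
    n = len(perm)
    return [[1 if perm[i] == j else 0 for j in range(n)] for i in range(n)]


# Slide example: input 1 -> output 4, input 2 -> output 1, input 3 -> 2, input 4 -> 3.
P = perm_to_matrix([3, 0, 1, 2])
```

A permutation matrix is itself admissible, since every row and column sums to exactly 1.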
6
Birkhoff-von Neumann Switch
• Can any admissible rate matrix be serviced?
• How do we map rate matrix to a sequence of permutation matrices? (A change in point of view, as opposed to finding a matching on a bipartite graph.)
• Express as convex combination of permutation matrices.
• Obtain the decomposition and schedule each permutation matrix proportional to its weight.

$$R \le \sum_{k=1}^{K} \phi_k P_k, \qquad \sum_{k=1}^{K} \phi_k = 1, \qquad K \le N^2 - 2N + 2,$$

where $P_k$, $k = 1, \ldots, K$, are permutation matrices and $\phi_k$ are their weights.
7
Birkhoff-von Neumann Switch
• (von Neumann 1953) Transform the doubly substochastic rate matrix to a doubly stochastic matrix.
• (Birkhoff 1946) Decompose doubly stochastic rate matrix to weighted sum of permutation matrices.
• (PGPS) Use simple packet scheduling algorithm (WFQ) to determine which permutation matrix should be used to configure crossbar.
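The last step can be sketched as a small virtual-finish-time scheduler. This is a simplified WFQ-style illustration, not necessarily the exact PGPS rule of the paper: the function name `schedule_permutations`, the tie-breaking by index, and the weights 2/5, 2/5, 1/5 are assumptions chosen for the example.

```python
import heapq
from fractions import Fraction


def schedule_permutations(weights, num_slots):
    """Return the index k of the permutation matrix used in each time slot.

    The j-th use of permutation k gets virtual finish time j / weights[k];
    slots are handed out in increasing virtual finish time, ties going to
    the smaller index k.
    """
    heap = [(Fraction(1) / w, k) for k, w in enumerate(weights)]
    heapq.heapify(heap)
    order = []
    for _ in range(num_slots):
        finish, k = heapq.heappop(heap)
        order.append(k)
        heapq.heappush(heap, (finish + Fraction(1) / weights[k], k))
    return order


weights = [Fraction(2, 5), Fraction(2, 5), Fraction(1, 5)]
order = schedule_permutations(weights, 10)
```

Over 10 slots each permutation is used in proportion to its weight (4, 4 and 2 slots here), and the uses are spread out rather than served in bursts.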
8
Example
• von-Neumann conversion.
• Pivots around (1,2), (2,1), (2,2) etc.
• There are other (fairer) ways to obtain this conversion.
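$$R = \begin{pmatrix} 0 & 0.3 & 0.2 & 0.4 \\ 0.2 & 0.3 & 0 & 0.2 \\ 0.4 & 0.1 & 0.3 & 0 \\ 0.2 & 0 & 0.2 & 0.3 \end{pmatrix}$$

$$R' = \begin{pmatrix} 0 & 0.4 & 0.2 & 0.4 \\ 0.4 & 0.4 & 0 & 0.2 \\ 0.4 & 0.2 & 0.4 & 0 \\ 0.2 & 0 & 0.4 & 0.4 \end{pmatrix}$$

Conversion complexity: $O(N^3)$.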
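One simple way to perform such a conversion is a greedy fill: pick a row and a column that both still have slack and raise their common entry until one of them sums to 1. The sketch below assumes this greedy rule and a first-found pivot order, so it produces a valid doubly stochastic matrix that dominates R but not necessarily the R' shown on the slide, which pivots on different entries. The function name `von_neumann_fill` is illustrative.

```python
def von_neumann_fill(R, eps=1e-9):
    """Raise entries of a doubly substochastic R until all row/column sums are 1."""
    n = len(R)
    A = [row[:] for row in R]
    row = [sum(A[i]) for i in range(n)]
    col = [sum(A[i][j] for i in range(n)) for j in range(n)]
    while True:
        # First row and first column that still have slack (if any).
        i = next((i for i in range(n) if row[i] < 1 - eps), None)
        j = next((j for j in range(n) if col[j] < 1 - eps), None)
        if i is None or j is None:
            break
        d = min(1 - row[i], 1 - col[j])  # fills row i or column j completely
        A[i][j] += d
        row[i] += d
        col[j] += d
    return A


R = [[0, 0.3, 0.2, 0.4],
     [0.2, 0.3, 0, 0.2],
     [0.4, 0.1, 0.3, 0],
     [0.2, 0, 0.2, 0.3]]
D = von_neumann_fill(R)
```

Each pass completes at least one row or column, so the loop runs at most 2N-1 times.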
9
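Example

$$R' = \begin{pmatrix} 0 & 0.4 & 0.2 & 0.4 \\ 0.4 & 0.4 & 0 & 0.2 \\ 0.4 & 0.2 & 0.4 & 0 \\ 0.2 & 0 & 0.4 & 0.4 \end{pmatrix}
= 0.4 \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
+ 0.4 \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}
+ 0.2 \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}$$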
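A decomposition like this one can be computed by repeatedly finding a perfect matching on the positive entries and peeling off the smallest matched entry as the weight. The sketch below assumes a plain augmenting-path matching and a floating-point tolerance `eps`; it is meant to illustrate the idea, not to achieve the complexity quoted in the talk. Function names are illustrative.

```python
def perfect_matching(A, eps):
    """Return perm with A[i][perm[i]] > eps for all i, or None if none exists."""
    n = len(A)
    match_col = [-1] * n  # match_col[j] = row currently matched to column j

    def try_row(i, seen):
        for j in range(n):
            if A[i][j] > eps and not seen[j]:
                seen[j] = True
                if match_col[j] == -1 or try_row(match_col[j], seen):
                    match_col[j] = i
                    return True
        return False

    for i in range(n):
        if not try_row(i, [False] * n):
            return None
    perm = [0] * n
    for j, i in enumerate(match_col):
        perm[i] = j
    return perm


def birkhoff_decompose(D, eps=1e-9):
    """Split a doubly stochastic D into a list of (weight, perm) terms."""
    A = [row[:] for row in D]
    terms = []
    while True:
        perm = perfect_matching(A, eps)
        if perm is None:
            break
        w = min(A[i][perm[i]] for i in range(len(A)))
        for i in range(len(A)):
            A[i][perm[i]] -= w  # zeroes at least one entry per round
        terms.append((w, perm))
    return terms


R_prime = [[0, 0.4, 0.2, 0.4],
           [0.4, 0.4, 0, 0.2],
           [0.4, 0.2, 0.4, 0],
           [0.2, 0, 0.4, 0.4]]
terms = birkhoff_decompose(R_prime)
```

Summing `w` times the permutation matrix of `perm` over all terms reproduces `R_prime`, and the weights add up to 1.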
10
Not practical
• Birkhoff-von Neumann decomposition is non-trivial with $O(N^{4.5})$ complexity, though required only when rates change.
• Need to know rate matrix.
• Memory: $O(N^2)$ permutation matrices.
• Does not support multicast.
• Solution – Load balanced Birkhoff-von Neumann switch.
11
Load balanced BvN switch
• We know decomposition is easy for uniform Bernoulli i.i.d. traffic.
• Use a first stage that load balances traffic to second stage!
• First stage uses permutation matrices generated from a one-cycle permutation matrix. (Input i connects to output (n+i) modulo N at time n.)
[Figure: two-stage architecture, a load balancing stage followed by a BvN switch stage.]
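To see why the cyclic first stage balances load, consider a single input that receives a cell in every slot. The sketch below assumes 0-based port numbers and ignores cell destinations; `first_stage_output` is an illustrative name.

```python
from collections import Counter


def first_stage_output(i, n, N):
    """Output of the load-balancing stage that input i reaches at time n."""
    return (n + i) % N


N = 4
# Input 0 receives one cell per slot for 2N consecutive slots.
spread = Counter(first_stage_output(0, n, N) for n in range(2 * N))
```

Whatever the destinations of those cells are, each second-stage input receives exactly 2 of the 8 cells, so a burst at one input arrives at the second stage evenly spread.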
12
Second stage (Switching)
• Traffic from first stage is instantly transferred to buffers at second stage.
• With balanced traffic, second stage can also use a deterministic sequence of cyclical permutation matrices. (Input j is connected to output ((n-j) modulo N) at time n.)
• Both stages are identical, complexity of scheduling algorithm O(1).
• Low hardware complexity.
• 100% throughput (under a mild technical condition)
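The O(1) scheduling claim follows from both connection patterns being pure arithmetic on the slot number. A minimal sketch, assuming 0-based ports; `next_service_time` is a hypothetical helper that ignores queueing and assumes a cell may be served in the slot in which it arrives.

```python
def first_stage(i, n, N):
    """Intermediate port reached by input i at time n."""
    return (n + i) % N


def second_stage(j, n, N):
    """Output reached by intermediate port j at time n."""
    return (n - j) % N


def next_service_time(j, d, n, N):
    """Earliest slot m >= n at which intermediate port j is connected to output d."""
    return n + (d + j - n) % N


N = 4
ports = list(range(N))
# In every slot, each stage realises a full permutation (no two inputs collide).
stage1_ok = all(sorted(first_stage(i, n, N) for i in ports) == ports for n in range(N))
stage2_ok = all(sorted(second_stage(j, n, N) for j in ports) == ports for n in range(N))
# Over N consecutive slots, each intermediate port reaches every output once.
covers_all = all(sorted(second_stage(j, n, N) for n in range(N)) == ports for j in ports)
```

No matching is computed at run time: the head-of-line cell of an otherwise empty VOQ waits at most N-1 slots for its connection.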
13
[Figure: average delay (log scale, 1.00E+00 to 1.00E+05) plotted for values from 0.1 to 0.99 on the horizontal axis, comparing Output-buffered, Load Balanced Birkhoff-von Neumann, Birkhoff-von Neumann and 4-SLIP switches. Uniform Pareto bursty traffic, N=16.]
14
Load balanced BvN switch (multi-buffered)
• Problem of mis-sequencing of packets.
• Packets are distributed on arrival times – no bound on a resequencing buffer.
• Use load-balancing and re-sequencing buffers.
• Load-balancing based on flows and not according to arrival times.
15
FCFS with jitter control
• Flow splitter sends packets from same flow in round robin fashion to the N VOQs.
• Causes packets of same flow to be split almost evenly among inputs of second stage.
• Jitter control at second stage delays each packet to its maximum delay (targeted departure time is obtained from corresponding OQ switch)
• Flows entering second stage are time-shifted versions of original ones.
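The flow splitter can be sketched as one round-robin pointer per flow. The class name `FlowSplitter`, the use of hashable flow identifiers, and starting every flow at VOQ 0 are assumptions for illustration; jitter control is not modelled here.

```python
from collections import defaultdict


class FlowSplitter:
    """Send successive packets of the same flow to VOQs 0, 1, ..., N-1, 0, ..."""

    def __init__(self, N):
        self.N = N
        self.next_voq = defaultdict(int)  # per-flow round-robin pointer

    def assign(self, flow_id):
        voq = self.next_voq[flow_id]
        self.next_voq[flow_id] = (voq + 1) % self.N
        return voq


splitter = FlowSplitter(4)
voqs_a = [splitter.assign("a") for _ in range(5)]
voq_b = splitter.assign("b")  # a different flow keeps its own pointer
```

Because the pointer is kept per flow rather than per time slot, any N consecutive packets of a flow land on N distinct second-stage inputs, regardless of when they arrive.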
16
FCFS with jitter control
• Delay of a packet is bounded by sum of delay through the corresponding OQ switch and $(N-1)L_{\max} + N M_{\max}$.
• Essentially delay < 2N for unicast traffic.
• Size of load balancing buffer bounded by $N L_{\max}$.
• Size of re-sequencing buffer bounded by $N M_{\max}$.
• $L_{\max}$ ($M_{\max}$) is the maximum number of flows at an input (output).
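The bounds on this slide are simple enough to evaluate directly. The helper below only restates the slide's formulas; the function name and the sample values N=16, Lmax=4, Mmax=4 are hypothetical, and the units are whatever the slide intends (presumably time slots for delay and cells for buffers).

```python
def jitter_control_bounds(N, Lmax, Mmax):
    """Evaluate the FCFS-with-jitter-control bounds for an N x N switch."""
    return {
        "extra_delay": (N - 1) * Lmax + N * Mmax,  # added to the OQ-switch delay
        "load_balancing_buffer": N * Lmax,
        "resequencing_buffer": N * Mmax,
    }


bounds = jitter_control_bounds(N=16, Lmax=4, Mmax=4)
```

For these sample values the extra delay is 124 and each buffer bound is 64; all three grow linearly in N for a fixed number of flows per port.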
17
Guaranteed Rate Services
• Load balanced BvN switch provides best effort service.
• How do we provide service guarantees?
• Earliest Deadline First (EDF) based scheme.
18
EDF based scheme.
• Same architecture as FCFS scheme with jitter control.
• Targeted departure time is departure time of corresponding link with capacity equal to the guaranteed rate of the flow.
• Packets served in EDF order at output buffer.
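A minimal sketch of the two ingredients, under stated assumptions: cells are fixed size, a flow's reference link serves `rate` cells per slot in FIFO order, and all listed cells are already waiting at the output buffer when the EDF order is computed. The function names and the sample flows are hypothetical.

```python
import heapq
from fractions import Fraction


def targeted_departures(arrivals, rate):
    """Departure times on a dedicated link that serves `rate` cells per slot."""
    times, prev = [], None
    for a in arrivals:
        start = Fraction(a) if prev is None else max(Fraction(a), prev)
        prev = start + 1 / Fraction(rate)  # one cell takes 1/rate slots
        times.append(prev)
    return times


def edf_order(flows):
    """flows: {flow_id: (arrivals, rate)} -> [(flow_id, seq)] by earliest deadline."""
    heap = []
    for flow_id, (arrivals, rate) in flows.items():
        for seq, deadline in enumerate(targeted_departures(arrivals, rate)):
            heapq.heappush(heap, (deadline, flow_id, seq))
    order = []
    while heap:
        _, flow_id, seq = heapq.heappop(heap)
        order.append((flow_id, seq))
    return order


flows = {"A": ([0, 0], Fraction(1, 2)), "B": ([0], Fraction(1, 3))}
order = edf_order(flows)
```

Flow A's two cells get targeted departure times 2 and 4, flow B's cell gets 3, so the output serves A's first cell, then B's, then A's second.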
19
EDF scheme
• Every packet of a guaranteed rate flow has a delay bound – targeted departure time + $(N-1)L_{\max} + N M_{\max}$.
• Resequencing and load balancing buffer bounded by $N M_{\max}$.
20
Not discussed…
• Full Frames First – an algorithm that prevents packets from being mis-sequenced. (Will be discussed in next paper presentation – “Scaling Internet routers using optics”)
• Frame based scheme for guaranteed rate services – algorithm based on FFF for providing rate guarantees.