Packet-Mode Emulation of Output-Queued Switches
David Hay, CS, Technion
Joint work with Hagit Attiya (CS) and Isaac Keslassy (EE)
Outline
Cell-Mode Scheduling vs. Packet-Mode Scheduling
Impossibility of an Exact Emulation
Speedup-RQD Tradeoff
Emulation with S ≈ 4
Emulation with S ≈ 2
Emulation of OQ switch w/ bounded buffer
Simulation Results
CIOQ Switches
Cell-Mode Scheduling
Trend towards Packet-Mode
Cell-mode scheduling is getting too hard:
Fragmentation and reassembly should work very fast, at the external rate
Extra header for each cell → loss of bandwidth
For optical switches such fragmentation and reassembly are prohibitive
Cell-mode schedulers are packet-oblivious → degradation of the overall performance
Packet-Mode Scheduling
No need for fragmentation and reassembly
Must ensure contiguous packet delivery over the fabric: while input i delivers a packet to output j, neither input i nor output j can handle other packets.
Can packet-mode schedulers provide performance guarantees similar to those of cell-mode schedulers?
[Marsan et al., 2002][Ganjali et al., 2003][Turner, 2006]
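The contiguity constraint above can be illustrated with a small checker. This is only a sketch under assumed conventions: the schedule layout (`schedule[t][i]` holding an `(output, packet_id)` pair or `None`) and the `packet_len` dictionary are hypothetical and not taken from the talk.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: schedule[t][i] is (output, packet_id) for the cell that
# input i sends at scheduling decision t, or None if input i is idle.
Slot = Optional[Tuple[int, str]]


def is_packet_mode(schedule: List[List[Slot]], packet_len: Dict[str, int]) -> bool:
    """Return True if no input or output handles another packet while a
    packet is still in the middle of crossing the fabric."""
    sent: Dict[str, int] = {}      # cells forwarded so far, per packet
    in_busy: Dict[int, str] = {}   # input  -> packet currently in progress
    out_busy: Dict[int, str] = {}  # output -> packet currently in progress
    for decision in schedule:
        used_outputs = set()
        for inp, slot in enumerate(decision):
            if slot is None:
                continue
            out, pkt = slot
            if out in used_outputs:
                return False       # two inputs hit the same output: not a matching
            used_outputs.add(out)
            # The input and the output must both be free or busy with this packet.
            if in_busy.get(inp, pkt) != pkt or out_busy.get(out, pkt) != pkt:
                return False
            sent[pkt] = sent.get(pkt, 0) + 1
            if sent[pkt] == packet_len[pkt]:
                in_busy.pop(inp, None)   # last cell: release both ports
                out_busy.pop(out, None)
            else:
                in_busy[inp] = pkt
                out_busy[out] = pkt
    return True
```

The checker only enforces the rule stated on the slide (no other packet on a busy input or output); it deliberately does not penalize idle decisions in the middle of a packet.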
Output Queuing Emulation
OQ switches are considered optimal with respect to queuing delay and throughput, but are too hard to implement in practice…
Emulation: same input traffic → same output traffic
How hard is it for a cell-mode / packet-mode CIOQ switch to emulate an OQ switch?
Cell-Mode Emulation is Possible
Easy with speedup S=N: N scheduling decisions every time-slot:
In the 1st decision forward the cell of input 1
In the 2nd decision forward the cell of input 2
⋮
In the Nth decision forward the cell of input N
Cell-Mode Emulation w/ S=2 [Chuang et al., 1999]
Input Thread IT(C) ("bad guys"): how many cells precede C in its input-port buffer?
Output Cushion OC(C) ("good guys"): how many cells are queued in the output buffer of C's destination and should leave the OQ switch before C?
1st Key Concept: Slackness of a cell (on the input side): L(C) = OC(C) - IT(C)
Slackness may decrease by at most 2 in every time-slot:
A cell leaves the destination of C → OC--
A cell arrives at the input and is queued before C → IT++
Initial slackness can be made non-negative: when C arrives, insert it in the OC(C)-th place of its input buffer.
Plan: ensure that slackness always increases by 2 → slackness is never negative → all cells are delivered on time
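As a rough illustration of input thread, output cushion and slackness, the following sketch computes them on toy buffers. The data model is an assumption made for the example: buffers are plain lists of cell names, and `oq_departure` (hypothetical) maps a cell to its departure time in the shadow OQ switch.

```python
from typing import Dict, List


def input_thread(cell: str, input_buffer: List[str]) -> int:
    """IT(C): number of cells queued ahead of C in its input buffer."""
    return input_buffer.index(cell)


def output_cushion(cell: str, output_buffer: List[str],
                   oq_departure: Dict[str, int]) -> int:
    """OC(C): cells already at C's output that leave the OQ switch before C."""
    return sum(1 for c in output_buffer if oq_departure[c] < oq_departure[cell])


def slackness(cell: str, input_buffer: List[str], output_buffer: List[str],
              oq_departure: Dict[str, int]) -> int:
    """L(C) = OC(C) - IT(C)."""
    return (output_cushion(cell, output_buffer, oq_departure)
            - input_thread(cell, input_buffer))


def push_in(cell: str, input_buffer: List[str], output_buffer: List[str],
            oq_departure: Dict[str, int]) -> None:
    """Insert an arriving cell at the OC(C)-th place (or at the tail if the
    buffer is shorter), so that its initial slackness is non-negative."""
    position = min(output_cushion(cell, output_buffer, oq_departure),
                   len(input_buffer))
    input_buffer.insert(position, cell)


# Toy example: two cells at the output leave before "new", so it is pushed
# in behind two cells of the input buffer.
input_buffer = ["a", "b", "c"]
output_buffer = ["x", "y"]
oq_departure = {"x": 1, "y": 2, "new": 5}
push_in("new", input_buffer, output_buffer, oq_departure)
```

Note that pushing a cell in ahead of "c" raises the input thread of "c" by one, which is exactly the IT++ event that the slackness argument has to absorb.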
Stable Marriage (stable matching): given two equal-size sets M, W and preference lists for every m ∈ M and w ∈ W, find a matching in which there are no two pairs (m,w), (m',w') s.t. m prefers w' over w and w' prefers m over m'.
A classical problem in CS. A stable marriage always exists, and many algorithms find one.
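One well-known way to compute a stable marriage is the Gale-Shapley proposal algorithm. The sketch below is a generic textbook version, not the scheduler from the talk; the names `m_prefs` and `w_prefs` are illustrative, and complete preference lists over equal-size sets are assumed.

```python
from typing import Dict, List


def stable_marriage(m_prefs: Dict[str, List[str]],
                    w_prefs: Dict[str, List[str]]) -> Dict[str, str]:
    """Gale-Shapley with the M side proposing. Returns a dict m -> w."""
    # rank[w][m] = position of m in w's list (lower is better)
    rank = {w: {m: r for r, m in enumerate(prefs)} for w, prefs in w_prefs.items()}
    next_choice = {m: 0 for m in m_prefs}  # next index each m will propose to
    engaged: Dict[str, str] = {}           # w -> m
    free = list(m_prefs)
    while free:
        m = free.pop()
        w = m_prefs[m][next_choice[m]]
        next_choice[m] += 1
        current = engaged.get(w)
        if current is None:
            engaged[w] = m                 # w was free: accept
        elif rank[w][m] < rank[w][current]:
            engaged[w] = m                 # w trades up, old partner is free again
            free.append(current)
        else:
            free.append(m)                 # w rejects m
    return {m: w for w, m in engaged.items()}


def is_stable(matching: Dict[str, str], m_prefs: Dict[str, List[str]],
              w_prefs: Dict[str, List[str]]) -> bool:
    """Check that no pair (m, w') prefers each other over their partners."""
    partner_of_w = {w: m for m, w in matching.items()}
    for m, w in matching.items():
        for better_w in m_prefs[m][:m_prefs[m].index(w)]:
            rival = partner_of_w[better_w]
            if w_prefs[better_w].index(m) < w_prefs[better_w].index(rival):
                return False
    return True
```

For a switch, M would play the role of the inputs and W the role of the outputs, with the preference lists derived from the queue contents.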
Critical Cell First (CCF) algorithm performs stable marriage at each decision:
M is the set of inputs, W is the set of outputs
i prefers o1 over o2 if there is a cell for o1 that is queued before all cells for o2
o prefers i1 over i2 if there is a cell from i1 that should leave before all cells from i2
For each cell C from input port i to output port j, and each scheduling decision, stability implies that one of the following holds:
C is forwarded (and we don't care about it)
C' was forwarded from i, and i preferred to forward it → IT--
C' was forwarded to j, and j preferred to receive it → OC++
Two scheduling decisions every time-slot → slackness always increases by 2
Cell-Mode Emulation
Easy with speedup S=N
Possible with speedup S=2 (w/ CCF)
Lower bound: S ≥ 2-1/N is required [Chuang et al., 1999]
What is the speedup required for packet-mode emulation?
Packet-Mode Emulation is Impossible
Regardless of speedup: even with speedup S=N
Emulation w/ Relative Queuing Delay
The CIOQ switch is allowed a bounded lag behind the shadow OQ switch
Exact same behavior as the optimal OQ switch, but with some extra delay, called relative queuing delay (RQD)
Can we provide packet-mode OQ emulation with bounded RQD and small speedup?
Our Results: Speedup-RQD tradeoff
[Figure: plot of speedup vs. RQD; tick labels 2, 4, 2Lmax]
Lower bound on RQD (even with infinite speedup)
Lower bound on the speedup (from cell-mode scheduling)
Generalization of cell-mode scheduling with S=2: taking each packet of size ≤ Lmax as one huge cell
Lmax = maximum packet size (known value)
Intuition for Emulation Algorithms
[Diagram: Packet Mode CIOQ, Packet Mode OQ, Cell Mode CIOQ w/ S=2]
PIFO Cell-Mode OQ Switch
FIFO = First-In First-Out
PIFO = Push-In First-Out
FIFO Packet-Mode OQ Switch is a PIFO Cell-Mode Switch
Underlying CCF Algorithm
Cell-Mode CIOQ w/ CCF (and speedup S=2) emulates any PIFO cell-mode OQ switch [Chuang et al.,1999]
But, CCF does not maintain contiguous packet forwarding over the fabric!
[Diagram: Packet Mode CIOQ, Packet Mode OQ = PIFO Cell-Mode OQ, Cell Mode CIOQ w/ S=2]
Intuition for Emulation Algorithms
[Diagram: Packet Mode CIOQ, Packet Mode OQ, Cell Mode CIOQ w/ S=2]
Two sub-steps:
1. Framing
2. Contiguous Decomposition
Frame-Based Schedulers
Work in a pipelined, frame-based manner
Within each frame:
Build a demand matrix for this frame
Schedule the demand matrix of the previous frame
At each frame of size T, CCF forwards at most 2T cells from each input and to each output.
Building the Demand Matrix
3 0 1 2
1 2 2 1
2 2 2 0
0 2 1 3
Entry (1,1): number of cells CCF sent from input 1 to output 1 in the last frame
Sum of each row ≤ 2T, sum of each column ≤ 2T
Problem: A packet may span several frames.
Building the Demand Matrix
Count only packets whose last cell is forwarded by the CCF in the frame
Each row/column in the matrix is bounded by 2T+N(Lmax-1): for each input-output pair, only cells of one additional packet can be added.
Translates into RQD of 2T+(Lmax-2).
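The counting rule above can be sketched as follows. The log format is purely hypothetical (a list of tuples describing the cells that the underlying CCF scheduler forwarded); the only point is that a packet contributes all of its cells to the frame in which its last cell is forwarded.

```python
from typing import Iterable, List, Tuple

# Hypothetical log record: (time, input, output, packet_len, is_last_cell)
CellRecord = Tuple[int, int, int, int, bool]


def build_demand_matrix(forwarded: Iterable[CellRecord], n: int,
                        frame_start: int, frame_len: int) -> List[List[int]]:
    """Demand (in cells) for the frame [frame_start, frame_start + frame_len).

    A packet is counted, with its full length, only in the frame where the
    cell-mode scheduler forwards its last cell.
    """
    demand = [[0] * n for _ in range(n)]
    frame_end = frame_start + frame_len
    for time, inp, out, packet_len, is_last in forwarded:
        if is_last and frame_start <= time < frame_end:
            demand[inp][out] += packet_len
    return demand


# Toy log: packet A (3 cells, 0 -> 1) ends inside the frame [4, 8),
# packet B (2 cells, 0 -> 1) ends in the next frame, packet C is a single cell.
log = [
    (2, 0, 1, 3, False), (3, 0, 1, 3, False), (4, 0, 1, 3, True),   # packet A
    (7, 0, 1, 2, False), (8, 0, 1, 2, True),                        # packet B
    (5, 1, 0, 1, True),                                             # packet C
]
demand = build_demand_matrix(log, n=2, frame_start=4, frame_len=4)
```

Packet A started before the frame, which is why a row or column can exceed 2T by the cells of at most one extra packet per input-output pair.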
Decomposing the Demand Matrix
Challenge: decompose the matrix into permutations while maintaining contiguous packet delivery.
Each permutation dictates a scheduling decision.
First try: optimal Birkhoff von-Neumann decomposition results in 2T+N(Lmax-1) permutations.
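Example: the demand matrix

3 0 1 2
1 2 2 1
2 2 2 0
0 2 1 3

is the sum of six permutations:

0 0 1 0   1 0 0 0   1 0 0 0   0 0 0 1   0 0 0 1   1 0 0 0
0 1 0 0 + 0 0 1 0 + 0 1 0 0 + 0 0 1 0 + 1 0 0 0 + 0 0 0 1
1 0 0 0   0 1 0 0   0 0 1 0   1 0 0 0   0 1 0 0   0 0 1 0
0 0 0 1   0 0 0 1   0 0 0 1   0 1 0 0   0 0 1 0   0 1 0 0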
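A Birkhoff von-Neumann style decomposition of an integer matrix can be sketched by repeatedly extracting a perfect matching from the positive entries. The version below is a minimal illustration, assuming a square non-negative integer matrix whose row and column sums are all equal (a general demand matrix would first have to be padded); the matching routine is a plain augmenting-path search, chosen here for brevity rather than taken from the talk.

```python
from typing import List


def perfect_matching(m: List[List[int]]) -> List[int]:
    """Return perm with perm[i] = j such that m[i][perm[i]] > 0 for all i."""
    n = len(m)
    row_of_col = [-1] * n  # row currently matched to each column

    def try_row(i: int, seen: List[bool]) -> bool:
        for j in range(n):
            if m[i][j] > 0 and not seen[j]:
                seen[j] = True
                if row_of_col[j] == -1 or try_row(row_of_col[j], seen):
                    row_of_col[j] = i
                    return True
        return False

    for i in range(n):
        if not try_row(i, [False] * n):
            raise ValueError("row/column sums are not all equal")
    perm = [0] * n
    for j, i in enumerate(row_of_col):
        perm[i] = j
    return perm


def bvn_decompose(matrix: List[List[int]]) -> List[List[int]]:
    """Split the matrix into permutations, one per scheduling decision."""
    rem = [row[:] for row in matrix]
    perms = []
    while any(any(row) for row in rem):
        perm = perfect_matching(rem)
        for i, j in enumerate(perm):
            rem[i][j] -= 1  # one cell from input i to output j is scheduled
        perms.append(perm)
    return perms


demand = [[3, 0, 1, 2], [1, 2, 2, 1], [2, 2, 2, 0], [0, 2, 1, 3]]
decisions = bvn_decompose(demand)  # as many permutations as the common line sum
```

The permutations found this way need not be the ones drawn on the slide, and nothing forces the cells of one packet into consecutive decisions, which is the problem the next slide addresses.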
Contiguous Greedy Decomposition
To maintain contiguous packet delivery: if (i,j) was matched in iteration t-1 and there are more (i,j) cells to schedule, keep it for iteration t.
Find a greedy matching for the rest of the matrix.
Speedup: S = 4 + N(Lmax-1)/T
RQD: 2T+Lmax-1
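The two rules above can be sketched in a few lines. This is a simplified illustration on a matrix of cell counts: pairs matched in the previous iteration are kept while they still have cells, and the remaining inputs are filled in greedily in index order (the scan order is an arbitrary choice for the example, not something the slides specify).

```python
from typing import Dict, List


def contiguous_greedy(matrix: List[List[int]]) -> List[Dict[int, int]]:
    """Return one matching (dict input -> output) per scheduling decision."""
    n = len(matrix)
    rem = [row[:] for row in matrix]
    prev: Dict[int, int] = {}
    schedule: List[Dict[int, int]] = []
    while any(any(row) for row in rem):
        cur: Dict[int, int] = {}
        used_outputs = set()
        # Rule 1: keep every pair from the previous iteration that still has cells.
        for i, j in prev.items():
            if rem[i][j] > 0:
                cur[i] = j
                used_outputs.add(j)
        # Rule 2: greedy (maximal) matching on the rest of the matrix.
        for i in range(n):
            if i in cur:
                continue
            for j in range(n):
                if j not in used_outputs and rem[i][j] > 0:
                    cur[i] = j
                    used_outputs.add(j)
                    break
        for i, j in cur.items():
            rem[i][j] -= 1
        schedule.append(cur)
        prev = cur
    return schedule


demand = [[3, 0, 1, 2], [1, 2, 2, 1], [2, 2, 2, 0], [0, 2, 1, 3]]
schedule = contiguous_greedy(demand)
```

Because each matching is maximal, a pair waits only while its input or its output is busy, so the number of iterations stays below twice the largest row or column sum. That factor of two relative to the optimal decomposition is where the speedup of roughly 4 comes from.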
Our Results: Speedup-RQD tradeoff
[Figure: plot of speedup vs. RQD; tick labels 2, 4, 2Lmax]
S = 4 + N(Lmax-1)/T, RQD = 2T+Lmax-1
Next…
Emulation w/ S ≈ 2 - Framing
Keep a separate demand matrix for every possible packet size
Example: possible packet sizes are 3, 4, 6

# of size 3 packets
13  0  4  7
11  5  3 10
 2  9 21  0
 0  2 10 13

# of size 4 packets
 1  8 15 10
 0  1  5  0
 5 10  1  9
 6  7  4 12

# of size 6 packets
 1 10  4  0
 8  6  1 10
15  1  5  7
 0  2  3  1
Emulation w/ S ≈ 2 - Framing
Concatenate packets of the same size into mega-packets of size k=LCM(1,…,Lmax)
Leftover matrix for each size m
Mega Packets (of size k=12): initially the all-zero matrix
After concatenating the size 3 packets:

size 3 (leftovers)
1 0 0 3
3 1 3 2
2 1 1 0
0 2 2 1

Mega Packets (of size 12)
3 0 1 1
2 1 0 2
0 2 5 0
0 0 2 3
After also concatenating the size 4 packets:

size 4 (leftovers)
1 2 0 1
0 1 2 0
2 1 1 0
0 1 1 0

Mega Packets (of size 12)
3 2 6 4
2 1 1 2
1 5 5 3
2 2 3 7
After also concatenating the size 6 packets:

size 6 (leftovers)
1 0 0 0
0 0 1 0
1 1 1 1
0 0 1 1

Mega Packets (of size 12)
3 7 8 4
6 4 1 7
8 5 7 6
2 3 4 7
Emulation w/ S ≈ 2 - Framing
Sum of each row/column is bounded:
For the mega-packets matrix: ≤ (2T+N(Lmax-1))/k
For each leftover matrix of size m: ≤ N(k-1)/m
In the example (k=12), each leftover entry is < 12/3 for size 3, < 12/4 for size 4, and < 12/6 for size 6.
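The mega-packet framing can be reproduced on the example from the slides (packet sizes 3, 4 and 6). The sketch below is an illustration, not the authors' code: it takes k as the LCM of the sizes that actually occur, which gives the k=12 used in the example, whereas the slides define k=LCM(1,…,Lmax) in general.

```python
from functools import reduce
from math import gcd
from typing import Dict, List, Tuple

Matrix = List[List[int]]


def lcm(a: int, b: int) -> int:
    return a * b // gcd(a, b)


def frame_mega_packets(counts: Dict[int, Matrix]) -> Tuple[int, Matrix, Dict[int, Matrix]]:
    """counts[m][i][j] = number of size-m packets from input i to output j.

    Returns (k, mega, leftovers): k/m packets of size m form one mega-packet
    of k cells, and whatever does not fill a mega-packet stays as a leftover.
    """
    k = reduce(lcm, counts)  # LCM of the packet sizes present (assumption)
    n = len(next(iter(counts.values())))
    mega = [[0] * n for _ in range(n)]
    leftovers: Dict[int, Matrix] = {}
    for size, matrix in counts.items():
        per_mega = k // size
        leftovers[size] = [[c % per_mega for c in row] for row in matrix]
        for i in range(n):
            for j in range(n):
                mega[i][j] += matrix[i][j] // per_mega
    return k, mega, leftovers


counts = {
    3: [[13, 0, 4, 7], [11, 5, 3, 10], [2, 9, 21, 0], [0, 2, 10, 13]],
    4: [[1, 8, 15, 10], [0, 1, 5, 0], [5, 10, 1, 9], [6, 7, 4, 12]],
    6: [[1, 10, 4, 0], [8, 6, 1, 10], [15, 1, 5, 7], [0, 2, 3, 1]],
}
k, mega, leftovers = frame_mega_packets(counts)
```

Every leftover entry is smaller than k/m by construction, which is what keeps the leftover matrices small relative to the frame.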
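Emulation w/ S ≈ 2 - Decomposition
Optimally decompose (w/ Birkhoff von-Neumann) the mega-packets matrix and then the leftover matrices
Hold each permutation k times for contiguous (mega)-packet delivery

\[ \#\text{permutations} \le k \cdot \frac{2T + N(L_{\max}-1)}{k} + \sum_{m=1}^{L_{\max}} m \cdot \frac{N(k-1)}{m} \]

(the first fraction is the bound on the mega-packets matrix)

\[ S = \frac{\#\text{permutations}}{T} \le \frac{1}{T}\Bigl(2T + N(L_{\max}-1) + N L_{\max}(k-1)\Bigr) = 2 + \frac{N(k L_{\max}-1)}{T} \]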
Our Results: Speedup-RQD tradeoff
[Figure: plot of speedup vs. RQD; tick labels 2, 4, 2Lmax]
S = 4 + N(Lmax-1)/T, RQD = 2T+Lmax-1
S = 2 + N(k·Lmax-1)/T, RQD = 2T+Lmax-1
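To get a feeling for the tradeoff, the two formulas can be evaluated for concrete parameters. The sketch below simply plugs numbers into the expressions from the slides; the function names and the sample values (N=4, Lmax=4, T=1000) are illustrative assumptions.

```python
from fractions import Fraction
from functools import reduce
from math import gcd
from typing import Tuple


def lcm_up_to(lmax: int) -> int:
    """k = LCM(1, ..., Lmax), the mega-packet size."""
    return reduce(lambda a, b: a * b // gcd(a, b), range(1, lmax + 1), 1)


def greedy_scheme(n: int, lmax: int, t: int) -> Tuple[Fraction, int]:
    """Contiguous greedy decomposition: S = 4 + N(Lmax-1)/T, RQD = 2T+Lmax-1."""
    return Fraction(4) + Fraction(n * (lmax - 1), t), 2 * t + lmax - 1


def mega_packet_scheme(n: int, lmax: int, t: int) -> Tuple[Fraction, int]:
    """Mega-packet framing: S = 2 + N(k*Lmax-1)/T, RQD = 2T+Lmax-1."""
    k = lcm_up_to(lmax)
    return Fraction(2) + Fraction(n * (k * lmax - 1), t), 2 * t + lmax - 1


# Larger frames push both speedups towards 4 and 2, at the cost of a larger RQD.
for frame in (100, 1000, 10000):
    s4, rqd = greedy_scheme(4, 4, frame)
    s2, _ = mega_packet_scheme(4, 4, frame)
    print(f"T={frame}: S≈{float(s4):.3f} (greedy), S≈{float(s2):.3f} (mega), RQD={rqd}")
```

Since k grows quickly with Lmax, the mega-packet scheme needs a much longer frame T before its speedup gets close to 2.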
Wrap-up
Packet-mode scheduling can be done with the same speedup as cell-mode scheduling, at the price of bounded RQD
Future work: lower bounds
Thank You!