enabling class of service for cioq switches with maximal weighted algorithms
Enabling Class of Service for CIOQ Switches with Maximal Weighted Algorithms. Thursday, August 14, 2014. Feng Wang ([email protected]), Siu Hong Yuen ([email protected]).
Enabling Class of Service for CIOQ Switches with
Maximal Weighted Algorithms
Saturday, April 22, 2023
Feng Wang ([email protected])
Siu Hong Yuen ([email protected])
2
Contents
1. Motivation
   WFQ on OQ switches can provide service for different classes.
   Can we find maximal weight matching algorithms to provide service for different classes for CIOQ switches?
2. Bandwidth Metric
3. Simulation Environment
4. Algorithms used and their results
5. Intuition behind the result
6. Further work
7. Conclusion
3
Motivation
We know that by using WFQ, we can provide service for different classes based on the priorities of the classes for OQ switches.
However, OQ switches are impractical to implement because of the high memory bandwidth and switch fabric bandwidth they require.
4
Motivation
It has been shown that, with a speedup of 2 and a stable marriage algorithm, CIOQ switches can emulate OQ switches.
Can we find maximal matching algorithms that let a CIOQ switch at a speedup of S provide the same service for different classes as an OQ switch with WFQ?
5
Contents
1. Motivation
2. Metric used
   WFQ as an ideal algorithm
   Using bandwidth as a quantitative metric
3. Simulation Environment
4. Algorithms used and their results
5. Intuition behind the result
6. Further work
7. Conclusion
6
Metric used
We used the WFQ algorithm implemented on OQ switches as the ideal algorithm to provide service for multiple classes.
Thus, in order to measure the effectiveness of our algorithms, we need a quantitative metric to compare our algorithms against the WFQ algorithm.
7
Metric used
Bandwidth metric measures whether the distribution of bandwidth that our algorithm produces is similar to that of WFQ.
During a time period T, we observe the distribution of packets departing from the OQ (using WFQ) and the CIOQ (using our algorithm).
Denote the number of class k packets departed from output port j of the OQ as Xjk, and the number of class k packets departed from output port j of the CIOQ as Yjk.
8
Metric used
For output port j:
Bandwidth used by class k for the OQ: xjk = Xjk / T
Bandwidth used by class k for the CIOQ: yjk = Yjk / T
Bandwidth metric we use:
$$BDiff(j) = \sum_{k=1}^{P} (y_{jk} - x_{jk})^2$$
BDiff ranges from 0 to 1. The closer BDiff is to 0, the closer we are to emulating WFQ for OQ switches.
T is chosen as the time taken for the WFQ algorithm to finish one round-robin cycle.
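As an illustration, the metric can be computed from the per-class departure counts of one output port. The sketch below assumes BDiff is the sum over classes of the squared difference between the CIOQ bandwidth (Yjk / T) and the OQ bandwidth (Xjk / T). The function name `bdiff`, the list-based inputs and the sample counts are hypothetical and are not taken from the simulator.

```python
def bdiff(oq_counts, cioq_counts, period):
    """Sum of squared per-class bandwidth differences for one output port.

    oq_counts[k]   : class-k packets departed from the OQ switch (Xjk)
    cioq_counts[k] : class-k packets departed from the CIOQ switch (Yjk)
    period         : observation period T in time slots
    """
    if len(oq_counts) != len(cioq_counts):
        raise ValueError("both switches must report the same number of classes")
    total = 0.0
    for big_x, big_y in zip(oq_counts, cioq_counts):
        x = big_x / period  # bandwidth used by this class on the OQ
        y = big_y / period  # bandwidth used by this class on the CIOQ
        total += (y - x) ** 2
    return total


# Hypothetical counts over T = 10 slots (not measured data).
oq = [5, 2, 2, 1]
cioq = [4, 3, 2, 1]
print(bdiff(oq, cioq, 10))
```

Identical departure distributions give a value of exactly 0, which is the target when emulating WFQ on the OQ switch.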
9
Contents
1. Motivation
2. Metric used
3. Simulation Environment
   Simulator
   Switch configuration
   Traffic
   Sampling
4. Algorithms used and their results
5. Intuition behind the result
6. Further work
7. Conclusion
10
Simulation Environment
Simulator: SIM v2.35
Switch: 8x8, 4 classes of service with weights 5:2:2:1
Traffic models:
   Bernoulli iid uniform
   Bernoulli iid nonuniform: overloaded traffic
   Bursty uniform
   Bursty nonuniform: overloaded traffic
Same input traffic trace for OQ and CIOQ switches
Sample the distribution of packets for port 0 every 10 time slots
11
Contents
1. Motivation
2. Metric used
3. Simulation Environment
4. Algorithms used and their results
   algo0 to algo4
5. Intuition behind the result
6. Further work
7. Conclusion
12
Algorithms
We came up with 5 maximal weight matching algorithms that attempt to provide service for multiple classes.
They are based on the request-grant-accept phases similar to iSLIP.
Each VOQij is split into P sub-queues; each sub-queue stores the packets of one class.
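One possible way to represent a VOQ split into P per-class sub-queues is sketched below. The class name `ClassedVOQ`, its method names, and the choice to store only arrival slots are assumptions made for illustration. They do not describe the simulator's data structures.

```python
from collections import deque


class ClassedVOQ:
    """One VOQ (input i, output j) split into per-class FIFO sub-queues."""

    def __init__(self, num_classes):
        # One FIFO per class; each entry is the arrival slot of a packet.
        self.subqueues = [deque() for _ in range(num_classes)]

    def enqueue(self, cls, arrival_slot):
        self.subqueues[cls].append(arrival_slot)

    def hol_wait(self, cls, now):
        """Slots the head-of-line packet of class cls has waited, or None if empty."""
        queue = self.subqueues[cls]
        return now - queue[0] if queue else None

    def backlogged_classes(self):
        """Classes that currently hold at least one packet."""
        return [k for k, queue in enumerate(self.subqueues) if queue]

    def dequeue(self, cls):
        """Remove and return the arrival slot of the head-of-line packet."""
        return self.subqueues[cls].popleft()
```

Keeping the arrival slot of each packet is enough to derive the head-of-line waiting time that a weighted request needs.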
13
algo0
algo0 is the most basic of the 5 algorithms; the subsequent algorithms build on it.
algo0 is a variation of PIM with support for different priorities.
Request: For each output j that input i has a packet for, input i requests that output with weight = 1.
Grant: If output j receives any requests, it determines the request with the largest weight (all weights are the same in this case). If multiple requests share the largest weight, ties are broken randomly.
Accept: If input i receives any grants, it determines the grant with the largest weight (all weights are the same in this case). If multiple grants share the largest weight, ties are broken randomly.
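The three phases can be sketched as a single request-grant-accept round with unit weights, so every choice reduces to a random pick. This is a minimal sketch only: the function name `algo0_round` and the boolean occupancy matrix are assumptions, and class handling and repeated iterations are left out.

```python
import random


def algo0_round(has_packet, rng=random):
    """One request-grant-accept round with unit weights and random tie-breaking.

    has_packet[i][j] is True when input i holds a packet for output j.
    Returns a dict mapping each matched input to its output.
    """
    n_in = len(has_packet)
    n_out = len(has_packet[0]) if n_in else 0

    # Request: every input requests every output it has a packet for.
    requests = {
        j: [i for i in range(n_in) if has_packet[i][j]] for j in range(n_out)
    }

    # Grant: all weights are equal, so each output picks one requester at random.
    grants = {}  # input -> list of outputs that granted it
    for j, inputs in requests.items():
        if inputs:
            grants.setdefault(rng.choice(inputs), []).append(j)

    # Accept: each input picks one of its grants at random.
    return {i: rng.choice(outputs) for i, outputs in grants.items()}


# Hypothetical 2x2 occupancy: input 0 has packets for both outputs.
print(algo0_round([[True, True], [True, False]]))
```

Because each output grants at most one input and each input accepts at most one grant, the result is always a valid matching, although not necessarily one of maximum size.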
14
algo1
algo0 does not differentiate between requests, i.e. all requests are treated equally.
algo1 improves on that by associating a weight with each request.
For each VOQij, we calculate $W_{ij}^{k}$ = weight of class k x amount of time a class k packet has waited at the HoL.
Then we take the maximum $W_{ij}^{k}$ over all k classes for this VOQ and assign this as Wij, the weight of the request from input i to output j.
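The request weight described above (class weight times head-of-line waiting time, maximized over the classes of one VOQ) could be computed as follows. The function name `request_weight`, the use of `None` for an empty sub-queue, and the rule that ties go to the lowest class index are assumptions of this sketch.

```python
def request_weight(class_weights, hol_waits):
    """Return (Wij, class index) for one VOQ, or None if the VOQ is empty.

    class_weights[k] : configured weight of class k
    hol_waits[k]     : time the class-k head-of-line packet has waited,
                       or None when that sub-queue is empty
    """
    best = None
    for k, (weight, wait) in enumerate(zip(class_weights, hol_waits)):
        if wait is None:
            continue  # no class-k packet in this VOQ
        candidate = (weight * wait, k)
        # Strict comparison keeps the lowest class index on ties (an assumption).
        if best is None or candidate[0] > best[0]:
            best = candidate
    return best


# Hypothetical waits for weights 5:2:2:1; class 2 has no packet.
print(request_weight([5, 2, 2, 1], [1, 4, None, 9]))  # → (9, 3)
```

In this example the lowest-weight class wins the request because its packet has waited longest, which shows how waiting time keeps low-weight classes from being ignored.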
15
algo1
The rest of the algorithm is the same as algo0.
Request: For each output j that input i has a packet for, input i requests that output with weight = Wij.
Grant: If output j receives any requests, it determines the request with the largest weight. If multiple requests share the largest weight, ties are broken randomly.
Accept: If input i receives any grants, it determines the grant with the largest weight. If multiple grants share the largest weight, ties are broken randomly.
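The grant and accept phases apply the same selection rule: take the largest weight and break ties randomly. A small helper along these lines could serve both phases. The name `pick_heaviest` and the port-to-weight dictionary are hypothetical.

```python
import random


def pick_heaviest(candidates, rng=random):
    """Return a port with the largest weight; ties are broken randomly.

    candidates maps a port number to the weight of its request or grant.
    Returns None when there are no candidates.
    """
    if not candidates:
        return None
    top = max(candidates.values())
    return rng.choice([port for port, weight in candidates.items() if weight == top])


# Hypothetical weights of three requests received by one output.
print(pick_heaviest({0: 3, 1: 7, 2: 5}))  # → 1
```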
16
algo2 and algo3
For algo0 and algo1, during the grant and accept phases, ties are broken randomly.
This does not take into consideration which request was granted/accepted previously.
algo2 and algo3 improve on algo0 and algo1 by remembering previous matches, in a similar way to iSLIP.
For each output, we keep a pointer to the input whose grant was last accepted, for every priority.
For each input, we keep a pointer to the last accepted output for every priority.
17
algo2
algo2 is algo0 with the pointer enhancement.
Request: For each output j that input i has a packet for, input i requests that output with weight = 1.
Grant: If output j receives any requests, it determines the request with the highest weight (all weights are the same in this case). Ties are broken first by priority. If multiple requests are of the same priority, the input last granted for this priority is least preferred, and the output grants the most preferred input (in the round-robin sense).
Accept: If input i receives any grants, it determines the grant with the highest weight (all weights are the same in this case). If there are ties, we first select a priority to accept randomly. If there are multiple grants with this priority, the output last accepted for this priority is least preferred, and the input accepts the most preferred output (in the round-robin sense).
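The round-robin preference used to break ties can be sketched with modular arithmetic: the port just after the last served one is most preferred, and the last served port itself is least preferred. The function name `round_robin_pick` and the per-priority `pointers` dictionary are assumptions for illustration.

```python
def round_robin_pick(candidates, last_served, num_ports):
    """Pick the candidate that follows last_served most closely in cyclic order.

    The port equal to last_served gets the largest distance (num_ports - 1),
    so it is chosen only when it is the sole candidate.
    """
    if not candidates:
        return None
    return min(candidates, key=lambda port: (port - last_served - 1) % num_ports)


# Hypothetical state: one pointer per priority for a single 8-port output.
pointers = {0: 3, 1: 0}
chosen = round_robin_pick([1, 3, 5], pointers[0], 8)
print(chosen)  # → 5
```

In an iSLIP-style scheme the pointer for that priority would be moved to `chosen` only once the match is accepted, which is what desynchronizes the ports over time.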
18
algo3
algo3 is algo1 with the pointer enhancement.
Request: For each output j that input i has a packet for, input i requests that output with weight = Wij.
Grant: If output j receives any requests, it determines the request with the highest weight. Ties are broken first by priority. If multiple requests are of the same priority, the input last granted for this priority is least preferred, and the output grants the most preferred input (in the round-robin sense).
Accept: If input i receives any grants, it determines the grant with the highest weight. If there are ties, we first select a priority to accept randomly. If there are multiple grants with this priority, the output last accepted for this priority is least preferred, and the input accepts the most preferred output (in the round-robin sense).
19
algo4
algo2 and algo3 rotate the pointer for the preferred input port to grant and the preferred output port to accept.
Instead of having a pointer that rotates regularly, algo4 tries to rotate the preference depending on the weight of each class. It attempts to rotate the pointer in a way similar to WFQ, where the pointer stays at a particular preferred port according to the schedule determined by WFQ.
20
algo4
Request: For each output j that input i has a packet for, input i requests that output with a bitmap showing which priorities have a packet.
Grant: Output j maintains a preferred priority, which is updated in a way similar to WFQ each time one of its grants is accepted. Assume the preferred priority for output j is k. Output j checks which of the received requests have a packet with priority k. If multiple inputs have priority k packets, ties are broken randomly. If no input has a priority k packet, output j updates its preferred priority to the next one.
Accept: Input i also maintains a preferred priority, which is updated in a way similar to WFQ each time it accepts a grant. Assume the preferred priority for input i is k. If input i receives any grants, it finds the grant with priority k. If multiple grants have priority k, ties are broken randomly. If no grant has priority k, the preferred priority is updated to the next one.
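The slides do not spell out how the preferred priority is "updated in a way similar to WFQ". The sketch below substitutes a simple weighted round-robin pointer, which stays on a class for as many accepted matches as the class weight, and shows the grant step for one output. The names `PreferredPriority` and `grant`, the credit mechanism, and the integer bitmaps are all assumptions. This is not the authors' implementation.

```python
import random


class PreferredPriority:
    """Weighted round-robin pointer over classes (assumed stand-in for the WFQ-like update)."""

    def __init__(self, weights):
        self.weights = weights
        self.current = 0
        self.credit = weights[0]

    def advance(self):
        """Move the preference to the next class and refill its credit."""
        self.current = (self.current + 1) % len(self.weights)
        self.credit = self.weights[self.current]

    def served(self):
        """Call when a match for the current class has been accepted."""
        self.credit -= 1
        if self.credit == 0:
            self.advance()


def grant(requests, pref, rng=random):
    """Grant step for one output.

    requests maps an input to a bitmap whose bit k is set when that input
    has a class-k packet for this output. Returns (input, class) or None.
    """
    for _ in range(len(pref.weights)):
        k = pref.current
        holders = [i for i, bitmap in requests.items() if (bitmap >> k) & 1]
        if holders:
            return rng.choice(holders), k  # random tie-break among inputs
        pref.advance()  # no requester has a class-k packet: try the next class
    return None
```

With weights 5:2:2:1, the pointer would rest on class 0 for five accepted matches, then on classes 1 and 2 for two each, and on class 3 for one. The accept step at an input could reuse the same pointer logic over the grants it receives.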
21
Result: Bernoulli iid uniform
[Figure: two plots of BDiff versus speedup under Bernoulli iid uniform traffic for algo0, algo1, algo2, algo3 and algo4. The first covers speedups 1 to 8 (BDiff axis 0 to 0.12); the second covers speedups 2 to 8 on a finer scale (BDiff axis 0 to 0.006).]
22
Result: Bernoulli iid nonuniform
[Figure: two plots of BDiff versus speedup under Bernoulli iid nonuniform traffic for algo0, algo1, algo2, algo3 and algo4. The first covers speedups 1 to 8 (BDiff axis 0 to 0.20); the second covers speedups 2 to 8 on a finer scale (BDiff axis 0 to 0.0012).]
23
Result: Bursty uniform
[Figure: two plots of BDiff versus speedup under bursty uniform traffic for algo0, algo1, algo2, algo3 and algo4. The first covers speedups 1 to 8 (BDiff axis 0 to 0.30); the second covers speedups 2 to 8 on a finer scale (BDiff axis 0 to 0.012).]
24
Result: Bursty nonuniform
[Figure: two plots of BDiff versus speedup under bursty nonuniform traffic for algo0, algo1, algo2, algo3 and algo4. The first covers speedups 1 to 8 (BDiff axis 0 to 0.25); the second covers speedups 2 to 8 on a finer scale (BDiff axis 0 to 0.006).]
25
Results:
In most cases, algo1 is better than algo0, and algo3 is better than algo2.
algo3 is not always better than algo1.
algo4 is not always better than algo1 and algo3.
As speedup increases, the results of the different algorithms converge.
For speedup > 4, BDiff = 0.
26
Contents
1. Motivation
2. Bandwidth Metric
3. Simulation Environment
4. Algorithms used and their results
5. Intuition behind the result
   Weight information is helpful
   Size of matching is not helpful
   WFQ on both input and output side is not helpful
   Speedup for BDiff = 0
6. Further work
7. Conclusion
27
Intuition behind the result
Adding the weight information to the algorithms helps the scheduler make better decisions when serving different classes.
Compared with algo0 and algo1, algo2 and algo3 improve the size of the matching because they desynchronize the grants to different ports. However, we observed that algo2 and algo3 did not improve the BDiff metric, so a larger matching does not help in serving different classes.
Implementing WFQ on both the input and output ports to select grants and accepts does not help make better decisions. Intuition: WFQ on the output side may help make better decisions, but we should perhaps use other criteria to break ties on the input side.
28
Intuition behind the result
BDiff = 0 for speedup > 4. 4 is the number of classes in our test, so perhaps BDiff = 0 whenever speedup > number of classes. However, in a couple of tests with 5 classes, BDiff = 0 for speedup > 4 still holds.
29
Contents
1. Motivation
2. Bandwidth Metric
3. Simulation Environment
4. Algorithms used and their results
5. Intuition behind the result
6. Further work
   Latency metric
   Existence of a constant speedup S for BDiff = 0?
7. Conclusion
30
Future work
Besides the bandwidth allocated to different classes of service, latency is another metric for measuring how good an algorithm is. Define a latency metric as how close the latency of packets of different classes is to that of the OQ switch, and measure it for the different algorithms.
Investigate further whether there exists a constant speedup S at which a CIOQ switch can emulate OQ WFQ in terms of the service rate for different classes. This needs more theoretical analysis.
31
Conclusion
We define a metric to evaluate the capability of algorithms to provide class of service, and measure it for different algorithms.
The results suggest that weight information in selecting grants and accepts is helpful at smaller speedups. As speedup increases, the difference between the algorithms becomes negligible. So there is a trade-off between algorithm complexity and speedup.
Among all the algorithms we tried, algo1 is good enough to provide a good service rate for different classes. algo3 and algo4 do not improve on algo1.
It is possible to find a maximal matching algorithm that, with a certain speedup, lets a CIOQ switch emulate OQ WFQ in terms of the service rate of different classes.