Enabling Class of Service for CIOQ Switches with Maximal Weighted Algorithms

Saturday, April 22, 2023

Feng Wang ([email protected])

Siu Hong Yuen ([email protected])

Contents

1. Motivation
   WFQ on OQ switches can provide service for different classes.
   Can we find maximal weight matching algorithms to provide service for different classes for CIOQ switches?
2. Bandwidth Metric
3. Simulation Environment
4. Algorithms used and their results
5. Intuition behind the result
6. Further work
7. Conclusion

Motivation

We know that by using WFQ, we can provide service for different classes based on the priorities of the classes for OQ switches.

However, OQ switches are impractical to implement because of the high memory bandwidth and fabric switch bandwidth required.

Motivation

It has been shown that with a speedup of 2, using a stable marriage algorithm, CIOQ switches can emulate OQ switches.

Can we find maximal matching algorithms that let a CIOQ switch at a speedup of S provide the same service for different classes as an OQ switch with WFQ?

Contents

1. Motivation
2. Metric used
   WFQ as an ideal algorithm
   Using bandwidth as a quantitative metric
3. Simulation Environment
4. Algorithms used and their results
5. Intuition behind the result
6. Further work
7. Conclusion

Metric used

We used the WFQ algorithm implemented on OQ switches as the ideal algorithm to provide service for multiple classes.

Thus, in order to measure the effectiveness of our algorithms, we need a quantitative metric to compare our algorithms against the WFQ algorithm.

Metric used

The bandwidth metric measures whether the distribution of bandwidth that our algorithm produces is similar to that of WFQ.

During a time period T, we observe the distribution of packets departing from the OQ (using WFQ) and the CIOQ (using our algorithm).

Denote the number of class k packets departed from output port j of the OQ as Xjk, and the number of class k packets departed from output port j of the CIOQ as Yjk.

Metric used

For output port j:

Bandwidth used by class k for the OQ: xjk = Xjk / T

Bandwidth used by class k for the CIOQ: yjk = Yjk / T

Bandwidth metric we use:

\[ BDiff(j) = \sum_{k=1}^{P} (y_{jk} - x_{jk})^2 \]

BDiff ranges from 0 to 1. The closer BDiff is to 0, the closer we are to emulating WFQ for OQ switches.

T is chosen as the time taken for the WFQ algorithm to finish one round-robin cycle.
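The metric can be illustrated with a minimal sketch, assuming BDiff for port j is the sum over the P classes of the squared difference between yjk and xjk. The function name `bdiff` and the departure counts are hypothetical and are not taken from the simulation.

```python
from typing import Sequence


def bdiff(oq_departures: Sequence[int],
          cioq_departures: Sequence[int],
          period: int) -> float:
    """Sum of squared per-class bandwidth differences for one output port.

    oq_departures[k]   is Xjk, class k packets that left the OQ switch.
    cioq_departures[k] is Yjk, class k packets that left the CIOQ switch.
    period             is T, the observation window in time slots.
    """
    if len(oq_departures) != len(cioq_departures):
        raise ValueError("both switches must report the same number of classes")
    total = 0.0
    for x_count, y_count in zip(oq_departures, cioq_departures):
        x = x_count / period  # xjk = Xjk / T
        y = y_count / period  # yjk = Yjk / T
        total += (y - x) ** 2
    return total


# Hypothetical counts for one window of T = 10 slots with weights 5:2:2:1.
oq = [5, 2, 2, 1]
cioq = [4, 3, 2, 1]
print(bdiff(oq, oq, 10))    # identical distributions give 0
print(bdiff(oq, cioq, 10))  # one packet shifted from class 0 to class 1
```

With weights 5:2:2:1, one WFQ round-robin cycle serves 10 packets, which is why the example uses T = 10.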

Contents

1. Motivation
2. Metric used
3. Simulation Environment
   Simulator
   Switch configuration
   Traffic
   Sampling
4. Algorithms used and their results
5. Intuition behind the result
6. Further work
7. Conclusion

Simulation Environment

Simulator: SIM v2.35

Switch: 8x8, 4 classes of service with weight 5:2:2:1

Traffic model:
   Bernoulli iid uniform
   Bernoulli iid nonuniform: overloaded traffic
   Bursty uniform
   Bursty nonuniform: overloaded traffic

Same input traffic trace for OQ and CIOQ switches

Sample the distribution of packets for port 0 every 10 time slots

Contents

1. Motivation
2. Metric used
3. Simulation Environment
4. Algorithms used and their results
   algo0 to algo4
5. Intuition behind the result
6. Further work
7. Conclusion

Algorithms

We came up with 5 maximal weight matching algorithms that attempt to provide service for multiple classes.

They are based on the request-grant-accept phases similar to iSLIP.

Each VOQij is split into P sub-queues; each sub-queue stores the packets of one class.
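The sub-queue layout can be sketched as a three-level structure indexed by input i, output j, and class k. This is only an illustration: the helper names `make_voqs` and `classes_with_packets`, and the packet record format, are assumptions rather than part of the simulator.

```python
from collections import deque
from typing import Deque, List


def make_voqs(num_ports: int, num_classes: int) -> List[List[List[Deque]]]:
    """Build voq[i][j][k]: the class k sub-queue of VOQij."""
    return [[[deque() for _ in range(num_classes)]
             for _ in range(num_ports)]
            for _ in range(num_ports)]


def classes_with_packets(voq, i: int, j: int) -> List[int]:
    """Classes that currently hold at least one packet in VOQij."""
    return [k for k, sub_queue in enumerate(voq[i][j]) if sub_queue]


voq = make_voqs(num_ports=8, num_classes=4)
# Hypothetical packet record: (label, arrival time slot).
voq[0][3][1].append(("pkt", 17))
print(classes_with_packets(voq, 0, 3))
```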

algo0

algo0 is the most basic of the 5 algorithms, and the subsequent algorithms build on it. algo0 is a variation of PIM with support for different priorities.

Request: For each output j that input i has a packet for, it requests that output with weight = 1.

Grant: If output j receives any requests, it determines the request with the largest weight (all the same in this case). If multiple requests share the largest weight, ties are broken randomly.

Accept: If input i receives any grants, it determines the grant with the largest weight (all the same in this case). If multiple grants share the largest weight, ties are broken randomly.
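The three phases can be sketched as a single matching round. This is a simplified illustration, not the simulator's code: the function name `algo0_iteration` and the `requests` layout are assumptions, and because every weight equals 1 the "largest weight" step reduces to a uniform random choice.

```python
import random
from typing import Dict, List, Set, Tuple


def algo0_iteration(requests: Dict[int, Set[int]],
                    rng: random.Random) -> List[Tuple[int, int]]:
    """Run one request-grant-accept round with unit weights.

    requests[i] is the set of outputs for which input i holds a packet.
    Returns the accepted (input, output) pairs.
    """
    # Request phase is implicit: every entry in `requests` carries weight 1.
    # Grant: each requested output picks one requesting input at random,
    # since all weights tie.
    grants: Dict[int, List[int]] = {}  # input -> outputs that granted it
    outputs = sorted({j for outs in requests.values() for j in outs})
    for j in outputs:
        requesters = sorted(i for i, outs in requests.items() if j in outs)
        grants.setdefault(rng.choice(requesters), []).append(j)

    # Accept: each input that received grants picks one at random.
    return [(i, rng.choice(grants[i])) for i in sorted(grants)]


requests = {0: {0, 1}, 1: {0}, 2: {2}}  # hypothetical VOQ occupancy
matching = algo0_iteration(requests, random.Random(0))
print(matching)
```

Each output grants at most one input and each input accepts at most one grant, so the result is always a valid matching, though not necessarily a maximum one.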

algo1

algo0 does not differentiate between requests, i.e. all requests are treated equally. algo1 improves on that by associating a weight with each request.

For each VOQij, we calculate Wij^k = (weight of class k) x (amount of time a class k packet has waited at the head of line, HoL). Then we take the maximum Wij^k over all k classes for this VOQ and assign it as Wij, the weight of the request from input i to output j.

algo1

The rest of the algorithm is the same as algo0.

Request: For each output j that input i has a packet for, it requests that output with weight = Wij.

Grant: If output j receives any requests, it determines the request with the largest weight. If multiple requests share the largest weight, ties are broken randomly.

Accept: If input i receives any grants, it determines the grant with the largest weight. If multiple grants share the largest weight, ties are broken randomly.
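A minimal sketch of the request weight and of the largest-weight selection used in the grant and accept phases follows. The names `request_weight` and `pick_largest` and the waiting times are hypothetical; the class weights 5:2:2:1 are the ones used in the simulation.

```python
import random
from typing import Dict, Optional, Sequence

CLASS_WEIGHTS = (5, 2, 2, 1)  # class weights from the simulation setup


def request_weight(hol_wait: Sequence[Optional[int]],
                   class_weights: Sequence[int] = CLASS_WEIGHTS) -> Optional[int]:
    """Wij for one VOQ: max over classes of class weight x HoL waiting time.

    hol_wait[k] is how long the head-of-line packet of class k has waited,
    or None if that sub-queue is empty. Returns None for an empty VOQ.
    """
    products = [w * t for w, t in zip(class_weights, hol_wait) if t is not None]
    return max(products) if products else None


def pick_largest(weights: Dict[int, int], rng: random.Random) -> int:
    """Pick the port with the largest weight, breaking ties at random."""
    best = max(weights.values())
    return rng.choice(sorted(p for p, w in weights.items() if w == best))


# Hypothetical VOQ: class 0 waited 1 slot, class 1 empty,
# class 2 waited 4 slots, class 3 waited 6 slots.
w_ij = request_weight([1, None, 4, 6])
print(w_ij)  # max(5*1, 2*4, 1*6)

# Hypothetical requests seen by one output, keyed by input port.
print(pick_largest({0: 8, 1: 3, 2: 5}, random.Random(1)))
```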

algo2 and algo3

For algo0 and algo1, during the grant and accept phases, ties are broken randomly.

This does not take into consideration which request was granted/accepted previously.

algo2 and algo3 improve on algo0 and algo1 by remembering previous matches in a way similar to iSLIP.

For each output, we keep, for every priority, a pointer to the input whose grant was last accepted.

For each input, we keep, for every priority, a pointer to the output it last accepted.

algo2

algo2 is algo0 with the pointer enhancement.

Request: For each output j that input i has a packet for, it requests that output with weight = 1.

Grant: If output j receives any requests, it determines the request with the highest weight (all the same in this case). Ties are broken first by priority. If multiple requests are of the same priority, we do the following: the input last granted for this priority is least preferred. We then grant the input that is most preferred (in the round-robin sense).

Accept: If input i receives any grants, it determines the grant with the highest weight (all the same in this case). If there are ties, we first select a priority to accept randomly. If there are multiple grants with this priority, we do the following: the output last accepted for this priority is least preferred. We then accept the output that is most preferred (in the round-robin sense).

algo3

algo3 is algo1 with the pointer enhancement.

Request: For each output j that input i has a packet for, it requests that output with weight = Wij.

Grant: If output j receives any requests, it determines the request with the highest weight. Ties are broken first by priority. If multiple requests are of the same priority, we do the following: the input last granted for this priority is least preferred. We then grant the input that is most preferred (in the round-robin sense).

Accept: If input i receives any grants, it determines the grant with the highest weight. If there are ties, we first select a priority to accept randomly. If there are multiple grants with this priority, we do the following: the output last accepted for this priority is least preferred. We then accept the output that is most preferred (in the round-robin sense).
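The per-priority round-robin tie-break shared by algo2 and algo3 can be sketched as follows. The class name `PriorityPointers` and the initial pointer position are assumptions. The port that was served last is least preferred, and preference decreases cyclically from the port after it.

```python
from typing import List, Sequence


class PriorityPointers:
    """One round-robin pointer per priority for a single input or output."""

    def __init__(self, num_priorities: int, num_ports: int) -> None:
        self.num_ports = num_ports
        # Assumption: start as if the last port was served, so that port 0
        # is initially the most preferred.
        self.last: List[int] = [num_ports - 1] * num_priorities

    def pick(self, priority: int, candidates: Sequence[int]) -> int:
        """Return the tied candidate that comes first after the pointer."""
        if not candidates:
            raise ValueError("no candidates to choose from")
        last = self.last[priority]
        # Distance 0 is the port right after `last`; `last` itself is
        # num_ports - 1 steps away, which makes it least preferred.
        return min(candidates, key=lambda p: (p - last - 1) % self.num_ports)

    def record_accept(self, priority: int, port: int) -> None:
        """Move the pointer only when the match was actually accepted."""
        self.last[priority] = port


pointers = PriorityPointers(num_priorities=4, num_ports=8)
first = pointers.pick(0, [1, 5])
pointers.record_accept(0, first)
second = pointers.pick(0, [1, 5])
print(first, second)
```

Keeping a separate pointer for each priority means a match at one priority does not disturb the rotation of the others.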

algo4

algo2 and algo3 rotate the pointer for the preferred input port to grant and the preferred output port to accept.

Instead of having a pointer that rotates regularly, algo4 tries to rotate the preference depending on the weight of each class. It attempts to rotate the pointer similar to WFQ, where the pointer stays at a particular preferred port depending on the schedule determined by WFQ.

algo4

Request: For each output j that input i has a packet for, it requests that output with a bitmap showing which priorities have a packet.

Grant: Output j maintains a preferred priority, which is updated in a way similar to WFQ each time one of its grants is accepted. Assume the preferred priority for output j is k. Output j checks which of the received requests have a packet with priority k. If multiple inputs have priority k packets, ties are broken randomly. If no input has a priority k packet, output j updates its preferred priority to the next one.

Accept: Input i also maintains a preferred priority, which is updated in a way similar to WFQ each time it accepts a grant. Assume the preferred priority for input i is k. If input i receives any grants, it finds the grant with priority k. If multiple grants have priority k, ties are broken randomly. If no grant has priority k, the preferred priority is updated to the next one.
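The slides do not spell out the exact update rule for the preferred priority. The sketch below substitutes a plain weighted round-robin as a stand-in for "similar to WFQ": a priority stays preferred for as many accepted matches as its class weight, and it is skipped when no request or grant carries it. The class name `PreferredPriority` and this credit scheme are assumptions.

```python
from typing import Optional, Sequence, Set


class PreferredPriority:
    """Weighted round-robin over priorities for one input or output port."""

    def __init__(self, weights: Sequence[int] = (5, 2, 2, 1)) -> None:
        self.weights = tuple(weights)
        self.current = 0                 # preferred priority k
        self.credit = self.weights[0]    # accepts left before rotating

    def _advance(self) -> None:
        self.current = (self.current + 1) % len(self.weights)
        self.credit = self.weights[self.current]

    def select(self, available: Set[int]) -> Optional[int]:
        """Return the priority to serve, skipping priorities with no packet."""
        for _ in range(len(self.weights)):
            if self.current in available:
                return self.current
            self._advance()
        return None  # nothing available at any priority

    def record_accept(self) -> None:
        """Consume one credit after an accepted match; rotate when spent."""
        self.credit -= 1
        if self.credit == 0:
            self._advance()


port = PreferredPriority()
served = []
for _ in range(10):
    served.append(port.select({0, 1, 2, 3}))
    port.record_accept()
print(served)
```

When every priority always has a packet, one full cycle of 10 accepted matches serves the classes in the ratio 5:2:2:1.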

Result: Bernoulli iid uniform

[Figure: BDiff versus speedup under Bernoulli iid uniform traffic for algo0, algo1, algo2, algo3, and algo4. First panel: speedup 1 to 8, BDiff axis 0 to 0.12. Second panel: speedup 2 to 8, BDiff axis 0 to 0.006.]

Result: Bernoulli iid nonuniform

[Figure: BDiff versus speedup under Bernoulli iid nonuniform traffic for algo0, algo1, algo2, algo3, and algo4. First panel: speedup 1 to 8, BDiff axis 0 to 0.20. Second panel: speedup 2 to 8, BDiff axis 0 to 0.0012.]

Result: Bursty uniform

[Figure: BDiff versus speedup under bursty uniform traffic for algo0, algo1, algo2, algo3, and algo4. First panel: speedup 1 to 8, BDiff axis 0 to 0.30. Second panel: speedup 2 to 8, BDiff axis 0 to 0.012.]

Result: Bursty nonuniform

[Figure: BDiff versus speedup under bursty nonuniform traffic for algo0, algo1, algo2, algo3, and algo4. First panel: speedup 1 to 8, BDiff axis 0 to 0.25. Second panel: speedup 2 to 8, BDiff axis 0 to 0.006.]

Results:

In most cases, algo1 is better than algo0, and algo3 is better than algo2.

algo3 is not always better than algo1.

algo4 is not always better than algo1 and algo3.

As speedup increases, the results of the different algorithms converge.

For speedup > 4, BDiff = 0.

Contents

1. Motivation
2. Bandwidth Metric
3. Simulation Environment
4. Algorithms used and their results
5. Intuition behind the result
   Weight information is helpful
   Size of matching is not helpful
   WFQ on both input and output side is not helpful
   Speedup for BDiff = 0
6. Further work
7. Conclusion

Intuition behind the result

Adding the weight information to the algorithms helps the scheduler make better decisions when serving different classes.

Compared with algo0 and algo1, algo2 and algo3 improve the size of the matching because they desynchronize the grants to different ports. However, we observed that algo2 and algo3 did not improve the BDiff metric, so a larger matching does not help in serving different classes.

Implementing WFQ on both the input and the output ports to select grants and accepts does not lead to better decisions. Intuitively, WFQ on the output side may help make better decisions, but we should perhaps use other criteria to break ties on the input side.

Intuition behind the result

BDiff = 0 for speedup > 4. Since 4 is the number of classes in our test, one might guess that BDiff = 0 whenever speedup > number of classes. However, in a couple of tests with 5 classes, BDiff = 0 still held for speedup > 4.

Contents

1. Motivation
2. Bandwidth Metric
3. Simulation Environment
4. Algorithms used and their results
5. Intuition behind the result
6. Further work
   Latency metric
   Existence of a constant speedup S for BDiff = 0?
7. Conclusion

Future work

Besides the bandwidth allocated to different classes of service, latency is another metric for how good an algorithm is. Define a latency metric as how close the latency of packets of each class is to that of the OQ switch, and measure it for the different algorithms.

Investigate further whether there exists a constant speedup S at which a CIOQ switch can emulate OQ WFQ in the service rate for different classes. This needs more theoretical analysis.

Conclusion

We define a metric to evaluate the capability of algorithms to provide class of service, and measure it for different algorithms.

The results suggest that using weight information when selecting grants and accepts is helpful at smaller speedups. As speedup increases, the difference between algorithms becomes negligible. So there is a trade-off between algorithm simplicity and speedup.

Among all the algorithms we tried, algo1 is good enough to provide a good service rate for different classes. algo3 and algo4 do not improve on algo1.

It is possible to find a maximal matching algorithm that, with a certain speedup, lets a CIOQ switch emulate OQ WFQ in the service rate of different classes.
