Packet Scheduling/Arbitration in Virtual Output Queues: Maximal Matching Algorithms (Part II)

CIST560 by M. Hamdi

Pointer Desynchronization

• Performance: RRM < iSlip < FIRM

• Difference only in updating pointers

• Observation: iSlip and FIRM can effectively desynchronize their output pointers

• Pointer desynchronization is most effective when it is forced

Static Round Robin Matching (SRR): To Achieve FULL Desynchronization

• Initialization. The input pointers are set to 0's. The output pointers are set to some initial pattern such that there is no duplication among the pointers.

• The 3 steps of one iteration are:
– Request. Each input sends a request to every output for which it has a queued cell.
– Grant. If an output receives any requests, it chooses the one that appears next in a fixed, round-robin schedule starting from the highest priority element. The output notifies each input whether or not its request was granted. The pointer to the highest priority element of the round-robin schedule is always incremented by one (modulo N), whether there is a grant or not.
– Accept. If an input receives a grant, it accepts the one that appears next in a fixed round-robin schedule starting from the highest priority element. The pointer to the highest priority element of the round-robin schedule is incremented (modulo N) to one location beyond the accepted one.

• In DSRR (an improved version of SRR), the input pointers are also desynchronized.

• Rotating DSRR (RDSRR):
– Motivation: unfairness among inputs under a special traffic model.
– Outputs search in clockwise and anti-clockwise directions alternately to decide grants.
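The three SRR steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `srr_iteration`, the boolean request matrix, and the list-based pointers are hypothetical choices, not taken from the slides.

```python
def srr_iteration(requests, out_ptr, in_ptr):
    """One SRR iteration on an N x N switch (illustrative sketch).

    requests[i][j] is True when input i has a queued cell for output j.
    out_ptr[j] and in_ptr[i] are round-robin pointers, updated in place.
    Returns a dict mapping each matched input to its output.
    """
    n = len(requests)
    grants = {}  # input -> outputs that granted it
    for j in range(n):
        # Grant: first requesting input at or after the output pointer.
        for k in range(n):
            i = (out_ptr[j] + k) % n
            if requests[i][j]:
                grants.setdefault(i, []).append(j)
                break
        # SRR rule: the output pointer always advances by one, grant or not.
        out_ptr[j] = (out_ptr[j] + 1) % n
    match = {}
    for i, granted in grants.items():
        # Accept: first granting output at or after the input pointer.
        j = min(granted, key=lambda g: (g - in_ptr[i]) % n)
        match[i] = j
        in_ptr[i] = (j + 1) % n  # one location beyond the accepted output
    return match
```

With all VOQs backlogged, input pointers at 0 and output pointers initialized to the distinct pattern `[0, 1, 2, 3]`, every iteration of this sketch yields a full match, because the output pointers stay distinct as they advance together.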


Simulation Results

Figure: relative average delay vs. normalized load, 32x32 switch under uniform traffic (curves: iSlip, FIRM, SRR, DSRR, RDSRR).

Figure: relative average delay vs. normalized load, 32x32 switch under uniform bursty traffic (curves: iSlip, FIRM, SRR, DSRR, RDSRR).

Figure: relative average delay (log scale) vs. normalized load, 32x32 switch under hotspot traffic (curves: iSlip, FIRM, SRR, DSRR, RDSRR).

Figure: average delay (log scale) vs. normalized load, 32x32 switch under unbalanced traffic (curves: iSlip, FIRM, SRR, DSRR, RDSRR).

Stability Property

• A VOQ switch is considered stable if it approaches a steady state where the expected length of each VOQ is bounded. If it is stable, 100% throughput can be achieved under any admissible traffic pattern.

• RDSRR is more stable than iSlip and FIRM under various traffic patterns.

Figure: throughput vs. normalized load (0.8 to 1), 32x32 switch under unbalanced traffic (curves: iSlip, FIRM, RDSRR, Output).

3-Phase & 2-Phase Algorithms

• iSlip & FIRM are 3-phase algorithms: Request-Grant-Accept

• DRRM is a 2-phase algorithm: Grant-Accept
– Each input sends one grant
– Each output sends one accept

• 2-FIRM is the 2-phase version of FIRM


DRRM (Dual Round Robin Matching)

3-Phase & 2-Phase Algorithms

Figure: relative average delay vs. normalized load, 32x32 switch under uniform traffic (curves: iSlip, DRRM, FIRM, 2-FIRM).

Figure: relative average delay (log scale) vs. normalized load, 32x32 switch under hotspot traffic (curves: iSlip, DRRM, FIRM, 2-FIRM).

• In the general case, the traffic pattern changes from time to time

• When the temporary non-uniformity is on the input side, the 3-phase scheme performs better

• When the temporary non-uniformity is on the output side, the 2-phase scheme performs better


2-stage Maximum Size Matching Algorithm: Description

• The 2-stage algorithm works in the following way:

1. The pointers at both the input and output sides are kept fully desynchronized.

2. In each iteration, there are 3 steps:

Step 1: Each input sends a request to every output for which it has a queued cell.

Step 2: Each input selects one VOQ, the one that appears next starting from its highest priority output, and sends a grant to that output. Each output selects one of the requests received in step 1, the one that appears next starting from its highest priority input, and sends a grant to that input. OutputCount = number of outputs receiving grants from inputs. InputCount = number of inputs receiving grants from outputs.

Step 3: If OutputCount ≥ InputCount, each output selects one among the grants received in step 2, the one that appears next starting from its highest priority input, and sends an accept. Else, each input selects one among the grants received in step 2, the one that appears next starting from its highest priority output, and sends an accept.

• In simple words, this algorithm will decide in each time slot whether to use 2-phase or 3-phase scheme based on which one can make more matches.
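A rough Python sketch of one iteration of this decision may help. It rests on several assumptions that the slides do not settle: the names `rr_pick` and `two_stage_iteration` are hypothetical, a tie between the two counts is resolved in favor of the output side, and pointer updates are left out because the slides only say that pointers are kept desynchronized.

```python
def rr_pick(candidates, pointer, n):
    """Return the candidate that appears first at or after `pointer` (mod n)."""
    return min(candidates, key=lambda c: (c - pointer) % n)


def two_stage_iteration(requests, in_ptr, out_ptr):
    """Return {input: output} for one iteration (pointer updates omitted)."""
    n = len(requests)
    # Step 2, input side: every input with a non-empty VOQ grants one output.
    input_grants = {}   # output -> inputs that granted it
    for i in range(n):
        wanted = [j for j in range(n) if requests[i][j]]
        if wanted:
            input_grants.setdefault(rr_pick(wanted, in_ptr[i], n), []).append(i)
    # Step 2, output side: every output with a request grants one input.
    output_grants = {}  # input -> outputs that granted it
    for j in range(n):
        asking = [i for i in range(n) if requests[i][j]]
        if asking:
            output_grants.setdefault(rr_pick(asking, out_ptr[j], n), []).append(j)
    output_count = len(input_grants)   # outputs that received a grant
    input_count = len(output_grants)   # inputs that received a grant
    # Step 3: accept on the side that produces more matches.
    if output_count >= input_count:    # tie handling is an assumption
        return {rr_pick(ins, out_ptr[j], n): j for j, ins in input_grants.items()}
    return {i: rr_pick(outs, in_ptr[i], n) for i, outs in output_grants.items()}
```

In this sketch the number of matches equals the larger of the two counts, since each side sends at most one grant per port and therefore every port that received a grant can accept one without conflict.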

2-stage Maximum Size Matching Algorithm: Hardware Implementation

Figure: block diagram with the following components: State of Input Queues (N^2 bits), Grant Arbiters (1 to N), Accept Arbiters (1 to N), Decision Register, Output Counter, Input Counter, Comparator. Legend: 1st group of inputs, 2nd group of inputs, 2 physical lines from comparator.

Performance Evaluation: Simulation Study

Uniform Traffic

Figure: relative average delay vs. normalized load, 32x32 switch under uniform traffic, 1 iteration (curves: iSlip, FIRM, 2-Stage, SRR, Output).

Load                                0.5    0.6    0.7    0.8    0.9    0.95   0.99

2-stage over iSlip
Improvement Percentage              67%    196%   81%    58%    60%    84%    43%
Normalized Improvement Percentage   40%    66%    45%    37%    37%    46%    30%
Improvement Factor                  1.67   2.96   1.81   1.58   1.60   1.84   1.43

SRR over iSlip
Improvement Percentage              7%     75%    92%    54%    59%    83%    43%
Normalized Improvement Percentage   7%     43%    48%    35%    37%    45%    30%
Improvement Factor                  1.07   1.75   1.92   1.54   1.59   1.83   1.43
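The three rows of each group appear to be linked: in every column the factor equals 1 plus the improvement percentage, and the normalized percentage equals the percentage divided by the factor. The sketch below reproduces that pattern under the assumption (not stated on the slides) that the factor is the ratio of the iSlip delay to the delay of the compared scheme. The function name `improvement_metrics` is hypothetical.

```python
def improvement_metrics(delay_base, delay_new):
    """Return (percentage, normalized percentage, factor) as fractions.

    Assumed definitions, inferred from the tabulated numbers only:
      factor     = delay_base / delay_new
      percentage = (delay_base - delay_new) / delay_new
      normalized = (delay_base - delay_new) / delay_base
    """
    factor = delay_base / delay_new
    percentage = (delay_base - delay_new) / delay_new
    normalized = (delay_base - delay_new) / delay_base
    return percentage, normalized, factor


# First column of the uniform-traffic table: 67%, 40%, 1.67.
pct, norm, factor = improvement_metrics(1.67, 1.0)
print(round(pct, 2), round(norm, 2), round(factor, 2))
```

Under these assumed definitions the normalized percentage can never exceed 100%, which is consistent with the tables.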

Performance Evaluation: Simulation Study

Bursty Traffic

Figure: relative average delay vs. normalized load, 32x32 switch under uniform bursty traffic, 1 iteration (curves: iSlip, FIRM, 2-Stage, SRR, Output).

Load                                0.63   0.7    0.75   0.8    0.85   0.9

2-stage over iSlip
Improvement Percentage              213%   96%    70%    46%    28%    16%
Normalized Improvement Percentage   68%    49%    41%    31%    22%    14%
Improvement Factor                  3.13   1.96   1.70   1.46   1.28   1.16

SRR over iSlip
Improvement Percentage              89%    56%    46%    33%    22%    14%
Normalized Improvement Percentage   47%    36%    32%    25%    18%    12%
Improvement Factor                  1.89   1.56   1.46   1.33   1.22   1.14

Performance Evaluation: Simulation Study

Hotspot Traffic

Figure: relative average delay (log scale) vs. normalized load, 32x32 switch under hotspot traffic, 1 iteration (curves: iSlip, FIRM, 2-Stage, SRR, Output).

Load                                0.31   0.38   0.43      0.46      0.50

2-stage over iSlip
Improvement Percentage              26%    56%    101626%   160469%   81633%
Normalized Improvement Percentage   21%    36%    100%      100%      100%
Improvement Factor                  1.26   1.56   1017.26   1605.69   817.33

SRR over iSlip
Improvement Percentage              5%     9%     56177%    74631%    19618%
Normalized Improvement Percentage   5%     8%     99%       100%      99%
Improvement Factor                  1.05   1.09   562.77    747.31    197.18

Performance Evaluation: Simulation Study

Unbalanced Traffic

Figure: relative average delay (log scale) vs. normalized load, 32x32 switch under cross-shaped traffic, 1 iteration (curves: iSlip, FIRM, 2-Stage, SRR, Output).

Load                                0.5    0.6    0.7    0.8    0.9    0.95     0.99

2-stage over iSlip
Improvement Percentage              12%    39%    53%    142%   552%   8040%    3351%
Normalized Improvement Percentage   11%    28%    35%    59%    85%    99%      97%
Improvement Factor                  1.12   1.39   1.53   2.42   6.52   81.40    34.51

SRR over iSlip
Improvement Percentage              4%     35%    74%    225%   843%   11494%   3499%
Normalized Improvement Percentage   4%     26%    43%    69%    89%    99%      97%
Improvement Factor                  1.04   1.35   1.74   3.25   9.43   115.94   35.99

A new algorithm – RDESRR

• Real Desynchronized Round Robin Model (RDESRR)
• Based on the 2-phase RRM model (Request and Grant)
• Adds a small shared memory that every output can read and write (called Share Bits)
• The size of the memory is 1 bit per input
• If the bit is set, the corresponding input has already been granted by an output
• If the bit is not set, the output may grant to the corresponding input port

RDESRR Conceptual model

Figure: 4x4 example with inputs 0-3, outputs 0-3, one round-robin pointer per output, and the Share Bits (one bit per input).

RDESRR model

• 2 phases only

• Request. Each input sends a request to every output for which it has a queued cell.

• Grant. If an output receives any requests, it chooses the one that appears next in a fixed, round-robin schedule starting from the highest priority element. The output checks whether the corresponding share bit is set. If it is not set, the output sets the bit and notifies the input that its request was granted. Otherwise, the output looks for the next request until all requests have been examined. The pointer g_i to the highest priority element of the round-robin schedule is incremented (modulo N) to one location beyond the granted input. If no request is received, the pointer stays unchanged.
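The grant phase with share bits can be sketched as follows. This is a minimal illustration, assuming that outputs reach the shared memory in index order (the slides only say access is first come, first served) and that a pointer also stays put when every requesting input is already taken. The name `rdesrr_grant` is hypothetical.

```python
def rdesrr_grant(requests, grant_ptr):
    """One RDESRR time slot: returns {input: output}, updates grant_ptr in place.

    requests[i][j] is True when input i has a queued cell for output j.
    """
    n = len(requests)
    share_bits = [False] * n      # one bit per input, reset every time slot
    match = {}
    for j in range(n):            # assumed access order: output 0 first
        for k in range(n):
            i = (grant_ptr[j] + k) % n
            if requests[i][j] and not share_bits[i]:
                share_bits[i] = True          # claim the input
                match[i] = j
                grant_ptr[j] = (i + 1) % n    # one beyond the granted input
                break
    return match


# Even with fully synchronized pointers, the share bits force distinct grants.
ptrs = [0, 0, 0, 0]
print(rdesrr_grant([[True] * 4 for _ in range(4)], ptrs), ptrs)
```

Because an output skips any input whose bit is already set, no input receives two grants, which is why no accept phase is needed in this model.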

RDESRR Demo

Step 1: Request

Figure: 4x4 example (inputs 0-3, outputs 0-3) showing the requests.

Step 2: Grant

Figure sequence on the same 4x4 example, showing the output pointers and the Share Bits; one caption per slide:

• Add a shared memory at the outputs: a small shared memory that every output can read and write (called Share Bits).
• Output checks the share bits: the output checks whether the corresponding bit is set.
• When a share bit is occupied: if the bit is not set, the output sets it and notifies the input that its request was granted. The share bits are allocated first come, first served.
• Output looks for the next request: if the bit is set, the output looks for the next request until all requests have been examined.
• All share bits are allocated: once the share bits are fully allocated, every requesting input has been granted.
• Pointer update/Share bit reset: the pointer g_i to the highest priority element of the round-robin schedule is incremented (modulo N) to one location beyond the granted input. If no request is received, the pointer stays unchanged. The share bits are also reset.

SIM Results

• Run the test for a 32x32-port switch in SIM using –l 1000000

RDESRR

Load   Total Latency   Avg Match Size
0.1    0.0588          3.1958
0.2    0.1447          6.3938
0.3    0.2686          9.5947
0.4    0.4501          12.7940
0.5    0.7198          15.9960
0.6    1.1398          19.1980
0.7    1.8636          22.3961
0.8    3.2619          25.5986
0.9    7.5087          28.8003
1.0    715.5900        31.9850

Input Queueing: Longest Queue First or Oldest Cell First

Figure: 4x4 bipartite request graph (inputs 1-4, outputs 1-4) with edge weights such as 10 and 1, and its maximum weight matching.

Maximum weight matching with Weight = {Waiting Time or Queue Length} gives 100% throughput.

Input Queueing: Why is serving long/old queues better than serving the maximum number of queues?

• When traffic is uniformly distributed, servicing the maximum number of queues leads to 100% throughput.

• When traffic is non-uniform, some queues become longer than others.

• A good algorithm keeps the queue lengths matched, and services a large number of queues.

Figure: average occupancy vs. VOQ #, in two panels: uniform traffic and non-uniform traffic.

Maximum/Maximal Weight Matching

• 100% throughput for admissible traffic (uniform or non-uniform)

• Maximum Weight Matching
– OCF (Oldest Cell First): w = cell waiting time
– LQF (Longest Queue First): w = input queue occupancy
– LPF (Longest Port First): w = QL of the source port + sum of QL from the source port to the destination port

• Maximal Weight Matching (practical algorithms)
– iOCF
– iLQF
– iLPF (comparators in the critical path of iLQF are removed)


Maximal Weight Matching Algorithms: iLQF

• Request. Each unmatched input sends a request word of width bits to each output for which it has a queued cell, indicating the number of cells that it has queued to that output.

• Grant. If an unmatched output receives any requests, it chooses the largest valued request. Ties are broken randomly.

• Accept. If an unmatched input receives one or more grants, it accepts the one to which it made the largest valued request. Ties are broken randomly.
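The Request, Grant and Accept steps of iLQF can be sketched as below. This is an illustrative sketch only: the function name `ilqf`, the `queue_len` matrix and the `iterations` parameter are hypothetical, and ties are broken with Python's `random.choice` as a stand-in for "broken randomly".

```python
import random


def ilqf(queue_len, iterations=1, rng=random):
    """Iterative longest-queue-first matching; returns {input: output}.

    queue_len[i][j] is the number of cells queued at input i for output j.
    """
    n = len(queue_len)
    match = {}            # input -> output
    matched_out = set()
    for _ in range(iterations):
        grants = {}       # input -> outputs that granted it
        # Request + Grant: each unmatched output grants its largest request.
        for j in range(n):
            if j in matched_out:
                continue
            reqs = [i for i in range(n)
                    if i not in match and queue_len[i][j] > 0]
            if reqs:
                best = max(queue_len[i][j] for i in reqs)
                i = rng.choice([i for i in reqs if queue_len[i][j] == best])
                grants.setdefault(i, []).append(j)
        # Accept: each input accepts the grant for its longest queue.
        for i, outs in grants.items():
            best = max(queue_len[i][j] for j in outs)
            j = rng.choice([j for j in outs if queue_len[i][j] == best])
            match[i] = j
            matched_out.add(j)
    return match


print(ilqf([[5, 1, 0], [4, 0, 2], [0, 3, 1]]))
```

In the example the longest queue (5 cells, input 0 to output 0) is matched in the first iteration, since the output holding the globally largest request grants it and the input receiving that grant accepts it.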

Maximal Weight Matching Algorithms: iLQF

• The i-LQF algorithm has the following properties:

• Property 1. Independent of the number of iterations, the longest input queue is always served.

• Property 2. As with i-SLIP, the algorithm converges in at most log N iterations.

• Property 3. For an inadmissible offered load, an input queue may be starved.

Maximal Weight Matching Algorithms: iOCF

• The i-OCF algorithm works in a similar fashion to iLQF, and has the following properties:

• Property 1. Independent of the number of iterations, the cell that has been waiting the longest time in the input queues is always served (it must be at the head of its queue).

• Property 2. As with i-LQF, the algorithm converges in at most log N iterations.

• Property 3. No input queue can be starved indefinitely.

• Property 4. It is difficult to keep time stamps on the cells.

iLQF - Implementation

iLPF - Implementation

Complicated hardware

Other research efforts

• Packet-based arbitration
• Exhaustive-based arbitration
• Numerous other efforts

Packet Scheduling/Arbitration in Virtual Output Queues: Randomized Algorithms and Others

Input-Queued Packet Switch

Figure: N x N crossbar with a scheduler, inputs 1..N and outputs 1..N, arrival rates $\lambda_{1,1}, \dots, \lambda_{i,j}, \dots, \lambda_{N,N}$ and VOQ occupancy $X_{i,j}$.

Admissible traffic: $\sum_j \lambda_{i,j} < 1$ for all $i$; $\sum_i \lambda_{i,j} < 1$ for all $j$.

Bipartite Graph and Matrix

Figure: a bipartite request graph between inputs 1-3 and outputs 1-3, together with its matrix form:

0 1 1
1 1 1
0 0 1

Stability of Scheduling

Definition:

Let Xi,j(t) be the number of packets queued at input i for output j at time-slot t.

Then an algorithm is stable iff:

$E\left[\sum_{i,j} X_{i,j}(t)\right] < \infty$

Maximum Matching in VOQ Architecture

Figure: a 4x4 bipartite request graph with weighted edges, shown next to its maximum size matching and its maximum weight matching.

Complexity of Maximum Matchings

• Maximum Size/Cardinality Matchings:
– It is not a stable algorithm
– Algorithm by Dinic: $O(N^{5/2})$

• Maximum Weight Matchings:
– Algorithm by Kuhn: $O(N^3 \log N)$
– It is a stable algorithm

• In general:
– Hard to implement in hardware (it does not lend itself to a simple hardware implementation, and this is not because of its serial time complexity)
– Slooooow.

Maximal Matching Algorithms

• Maximal matching algorithms are heuristic algorithms that try to approximate MSM or MWM.

• In general, maximal matching is much simpler to implement (for reasons other than its time complexity), and has a much faster running time.

• A maximal size matching is at least half the size of a maximum size matching.

• A maximal weight matching is at least half the weight of a maximum weight matching.
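The half-size bound for maximal size matchings can be checked on a small request graph. The sketch below is an illustration with hypothetical helper names (`greedy_maximal`, `maximum_size`); the brute-force search over permutations is only practical for very small N.

```python
from itertools import permutations


def greedy_maximal(edges):
    """Add edges in the given order whenever both endpoints are still free."""
    used_in, used_out, match = set(), set(), []
    for i, j in edges:
        if i not in used_in and j not in used_out:
            match.append((i, j))
            used_in.add(i)
            used_out.add(j)
    return match


def maximum_size(edges, n):
    """Brute-force maximum matching size on an n x n request graph."""
    edge_set = set(edges)
    return max(sum((i, p[i]) in edge_set for i in range(n))
               for p in permutations(range(n)))


# Worst case for the greedy order: one bad first choice blocks two edges.
edges = [(0, 1), (0, 0), (1, 1)]
maximal = greedy_maximal(edges)
print(len(maximal), maximum_size(edges, 2))
```

Here the greedy matching contains one edge while the maximum matching contains two, so the factor of one half is tight for this example.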

Maximal Size Matching Algorithm: Performance and Properties

• Can have 100% throughput under uniform traffic

• They converge in log N iterations to a maximal size matching

• Their performance can be quite good (close to an ideal Output Queued Switch) with multiple iterations

• The best iterative maximal size matching algorithm takes $O(N^2 \log N)$ serial or $O(\log N)$ parallel time steps.

• If the number of iterations is constant, then it can be implemented in constant time (that is why it is practical).

Implementation of the parallel maximal matching algorithms

Figure: block diagram with the following components: State of Input Queues (N^2 bits), Grant Arbiters (1 to N), Request Arbiters (1 to N), Decision Register.

Small Differences (in implementation) between RRM, iSlip & FIRM

But large difference in performance

Pointer   Event                RRM                       iSlip       FIRM
Input     No grant             unchanged (all three algorithms)
Input     Granted              one location beyond the accepted one (all three algorithms)
Output    No request           unchanged (all three algorithms)
Output    Grant accepted       one location beyond the granted one (all three algorithms)
Output    Grant not accepted   one location beyond the   unchanged   the granted one
                               previously granted one
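The output-pointer rules in this comparison can be written as one small function, which makes the single differing case easy to see. The function name `update_output_pointer` and its argument layout are hypothetical; the rules themselves are the ones listed above.

```python
def update_output_pointer(algorithm, pointer, granted, accepted, n):
    """Return the new output pointer after one time slot.

    granted is the input that received this output's grant (None when the
    output received no request); accepted tells whether that grant was accepted.
    """
    if granted is None:                # no request: unchanged in all three
        return pointer
    if accepted:                       # accepted: same rule in all three
        return (granted + 1) % n
    if algorithm == "RRM":             # moves on even though nothing was sent
        return (granted + 1) % n
    if algorithm == "iSlip":           # stays where it was
        return pointer
    if algorithm == "FIRM":            # parks on the input that refused
        return granted
    raise ValueError(f"unknown algorithm: {algorithm}")


for name in ("RRM", "iSlip", "FIRM"):
    print(name, update_output_pointer(name, 0, 2, False, 4))
```

For a 4-port switch with the pointer at 0 and an unaccepted grant to input 2, the three algorithms leave the pointer at 3, 0 and 2 respectively.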

Maximum/Maximal Weight Matching

• 100% throughput for admissible traffic (uniform or non-uniform)

• Maximum Weight Matching
– OCF (Oldest Cell First): w = cell waiting time
– LQF (Longest Queue First): w = input queue occupancy
– LPF (Longest Port First): w = QL of the source port + sum of QL from the source port to the destination port

• Maximal Weight Matching (practical iterative algorithms)

• Make these maximal weight matching algorithms operate like iSLIP
– iOCF
– iLQF
– iLPF


Maximal Weight Matching Algorithms: iLQF

• Request. Each unmatched input sends a request word of width bits to each output for which it has a queued cell, indicating the number of cells that it has queued to that output.

• Grant. If an unmatched output receives any requests, it chooses the largest valued request (has the longest queue). Ties are broken randomly.

• Accept. If an unmatched input receives one or more grants, it accepts the one to which it made the largest valued request (has the longest queue). Ties are broken randomly.

Maximal Weight Matching Algorithms: iLQF

• The i-LQF algorithm has the following properties:

• Property 1. Independent of the number of iterations, the longest input queue is always served.

• Property 2. As with i-SLIP, the algorithm converges in at most log N iterations.

• Property 3. For an inadmissible offered load, an input queue may be starved.

• Property 4. It is a stable algorithm.

Maximal Weight Matching Algorithms: iOCF

• The i-OCF algorithm works in a similar fashion to iLQF, and has the following properties:

• Property 1. Independent of the number of iterations, the cell that has been waiting the longest time in the input queues is always served (it must be at the head of its queue).

• Property 2. As with i-LQF, the algorithm converges in at most log N iterations.

• Property 3. No input queue can be starved indefinitely.

• Property 4. It is difficult to keep time stamps on the cells.

Can we do better than maximal matchings using Randomized Algorithms?

Motivation

• Networking problems suffer from the “curse of dimensionality”
– algorithmic solutions do not scale well

• Typical causes
– size: large number of users or large number of I/O
– time: very high speeds of operation

• A good deterministic algorithm exists (Max Flow), but …
– it requires too large a data structure
– it needs state information, and “state” is too big
– it “starts from scratch” in each iteration

Randomization

• Randomized algorithms have frequently been used in situations where the state space is very large (e.g., the N! different ways of connecting inputs to outputs)

• Randomized algorithms
– are a powerful way of approximating
– it is often possible to randomize deterministic algorithms
– this simplifies the implementation while retaining a (surprisingly) high level of performance

• The main idea is
– to simplify the decision-making process
– by basing decisions upon a small, randomly chosen sample of the state
– rather than upon the complete state

An Illustrative Example

Find the largest element of a set S of size 1 billion

• Deterministic algorithm: linear search
– has a complexity of 1 billion

• The randomized version: find the largest of 10 randomly chosen samples
– has a complexity of 10
– (note: this ignores the complexity of choosing 10 random samples)

• Performance
– linear search will find the absolute largest element
– if R is the element found by the randomized algorithm, we can make statements like
P(R is larger than at least 100 million elements of S) = $1 - (1/10)^{10}$
– thus, we can say that the performance of the randomized algorithm is very good with a high probability
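The sampling idea and the kind of probability statement it supports can be sketched as follows. The helper names are hypothetical, and the formulas assume samples drawn independently with replacement, which is a close approximation when the set is far larger than the sample.

```python
import random


def sampled_max(values, k, rng):
    """Largest of k elements drawn uniformly at random (with replacement)."""
    return max(rng.choice(values) for _ in range(k))


def prob_in_top_fraction(q, k):
    """P(at least one of k samples lies in the top fraction q of the set)."""
    return 1 - (1 - q) ** k


def prob_above_bottom_fraction(q, k):
    """P(not all k samples lie in the bottom fraction q of the set)."""
    return 1 - q ** k


print(prob_in_top_fraction(0.1, 10))        # about 0.65
print(prob_above_bottom_fraction(0.1, 10))  # 1 - 10**-10
```

The two functions answer different questions: with 10 samples the result is almost surely not among the smallest 10% of the set, while landing in the largest 10% happens in roughly 65% of the runs, and more samples are needed to push that figure higher.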

Randomizing Iterative Schemes (e.g., iSLIP)

• Often, we want to perform some operation iteratively

• Example: find the heaviest matching in a switch in every time slot

• Since, in each time slot
– at most one packet can arrive at each input
– and, at most one packet can depart from each output
the size of the queues, or the “state” of the switch, doesn’t change by much between successive time slots; so, a matching that was heavy at time t will quite likely continue to be heavy at time t+1

• This suggests that
– knowing a heavy matching at time t should help in determining a heavy matching at time t+1
– there is no need to start from scratch in each time slot

Summarizing Randomized Algorithms

• Randomized algorithms can help simplify the implementation
– by reducing the amount of work in each iteration

• If the state of the system doesn’t change by much between iterations, then
– we can reduce the work even further by carrying information between iterations

• The big pay-off is that, even though it is an approximation, the performance of a randomized scheme can be surprisingly good

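Randomized Scheduling Algorithms: Example

• Consider a 3 x 3 input-queued switch
– input traffic is Bernoulli IID with λij = α/3 for all i, j, and α < 1:

$\Lambda = \frac{\alpha}{3}\begin{pmatrix}1&1&1\\1&1&1\\1&1&1\end{pmatrix} = \begin{pmatrix}\alpha/3&\alpha/3&\alpha/3\\ \alpha/3&\alpha/3&\alpha/3\\ \alpha/3&\alpha/3&\alpha/3\end{pmatrix}$

– This is admissible
– note: there are a total of 6 (= 3!) possible service matrices:

$\begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix},\ \begin{pmatrix}0&1&0\\1&0&0\\0&0&1\end{pmatrix},\ \begin{pmatrix}1&0&0\\0&0&1\\0&1&0\end{pmatrix},\ \begin{pmatrix}0&0&1\\1&0&0\\0&1&0\end{pmatrix},\ \begin{pmatrix}0&1&0\\0&0&1\\1&0&0\end{pmatrix},\ \begin{pmatrix}0&0&1\\0&1&0\\1&0&0\end{pmatrix}$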

Random Scheduling Algorithms

• In time slot n, let S(n) be equal to one of the 6 possible matchings, chosen independently and uniformly at random

• Stability of Random
– Consider L11(n), the number of packets in VOQ11
• arrivals to VOQ11 occur according to A11(n), which is Bernoulli IID
• input rate = λ11 = α/3
• this queue gets served whenever the service matrix connects input 1 to output 1
• there are 2 service matrices that connect input 1 to output 1
• since Random chooses service matrices u.a.r., input 1 is connected to output 1 for a fraction of time = 2/6 = 1/3, which is the service rate between input 1 and output 1
• E(L11(n)) < ∞ iff λ11 < 1/3, i.e., α < 1

• Random is stable under this traffic.

Page 69: Packet Scheduling/Arbitration in Virtual Output Queues:  Maximal Matching Algorithms (Part II)

69CIST560 by M. Hamdi

Random Scheduling Algorithms

• Instability of Random
 – Now suppose λii = α for all i and λij = 0 for i ≠ j
 – clearly, this is admissible traffic for all α < 1
 – but, under Random, the service rate at VOQ11 is 1/3 at best
 – hence VOQ11 and the switch will be unstable as soon as α ≥ 1/3

• Stability (or 100% throughput) means it is stable under all admissible traffic!
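A small simulation illustrates the effect. This is only a sketch under simplifying assumptions: it tracks VOQ11 alone in a 3 x 3 switch, models the Random scheduler by drawing the output assigned to input 1 uniformly from the 3 outputs, and uses a fixed seed. The function name and parameters are hypothetical.

```python
import random

def simulate_voq11(arrival_prob, slots, seed=0):
    """Final length of VOQ11 under the Random scheduler in a 3 x 3 switch."""
    rng = random.Random(seed)
    queue = 0
    for _ in range(slots):
        # Bernoulli arrival to VOQ11 with probability lambda_11.
        if rng.random() < arrival_prob:
            queue += 1
        # In a uniformly random matching, the output given to input 1
        # is uniform over the 3 outputs, so VOQ11 is served w.p. 1/3.
        served = rng.randrange(3) == 0
        if served and queue > 0:
            queue -= 1
    return queue

# Diagonal traffic: lambda_11 = alpha.
print(simulate_voq11(0.2, 20000))  # stays small: 0.2 < 1/3
print(simulate_voq11(0.6, 20000))  # grows roughly linearly: 0.6 > 1/3
```

With α = 0.6 the backlog grows by about 0.6 − 1/3 packets per slot on average, even though the traffic is admissible.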

Simulation Scenario

• Switch Size: 32 x 32

• Input Traffic (shown for a 4 x 4 switch)
 – diagonal load matrix:

$$\begin{pmatrix}x&0&0&y\\y&x&0&0\\0&y&x&0\\0&0&y&x\end{pmatrix}$$

  • normalized load = x + y < 1

  • x = 2y

• It is a good test-case

Obvious Randomized Schemes

• Choose a matching at random and use it as the schedule: this doesn’t give 100% throughput (already shown)

• Choose 2 matchings at random and use the heavier one as the schedule

• Choose N matchings at random and use the heaviest one as the schedule

None of these can give 100% throughput !!
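The three schemes above differ only in how many random matchings are sampled per time slot. A minimal sketch, assuming the weight of a matching is the sum of the VOQ lengths it serves and that a matching is stored as a list mapping each input to its output (all names here are illustrative):

```python
import random

def weight(matching, voq_len):
    """Total queue length served: sum of voq_len[i][matching[i]]."""
    return sum(voq_len[i][j] for i, j in enumerate(matching))

def random_matching(n, rng):
    """A matching chosen uniformly at random among the n! permutations."""
    perm = list(range(n))
    rng.shuffle(perm)
    return perm

def heaviest_of_d(voq_len, d, rng):
    """Sample d random matchings and return the heaviest one."""
    n = len(voq_len)
    candidates = [random_matching(n, rng) for _ in range(d)]
    return max(candidates, key=lambda m: weight(m, voq_len))

voq_len = [[4, 0, 1],
           [0, 7, 2],
           [3, 0, 5]]
rng = random.Random(1)
schedule = heaviest_of_d(voq_len, d=3, rng=rng)
print(schedule, weight(schedule, voq_len))
```

Setting `d` to 1, 2 or N gives the three variants. Nothing is remembered from one slot to the next, which is the property the following slides change.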

[Figure: Mean IQ Len (log scale, 0.001 to 10000) vs. Normalized Load (0.0 to 1.0), Diagonal Traffic. Curves: MWM, R32, R1.]

Bounds on Maximum Throughput

Iterative Randomized Scheme (Tassiulas)

• Say M is the matching used at time t

• Let R be a new matching chosen uniformly at random (u.a.r.) among the N! different matchings

• At time t+1, use the heavier of M and R

• Complexity is very low: O(1) iterations

• This gives 100% throughput!
 – note: the boost in throughput is due to memory (saving previous matchings)

• But, delays are very large
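One step of this scheme can be sketched as follows, assuming the weight of a matching is the sum of the VOQ lengths it serves, evaluated on the current queue lengths. The function names are illustrative and not taken from the slides.

```python
import random

def weight(matching, voq_len):
    """Total queue length served: sum of voq_len[i][matching[i]]."""
    return sum(voq_len[i][j] for i, j in enumerate(matching))

def tassiulas_step(prev, voq_len, rng):
    """Return the matching for time t+1 given the matching used at time t."""
    n = len(voq_len)
    candidate = list(range(n))
    rng.shuffle(candidate)          # R: uniform over the n! matchings
    # Keep the heavier of the remembered matching M and the new sample R.
    if weight(candidate, voq_len) > weight(prev, voq_len):
        return candidate
    return prev

voq_len = [[4, 0, 1],
           [0, 7, 2],
           [3, 0, 5]]
rng = random.Random(0)
schedule = [1, 2, 0]                # arbitrary starting matching
for _ in range(10):
    schedule = tassiulas_step(schedule, voq_len, rng)
print(schedule, weight(schedule, voq_len))
```

In a real switch the queue lengths change every slot, so the remembered matching is re-weighed against the current state each time; the loop above keeps `voq_len` fixed only to keep the example short.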

[Figure: Mean IQ Len (log scale, 0.01 to 10000) vs. Normalized Load (0.1 to 1.0), Diagonal Traffic. Curves: MWM, Tassiulas.]

Observations for Improvement

• Most of the weight of a matching is carried in a small number of edges

• Hence, remember edges, not matchings

• We can have 100% throughput under all admissible traffic.

[Figure: Mean IQ Len (log scale, 0.01 to 10000) vs. Normalized Load (0.1 to 1.0), Diagonal Traffic. Legend: MWM R32M32 R1M1 Tassiulas.]

Finer Observations

• Let M be the schedule used at time t

• Choose a "good" random matching R

• M’ = Merge(M,R)

• M’ includes the best edges from M and R

• Use M’ as the schedule at time t+1

• The above procedure yields an algorithm called LAURA

• There are many other small variations to this algorithm.

Merging Procedure

[Figure: merging example. Matching X with W(X)=12 and random matching R with W(R)=10 are merged into M with W(M)=13. Weight differences shown: 3-1+2-2=2 and 2-1+2-4=-1.]
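One way to realise such a merge, offered as a sketch rather than as the exact LAURA procedure: when both matchings are full permutations, their union splits into disjoint alternating cycles, and within each cycle the merge keeps the edges of whichever matching is heavier there. The function and variable names are assumptions for illustration.

```python
def merge(prev, rand, w):
    """Cycle-by-cycle merge of two full matchings.

    prev[i] and rand[i] are the outputs given to input i; w[i][j] is the
    weight (for example the VOQ length) of edge (i, j).
    """
    n = len(prev)
    rand_inv = [0] * n                  # rand_inv[j] = input matched to j by rand
    for i, j in enumerate(rand):
        rand_inv[j] = i
    merged = [None] * n
    seen = [False] * n
    for start in range(n):
        if seen[start]:
            continue
        # Follow prev-edge then rand-edge until the cycle closes.
        cycle = []
        i = start
        while not seen[i]:
            seen[i] = True
            cycle.append(i)
            i = rand_inv[prev[i]]
        w_prev = sum(w[a][prev[a]] for a in cycle)
        w_rand = sum(w[a][rand[a]] for a in cycle)
        source = prev if w_prev >= w_rand else rand
        for a in cycle:
            merged[a] = source[a]
    return merged

w = [[5, 1, 0, 0],
     [1, 5, 0, 0],
     [0, 0, 1, 4],
     [0, 0, 4, 1]]
prev = [0, 1, 2, 3]    # weight 12
rand = [1, 0, 3, 2]    # weight 10
print(merge(prev, rand, w))  # [0, 1, 3, 2], weight 18
```

Both matchings cover the same outputs inside a cycle, so picking either side per cycle always yields a valid matching, and its weight is at least that of the heavier input.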

[Figure: Mean IQ Len (log scale, 0.01 to 10000) vs. Normalized Load (0.1 to 1.0), Diagonal Traffic. Curves: MWM, M-LAURA, LAURA, iLQF, Tassiulas.]

Can we avoid having schedulers altogether !!!

Recap: Two Successive Scaling Problems

OQ routers:
+ work-conserving (QoS)
- memory bandwidth = (N+1)R

IQ routers:
+ memory bandwidth = 2R
- arbitration complexity

[Figure: OQ and IQ router diagrams with links labelled R; the IQ diagram is labelled "Bipartite Matching".]

IQ Arbitration Complexity

Today: 64 ports at 10 Gbps, 64-byte cells.

• Arbitration Time = 64 bytes / 10 Gbps = 51.2 ns

• Request/Grant Communication BW = 17.5 Gbps

Scaling to 160 Gbps:

• Arbitration Time = 3.2 ns

• Request/Grant Communication BW = 280 Gbps

Two main alternatives for scaling:

1. Increase cell size

2. Eliminate arbitration
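The arbitration times follow directly from the cell size and line rate, since the arbiter must produce one matching per cell time. A short check of the arithmetic (the function name is illustrative; the 17.5 Gbps figure is taken from the slide as given and only scaled by the line-rate ratio):

```python
def cell_time_ns(cell_bytes, line_rate_gbps):
    """Time to transmit one cell: bits / (Gbit/s) gives nanoseconds."""
    return cell_bytes * 8 / line_rate_gbps

print(cell_time_ns(64, 10))    # 51.2 ns at 10 Gbps
print(cell_time_ns(64, 160))   # 3.2 ns at 160 Gbps

# The request/grant bandwidth scales with the line rate: 160 / 10 = 16x.
print(17.5 * (160 / 10))       # 280.0 Gbps
```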

Desirable Characteristics for Router Architecture

Ideal: OQ

• 100% throughput

• Minimum delay

• Maintains packet order

Necessary: able to regularly connect any input to any output

What if the world was perfect? Assume Bernoulli iid uniform arrival traffic...

Round-Robin Scheduling

• Uniform & non-bursty traffic => 100% throughput

• Problem: traffic is non-uniform & bursty
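A round-robin schedule needs no arbitration at all: the matching in each slot is a fixed function of time. The sketch below assumes the common cyclic-shift form, in which input i is connected to output (i + t) mod N in slot t, and checks that every input-output pair is served exactly once every N slots. The function name is illustrative.

```python
def round_robin_matching(n, t):
    """Matching used in slot t: input i is connected to output (i + t) mod n."""
    return [(i + t) % n for i in range(n)]

n = 4
served = [[0] * n for _ in range(n)]
for t in range(n):
    for i, j in enumerate(round_robin_matching(n, t)):
        served[i][j] += 1

# Every VOQ (i, j) is served exactly once per n slots, i.e. at rate 1/n.
print(all(count == 1 for row in served for count in row))  # True
```

Each VOQ therefore gets service rate 1/N, which is enough when the load is spread evenly with every λij below 1/N and insufficient as soon as one VOQ carries more than that.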

Two-Stage Switch (I)

[Figure: two-stage switch. External Inputs (1..N), First Round-Robin, Internal Inputs (1..N), Second Round-Robin, External Outputs (1..N).]

Two-Stage Switch (I)

[Figure: the same two-stage switch, now annotated "Load Balancing".]

Two-Stage Switch Characteristics

• 100% throughput

• Problem: unbounded mis-sequencing

[Figure: External Inputs (1..N), Cyclic Shift, Internal Inputs (1..N), Cyclic Shift, External Outputs (1..N); packets labelled 1 and 2.]
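The mis-sequencing problem can be reproduced with a tiny deterministic model. This is a sketch under several assumptions not stated on the slide: both stages use the cyclic shift (port + t) mod N, a packet crosses the first stage in the slot it arrives, and each internal input keeps one FIFO per output. Packet names and the preloaded backlog are made up for the example.

```python
from collections import deque

def simulate(n, arrivals, preload, slots):
    """Return, per external output, the list of packets in departure order."""
    # voq[k][j]: FIFO at internal input k holding packets for output j.
    voq = [[deque() for _ in range(n)] for _ in range(n)]
    for k, j, name in preload:
        voq[k][j].append(name)
    departures = [[] for _ in range(n)]
    for t in range(slots):
        # Stage 1: external input i is connected to internal input (i + t) mod n.
        for i, j, name in arrivals.get(t, []):
            voq[(i + t) % n][j].append(name)
        # Stage 2: internal input k is connected to output (k + t) mod n.
        for k in range(n):
            j = (k + t) % n
            if voq[k][j]:
                departures[j].append(voq[k][j].popleft())
    return departures

# Flow from external input 0 to output 0: A1 arrives at t=0, A2 at t=1.
arrivals = {0: [(0, 0, "A1")], 1: [(0, 0, "A2")]}
# Internal input 0 already holds two packets for output 0.
preload = [(0, 0, "x1"), (0, 0, "x2")]
print(simulate(3, arrivals, preload, slots=7)[0])  # ['x1', 'A2', 'x2', 'A1']
```

A1 lands behind the backlog at internal input 0, while A2 is spread to the empty internal input 1 and overtakes it. A longer backlog delays A1 further, which is why the mis-sequencing is unbounded.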

Two-Stage Switch (II)

[Figure: two-stage switch with labels Flow Splitter, Load Balancer, VOQs, First-Stage Round-Robin, VOQs, Second-Stage Round-Robin; ports labelled External inputs, Internal outputs, Internal inputs, External outputs (each 1..N); indices i, j, k; flow F_ik. New: N^3 instead of N^2.]

Expanding VOQ Structure

Solution: expand VOQ structure by distinguishing among switch inputs

[Figure: expanded VOQ structure; labels 1, 2, 3 and a, b.]

What is being done in practice (Cisco, for example)

• They want schedulers that achieve 100% throughput and very low delay (like MWM)

• They want it to be as simple as iSLIP in terms of hardware implementation

• Is there any solution to this?

Typical Performance of iSLIP-like Algorithms

PIM with 4 iterations

What is being done in practice (Cisco, for example)

Company | Switching Capacity | Switch Architecture | Fabric Overspeed
Agere | 40 Gbit/s-2.5 Tbit/s | Arbitrated crossbar | 2x
AMCC | 20-160 Gbit/s | Shared memory | 1.0x
AMCC | 40 Gbit/s-1.2 Tbit/s | Arbitrated crossbar | 1-2x
Broadcom | 40-640 Gbit/s | Buffered crossbar | 1-4x
Cisco | 40-320 Gbit/s | Arbitrated crossbar | 2x