An Asynchronous Routing Algorithm for Clos Networks



Page 1: An Asynchronous Routing Algorithm for Clos Networks

June 23rd 2010

Advanced Processor Technologies Group

The School of Computer Science

An Asynchronous Routing Algorithm for Clos Networks

Wei Song and Doug Edwards

The Advanced Processor Technologies Group (APT)

School of Computer Science

The University of Manchester

{songw, doug}@cs.man.ac.uk


Page 2: An Asynchronous Routing Algorithm for Clos Networks

Outline

• Introduction to Clos networks and asynchronous circuits

• Classic synchronous dispatching algorithms

– Random Dispatching (RD)

– Concurrent Round-Robin Dispatching (CRRD)

• Asynchronous Dispatching (AD) algorithm

• Implementation

• Outcome

– 32-port Clos network. Setup 6.2 ns, release 3.9 ns.


Page 3: An Asynchronous Routing Algorithm for Clos Networks

Switching Networks

• Telephone networks

• Asynchronous transfer mode (ATM) and IP networks

– ATLANTA chip (622 Mb/s/port, 1997)

– Petastar Optical Switch (160 Gb/s/port, 2003)

• Intra/inter chip interconnect

– High-radix router with many narrow ports

– SDM: delay guaranteed services


Page 4: An Asynchronous Routing Algorithm for Clos Networks

Asynchronous Circuit

• Asynchronous vs. synchronous

– Low power (clockless)

– Possible performance boost (average latency)

– Tolerance to process variation

• Implementation style

– 2-phase vs. 4-phase

– Bundled-data vs. quasi-delay insensitive (QDI)

– 4-phase QDI


Page 5: An Asynchronous Routing Algorithm for Clos Networks

Research Target

• An area-efficient and high-speed switching network for on-chip networks

– High-radix router

– Low power consumption

– Tolerance to process variation

– Area efficient


Page 6: An Asynchronous Routing Algorithm for Clos Networks

Clos Switching Network

3 stages: IMs, CMs and OMs

C(n, k, m): k IM/OMs, n ports per IM/OM, m CMs

LI: links between IM and CM

LO: links between CM and OM

SMS: buffer in CM

MSM: buffer in IM/OM

S3: no buffer

[Figure: three-stage Clos network. Input modules IM(0) … IM(i) … IM(k-1) are n×m switches, central modules CM(0) … CM(r) … CM(m-1) are k×k switches, and output modules OM(0) … OM(j) … OM(k-1) are m×n switches.]
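The C(n, k, m) notation above can be captured in a few lines of Python. This is only a sketch: the class name `Clos` and the helpers `ports`, `module_sizes` and `crosspoints` are hypothetical names introduced here, and the crosspoint count is a rough size measure that is not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clos:
    """Three-stage Clos network C(n, k, m), using the slide's parameter names."""
    n: int  # ports per IM/OM
    k: int  # number of IMs (and of OMs)
    m: int  # number of CMs

    @property
    def ports(self) -> int:
        # Total number of network inputs (equal to the number of outputs).
        return self.n * self.k

    def module_sizes(self) -> dict:
        # (inputs, outputs) of one module in each stage.
        return {"IM": (self.n, self.m), "CM": (self.k, self.k), "OM": (self.m, self.n)}

    def crosspoints(self) -> int:
        # Sum of inputs * outputs over all modules of all three stages.
        im = self.k * self.n * self.m
        cm = self.m * self.k * self.k
        om = self.k * self.m * self.n
        return im + cm + om


net = Clos(n=4, k=8, m=4)
print(net.ports)           # 32
print(net.module_sizes())  # {'IM': (4, 4), 'CM': (8, 8), 'OM': (4, 4)}
print(net.crosspoints())   # 512
```

With these parameters the sketch gives the 32-port network quoted in the outline; a single 32×32 crossbar would need 1024 crosspoints under the same measure.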

Page 7: An Asynchronous Routing Algorithm for Clos Networks

Non-Blocking

• Blocking

– An available input/output pair may not be connectable due to internal blocking.

• Strict Non-Blocking (SNB)

– All available input/output pairs are connectable: m ≥ 2n − 1.

• Rearrangeable Non-Blocking (RNB)

– Connecting available input/output pairs may require internal permutation: 2n − 1 > m ≥ n.
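The two classical conditions for a three-stage Clos network (m ≥ 2n − 1 for strict non-blocking, m ≥ n for rearrangeable non-blocking) can be checked with a short helper. The function name `classify` is hypothetical. The conditions say whether a route exists; they do not promise that a heuristic dispatcher finds it.

```python
def classify(n: int, m: int) -> str:
    """Classify a Clos network by ports per IM/OM (n) and number of CMs (m)."""
    if m >= 2 * n - 1:
        return "SNB"       # strictly non-blocking
    if m >= n:
        return "RNB"       # rearrangeable non-blocking
    return "blocking"


for m in (3, 4, 7):
    print(m, classify(4, m))
# 3 blocking
# 4 RNB
# 7 SNB
```

For n = 4, the C(4,8,4) network used later in the talk falls in the rearrangeable class, and m = 7 is the smallest strictly non-blocking choice.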

Page 8: An Asynchronous Routing Algorithm for Clos Networks

Area Consumption


Page 9: An Asynchronous Routing Algorithm for Clos Networks

Routing Algorithms

• Optimal algorithms

– Guarantee a result for every match, at the cost of higher time and implementation complexity.

• Heuristic algorithms

– Provide all or only part of the connections, with much lower time complexity.

• Most dynamically reconfigurable Clos networks use heuristic algorithms


Page 10: An Asynchronous Routing Algorithm for Clos Networks

Dispatching Algorithms

Every input/output pair has m paths. Packets are dispatched evenly to all CMs.

Dispatching algorithm: the algorithm used for IM-to-CM packet dispatching.

[Figure: example network C(2,3,2) with IM(0), IM(1), CM(0), CM(1), CM(2), OM(0) and OM(1).]
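To make the "m paths" statement concrete, the sketch below lists the paths between one IM and one OM, one per CM. The function name `paths` is hypothetical, and m = 3 is chosen to match the three CMs drawn in the figure.

```python
def paths(i: int, j: int, m: int) -> list[tuple[str, str, str]]:
    """All paths from IM(i) to OM(j): exactly one through each of the m CMs."""
    return [(f"IM({i})", f"CM({r})", f"OM({j})") for r in range(m)]


for path in paths(i=0, j=1, m=3):
    print(" -> ".join(path))
# IM(0) -> CM(0) -> OM(1)
# IM(0) -> CM(1) -> OM(1)
# IM(0) -> CM(2) -> OM(1)
```

A dispatching algorithm decides which of these paths each packet takes.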

Page 11: An Asynchronous Routing Algorithm for Clos Networks

Random Dispatching (RD)

• Phase 1

– IPs send requests to LIs

– LIs select IPs

• Phase 2

– CMs select LIs

– Configure granted IPs

[Figure: example network with IM(0), IM(1), CM(0), CM(1), CM(2), OM(0) and OM(1); ports labelled 0 to 3.]


Page 13: An Asynchronous Routing Algorithm for Clos Networks

Random Dispatching (RD)


RD cannot resolve contention in IMs.
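The two RD phases can be sketched as a one-cell-time simulation. This is an illustrative model under stated assumptions, not the reference algorithm: each destination is given as an OM index, every random choice is uniform, and the function name `random_dispatch` and its arguments are hypothetical.

```python
import random


def random_dispatch(requests, n, m, rng):
    """One cell time of random dispatching.

    requests: dict input_port -> destination OM index (assumed traffic format)
    n: ports per IM, m: number of CMs, rng: random.Random instance
    Returns dict input_port -> CM index for the requests that were granted.
    """
    # Phase 1a: every requesting IP picks one LI (one CM) of its IM at random.
    li_requests = {}
    for ip, om in requests.items():
        im = ip // n
        cm = rng.randrange(m)
        li_requests.setdefault((im, cm), []).append(ip)
    # Phase 1b: every LI selects one of its requesting IPs; the others lose.
    li_winner = {li: rng.choice(ips) for li, ips in li_requests.items()}

    # Phase 2: in every CM, each link towards an OM selects one requesting LI.
    lo_requests = {}
    for (im, cm), ip in li_winner.items():
        lo_requests.setdefault((cm, requests[ip]), []).append(ip)
    granted = {}
    for (cm, om), ips in lo_requests.items():
        granted[rng.choice(ips)] = cm
    return granted


rng = random.Random(2010)
requests = {0: 0, 1: 1, 2: 0, 3: 1}   # four IPs, two IMs with n = 2
print(random_dispatch(requests, n=2, m=3, rng=rng))
```

Two IPs of the same IM that pick the same LI collide in phase 1b, and the loser is not retried within the cell time. That is the IM contention noted on the slide.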

Page 14: An Asynchronous Routing Algorithm for Clos Networks

Concurrent Round-Robin Dispatching (CRRD)

• Phase 1

– IPs request

– LIs select

– IPs select

– Go back (iteration)

• Phase 2

– Same as RD

[Figure: example network with IM(0), IM(1), CM(0), CM(1), CM(2), OM(0) and OM(1); ports labelled 0 to 3.]


Page 19: An Asynchronous Routing Algorithm for Clos Networks

Concurrent Round-Robin Dispatching (CRRD)


CRRD cannot resolve contention in CMs.
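The iterative phase 1 of CRRD can be sketched for a single IM as below. This is a simplified illustration, not the published algorithm: the round-robin pointers are passed in and never updated, every unmatched IP is assumed to request every free LI, and the name `crrd_phase1` is hypothetical.

```python
def crrd_phase1(active, m, li_ptr, ip_ptr, iterations):
    """Match the requesting IPs of one IM to its m LIs.

    active: local IP indices (0..n-1) that hold a packet
    li_ptr[r]: round-robin pointer of LI r over the IPs
    ip_ptr[p]: round-robin pointer of IP p over the LIs (n = len(ip_ptr))
    Returns dict ip -> li.
    """
    n = len(ip_ptr)
    match = {}
    free_li = set(range(m))
    for _ in range(iterations):
        unmatched = [p for p in active if p not in match]
        if not unmatched or not free_li:
            break
        # LIs select: each free LI grants the first unmatched IP at or after its pointer.
        grants = {}
        for r in sorted(free_li):
            p = min(unmatched, key=lambda q: (q - li_ptr[r]) % n)
            grants.setdefault(p, []).append(r)
        # IPs select: each granted IP accepts the first LI at or after its pointer.
        for p, lis in grants.items():
            r = min(lis, key=lambda x: (x - ip_ptr[p]) % m)
            match[p] = r
            free_li.discard(r)
    return match


zeros = [0, 0, 0, 0]
print(crrd_phase1([0, 1, 2, 3], 4, zeros, zeros, iterations=1))  # {0: 0}
print(crrd_phase1([0, 1, 2, 3], 4, zeros, zeros, iterations=4))  # {0: 0, 1: 1, 2: 2, 3: 3}
```

With all pointers aligned, one iteration matches a single IP, while four iterations match all four. Phase 2 in the CMs has no such retry, which is why contention there stays unresolved.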

Page 20: An Asynchronous Routing Algorithm for Clos Networks

Asynchronous Dispatching (AD)

• Difficulties

– Packets arrive asynchronously

– Modules are event-driven

– Orphans are generated by failed requests

• Solution

– Independent algorithms (allocators) run in IMs and CMs

– Failed requests use the status feedback from CMs as their acknowledgement.


Page 21: An Asynchronous Routing Algorithm for Clos Networks

Asynchronous Dispatching (AD)

• IM algorithm

– IPs request

– LIs select

– OK? Send data

– Fail? Go back

• CM algorithm

– CMs grant LIs

– Update status feedback

[Figure: example network with IM(0), IM(1), CM(0), CM(1), CM(2), OM(0) and OM(1); ports labelled 0 to 3.]
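The interplay of the IM and CM algorithms can be sketched as an event-driven model in which requests are served one at a time, as they arrive. This is a behavioural simplification with several assumptions: sequential processing stands in for the hardware arbiters, the CM status feedback is modelled as a shared set of busy CM-to-OM links, CMs are scanned in fixed order, and the class `AsyncDispatcher` and its methods are hypothetical names.

```python
class AsyncDispatcher:
    """Behavioural sketch of asynchronous dispatching in a Clos network with m CMs."""

    def __init__(self, m):
        self.m = m
        self.li_busy = set()  # (im, cm): link from IM to CM in use
        self.lo_busy = set()  # (cm, om): link from CM to OM in use (status feedback)

    def request(self, im, om):
        """Try to set up a path from IM im to OM om; return the CM used or None."""
        for cm in range(self.m):
            # An LI is usable only if it is free and the CM reports the target link free.
            if (im, cm) not in self.li_busy and (cm, om) not in self.lo_busy:
                self.li_busy.add((im, cm))
                self.lo_busy.add((cm, om))
                return cm
        return None  # failed: the request goes back and waits

    def release(self, im, cm, om):
        """Tear down a path; the CM status feedback shows the link as free again."""
        self.li_busy.discard((im, cm))
        self.lo_busy.discard((cm, om))


ad = AsyncDispatcher(m=3)
print([ad.request(im, om) for im, om in [(0, 0), (1, 0), (0, 1), (1, 1)]])  # [0, 1, 1, 0]
```

The second request avoids CM(0) because its link to OM(0) is already reported busy, so contention in the CMs is seen before any data is sent.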


Page 24: An Asynchronous Routing Algorithm for Clos Networks

Comparison of Algorithms

• Random Dispatching (RD)

– Cannot resolve contention in IMs or CMs.

• Concurrent Round-Robin Dispatching (CRRD)

– Handles contention in IMs using iterations.

– Cannot resolve contention in CMs.

• Asynchronous Dispatching (AD)

– Contention in IMs is handled naturally by asynchronous arbiters.

– Handles contention in CMs using status feedback.

Page 25: An Asynchronous Routing Algorithm for Clos Networks

Scheduler Architecture

[Figure: scheduler architecture for C(4,8,4). IM schedulers IM_Sch(0) … IM_Sch(i) … IM_Sch(7), CM schedulers CM_Sch(0) … CM_Sch(r) … CM_Sch(3) and OM schedulers OM_Sch(0) … OM_Sch(j) … OM_Sch(7). Blocks: RIM, IM_Dis, RCM, CM_Dis, InpG0 to InpG3. Signals: IMr, OMr, req(0,0) … req(7,3).]

Page 26: An Asynchronous Routing Algorithm for Clos Networks

CM Dispatcher/OM Scheduler

CMa: CM ack

CMs: CM status feedback

CMa and CMs are mutually exclusive

[Figure: CM dispatcher/OM scheduler for C(4,8,4), built around a tree arbiter. Signals: CMr(0, j) … CMr(i, j) … CMr(7, j); CMa(0, j) … CMa(i, j) … CMa(7, j); CMs(i, j); CMa(i', j) with i' = 0 ~ 7, i' ≠ i; CMa(i, j') with j' = 0 ~ 7, j' ≠ j.]

Page 27: An Asynchronous Routing Algorithm for Clos Networks

IM Dispatcher

[Figure: IM dispatcher for C(4,8,4). Blocks: IM/CM match arbiters, LI arbiters, CfgG. Signals: IMr(0) … IMr(3), IMa, IMcfg, ImCmA(0) … ImCmA(7), CMr(0) … CMr(3), CMa(0) … CMa(3), CMs(0) … CMs(3).]

Page 28: An Asynchronous Routing Algorithm for Clos Networks

IM/CM Match Arbiter (Multi-resource Arbiter)

• S. Golubcovs, D. Shang, F. Xia, A. Mokhov, and A. Yakovlev, “Modular approach to multi-resource arbiter design,” ASYNC 2009.

[Figure: IM/CM match arbiter for C(4,8,4), drawn with tree arbiters and a grid of cells with ports ri, ci, rbi, rbo, cbi and cbo. Signals: IMr(0, j) … IMr(3, j), IMrb(0, j) … IMrb(3, j), CMs(0, j) … CMs(3, j), CMsb(0, j) … CMsb(3, j), h(j, 0, 0) … h(j, 3, 3).]

Page 29: An Asynchronous Routing Algorithm for Clos Networks

Request Generation

[Figure: request generation for C(4,8,4). Block: LI arbiter. Signals: h(j, h, r), IMr(h, j), IMrb(h, j), CMs(r, j), CMsb(r, j), CMrMx(r, j, 0) … CMrMx(r, j, h) … CMrMx(r, j, 3), CMrMx(0, j, h) … CMrMx(3, j, h), CMrME(r, 0) … CMrME(r, j) … CMrME(r, 7), CMr(r, 0) … CMr(r, j) … CMr(r, 7).]

Page 30: An Asynchronous Routing Algorithm for Clos Networks

Configuration Generation

[Figure: configuration generation for C(4,8,4). Signals: CMa(r, j), CMr(r, j), CMrMx(r, j, h), cfgMx(r, h, 0) … cfgMx(r, h, j) … cfgMx(r, h, 7), IMcfg(0, h) … IMcfg(r, h) … IMcfg(3, h), IMa(h).]

Page 31: An Asynchronous Routing Algorithm for Clos Networks

Area and Speed

Dispatching: setup 4.6 ns, release 2.9 ns

OM Scheduler: setup 1.6 ns, release 1.0 ns

Total: setup 6.2 ns, release 3.9 ns

C(4,8,4)
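The totals are consistent with the dispatching stage and the OM scheduler acting one after the other, so that their delays add. That reading is an assumption. The check below works in integer picoseconds to avoid floating-point rounding.

```python
# Delays from the slide, converted to picoseconds.
dispatching = {"setup": 4600, "release": 2900}
om_scheduler = {"setup": 1600, "release": 1000}

# Assumed serial composition: total delay = dispatching + OM scheduling.
total = {phase: dispatching[phase] + om_scheduler[phase] for phase in dispatching}
print(total)  # {'setup': 6200, 'release': 3900}
```

The sums reproduce the 6.2 ns setup and 3.9 ns release figures quoted for the 32-port network.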

Page 32: An Asynchronous Routing Algorithm for Clos Networks

MATLAB Simulation Model


Page 33: An Asynchronous Routing Algorithm for Clos Networks

Throughput Analysis

C(4,8,4) non-blocking load:

– RD: 55%

– CRRD (i=4): 70%

– AD (i=3): 76%

C(4,8,m) non-blocking load:

– RD (m=7): 74%

– CRRD (m=7): 82%

– AD (m=7): 96%

Page 34: An Asynchronous Routing Algorithm for Clos Networks

Conclusions

• The first asynchronous routing algorithm for general 3-stage Clos networks.

• Better throughput performance than RD and CRRD

• Future work

–Optimize the Clos structure

–Reduce area and latency


Page 35: An Asynchronous Routing Algorithm for Clos Networks

Thank you!

Questions?


Page 36: An Asynchronous Routing Algorithm for Clos Networks

Iterations


Page 37: An Asynchronous Routing Algorithm for Clos Networks

Uniform Traffic


Page 38: An Asynchronous Routing Algorithm for Clos Networks

Results after Optimization

• Async

– Setup 3.7 ns

– Reset 3.2 ns

– Period ~ 7 ns (140M)

– 28K Gates

• Sync

– 350 MHz

– Cell time = Iter+1 =5 (70M)

– 22K Gates

• Optimization

– Replace multi-resource arbiter

– Eager request (InpG)

– MUTEX arbiter

– Optimized tree arbiter
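The bracketed rates can be checked with simple arithmetic. The sketch assumes that "140M" and "70M" mean millions of allocations (cell times) per second, which the slide does not state.

```python
# Asynchronous scheduler: one allocation per setup + reset period.
async_period_ns = 7.0                    # slide: period ~ 7 ns (3.7 ns + 3.2 ns = 6.9 ns)
async_rate_m = 1e3 / async_period_ns     # millions per second

# Synchronous scheduler: one cell time takes Iter + 1 = 5 clock cycles at 350 MHz.
sync_clock_mhz = 350
sync_cell_cycles = 5
sync_rate_m = sync_clock_mhz / sync_cell_cycles

print(round(async_rate_m), sync_rate_m)  # 143 70.0
```

Under that assumption the asynchronous design allocates at about twice the rate of the synchronous one, in line with the slide's rounded 140M and 70M.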
