automated generation of round-robin arbitration and ... · automated generation of round-robin...

63
Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney III School of Electrical and Computer Engineering, Georgia Institute of Technology

Upload: others

Post on 24-Jun-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

Automated Generation of Round-robin Arbitration and Crossbar Switch Logic

Eung S. ShinAdvisor: Professor Vincent J Mooney III

School of Electrical and Computer Engineering, Georgia Institute of Technology

Page 2: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 2

Overview

DSP

on-chip network&

arbiter perip

hera

lco

re

DSPGPPGPP

cust

omlo

gic

mem

ory

mod

ule

mem

ory

mod

ule

mem

ory

mod

ule

mem

ory

mod

ule

Crossbar (Xbar)

round-robin arbiter

Multiprocessor System-on-a Chip (SoC)

Page 3: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 3

Arbiter Problems

A fast and powerful arbiter for an SoCA fast arbiter for terabit switching speedsA tedious and error-prone task

Network Switch (16x16)

CrossbarSwitch Fabric

(16x16)x16

16 (16x16 arbiter)s

… … …

VOQ(0,16)

VOQ(0,0)

.

.

.input port 0

VOQ(16,0)

.

.

.

VOQ(16,16)

input port 16

.

.

.

.

.

.

output port 16

output port 0

...

.

.

.

...

req(0, 0)

req(16, 16)

grant(0, 0-16) grant(16, 0-16)

Page 4: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 4

Xbar Problems

Multiple communication channels demanded in a multiprocessor SoCChallenge: reducing productivity gapProductivity gap reduction techniques:

Enhancing IP core reusabilityDeveloping a CAD tool

Page 5: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 5

ObjectiveTo design and automate a fast round-robin arbiter logic generation for a bus or a network switch

The generated arbiter employed to crossbar (Xbar) switch arbitration logic

To automate Xbar generation providing multiple communication paths among masters

The generated Xbar customized according to user specifications

Page 6: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 6

OutlineTerminologyOrigin and history of problems:

Arbiter design: PPE and PPACrossbar switch design: “Smart” Memory

Arbiter designArbiter experimentsRAG: Round-robin Arbiter GeneratorX-Gt: Xbar GeneratorXbar experimentsConclusion

Page 7: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 7

Terminology

MxN Switch: M-input by N-output switchExample: A 32x32 switch − 32-input by 32-output switch with 1024 (322) possible connections between input ports and output ports

Virtual Output Queues (VOQs): to remove possible output port contention (Head of Line (HOL) blocking)VOQ (m, n): m − the input port index; n − the output port index

Example: VOQ (1, 0)

Network Switch (32x32)

CrossbarSwitch Fabric

(32x32)x32

32 (32x32 arbiter)s

… … …

VOQ(0,31)

VOQ(0,0)

.

.

.input port 0

VOQ(31,0)

.

.

.

VOQ(31,31)input port 31

.

.

.

.

.

.

output port 31

output port 0

...

.

.

.

...

req(0, 0)

req(31, 31)

grant(0, 0-31) grant(31, 0-31)

Page 8: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 8

Terminology (Continued)(MxV)xN Switch:

M − the number of input ports of an MxN switchV − the number of VOQs per input portN − the number of output ports of an MxN switchTypically, V = NThe total number of VOQs in an MxN switch − M∗N

Network Switch (32x32)

CrossbarSwitch Fabric

(32x32)x32

32 (32x32 arbiter)s

… … …

VOQ(0,31)

VOQ(0,0)

.

.

.input port 0

VOQ(31,0)

.

.

.

VOQ(31,31)

input port 31

.

.

.

.

.

.

output port 31

output port 0

...

.

.

.

...

req(0, 0)

req(31, 31)

grant(0, 0-31) grant(31, 0-31)

Page 9: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 9

Terminology (Continued)(MxV)xN crossbar switch fabric:

Connections between (MxV) inputs and N outputs

MxM Switch Arbiter (SA):

Controlling M specific transmission gates between M VOQs and a particular output port N MxM SAs in an MxN switch

Thirty-two 32x32 SAs

32 x 32 SA_0

. . .

gran

t (0,

0)

gran

t (1,

0)

gran

t (31

, 0)

VOQ (0, 0)

VOQ (1, 0)

VOQ (31, 0)

.

.

.

.

.

.

VOQ (0, 0)

VOQ (1, 0)

VOQ (31, 0)

.

.

.

.

.

.

32 x 32 SA_31

. . .

gran

t (0,

31)

gran

t (1,

31)

gran

t (31

, 31)

VOQ (0, 31)

VOQ (1, 31)

VOQ (31, 31)

.

.

.

.

.

.

(32x32)x32 Crossbar Switch Fabric

output port 0

output port 31

. . .

.

.

.

.

.

.

Page 10: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 10

Terminology (Continued)MxM distributed SA (MxM hierarchical SA):

Equivalent to an MxM SAConsisting of smaller switch arbiter in the form of a hierarchical tree structure

Bus Arbiter (BA): resolving bus conflicts

PriorityLogic 0

req[0]req[1]req[2]req[3]

Ring Counter

toke

n [0

]

toke

n [1

]

toke

n [2

]

toke

n [3

]

toke

n [0

]

toke

n [1

]

toke

n [2

]

toke

n [3

]

PriorityLogic 2

PriorityLogic 3

PriorityLogic 1

EN

EN

EN

EN

grant[0]

grant[1]

grant[2]

grant[3]

4x4 BA

ackreset

output[0]output[1]output[2]output[3]

in[0]in[1]in[2]in[3]

D-FFD-FFclock

PriorityLogic 0

req[0]req[1]req[2]req[3]

Ring Counter

toke

n [0

]

toke

n [1

]

toke

n [2

]

toke

n [3

]

toke

n [0

]

toke

n [1

]

toke

n [2

]

toke

n [3

]

PriorityLogic 2

PriorityLogic 3

PriorityLogic 1

EN

EN

EN

EN

grant[0]

grant[1]

grant[2]

grant[3]

4x4 BA

ackreset

output[0]output[1]output[2]output[3]

in[0]in[1]in[2]in[3]

D-FFD-FFclock

req0[0]req0[1]req0[2]req0[3]

ack0[0]ack0[1]

req1[0]req1[1]req1[2]req1[3]

2x2rootSA

grant0[0]grant0[1]grant0[2]grant0[3]

grant1[0]grant1[1]grant1[2]grant1[3]

req0req1

8x8 hierarchical SAclock

4x4ack-req

SA 0

counterD

4x4ack-req

SA 1

counterD

ack

req0[0]req0[1]req0[2]req0[3]

ack0[0]ack0[1]

req1[0]req1[1]req1[2]req1[3]

2x2rootSA

grant0[0]grant0[1]grant0[2]grant0[3]

grant1[0]grant1[1]grant1[2]grant1[3]

req0req1

8x8 hierarchical SAclock

4x4ack-req

SA 0

counterD

4x4ack-req

SA 1

counterD

ack

req0[0]req0[1]req0[2]req0[3]

req0[0]req0[1]req0[2]req0[3]

ack0[0]ack0[1]

req1[0]req1[1]req1[2]req1[3]

req1[0]req1[1]req1[2]req1[3]

2x2rootSA

grant0[0]grant0[1]grant0[2]grant0[3]

grant0[0]grant0[1]grant0[2]grant0[3]

grant1[0]grant1[1]grant1[2]grant1[3]

grant1[0]grant1[1]grant1[2]grant1[3]

req0req1

8x8 hierarchical SAclock

4x4ack-req

SA 0

counterD

4x4ack-req

SA 1

counterD

ack

req0[0]req0[1]req0[2]req0[3]

ack0[0]ack0[1]

req1[0]req1[1]req1[2]req1[3]

2x2rootSA

grant0[0]grant0[1]grant0[2]grant0[3]

grant1[0]grant1[1]grant1[2]grant1[3]

req0req1

8x8 hierarchical SAclock

4x4ack-req

SA 0

counterD

4x4ack-req

SA 1

counterD

ack

req0[0]req0[1]req0[2]req0[3]

req0[0]req0[1]req0[2]req0[3]

ack0[0]ack0[1]

req1[0]req1[1]req1[2]req1[3]

req1[0]req1[1]req1[2]req1[3]

2x2rootSA

grant0[0]grant0[1]grant0[2]grant0[3]

grant0[0]grant0[1]grant0[2]grant0[3]

grant1[0]grant1[1]grant1[2]grant1[3]

grant1[0]grant1[1]grant1[2]grant1[3]

req0req1

8x8 hierarchical SAclock

4x4ack-req

SA 0

counterD

4x4ack-req

SA 1

counterD

ack

req0[0]req0[1]req0[2]req0[3]

req0[0]req0[1]req0[2]req0[3]

ack0[0]ack0[1]

req1[0]req1[1]req1[2]req1[3]

req1[0]req1[1]req1[2]req1[3]

2x2rootSA

grant0[0]grant0[1]grant0[2]grant0[3]

grant0[0]grant0[1]grant0[2]grant0[3]

grant1[0]grant1[1]grant1[2]grant1[3]

grant1[0]grant1[1]grant1[2]grant1[3]

req0req1

8x8 hierarchical SAclock

4x4ack-req

SA 0

counterD

4x4ack-req

SA 1

counterD

ack

req0[0]req0[1]req0[2]req0[3]

req0[0]req0[1]req0[2]req0[3]

ack0[0]ack0[1]

req1[0]req1[1]req1[2]req1[3]

req1[0]req1[1]req1[2]req1[3]

2x2rootSA

grant0[0]grant0[1]grant0[2]grant0[3]

grant0[0]grant0[1]grant0[2]grant0[3]

grant1[0]grant1[1]grant1[2]grant1[3]

grant1[0]grant1[1]grant1[2]grant1[3]

req0req1

8x8 hierarchical SAclock

4x4ack-req

SA 0

counterD

4x4ack-req

SA 1

counterD

ack

Page 11: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 11

Requirements for a Terabit Switch ArbiterStarvation freeFast ArbitrationSimplicity to implementLow power:

Power budget of single rack router ~ 10kW

Page 12: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 12

OutlineTerminologyOrigin and history of problems:

Arbiter design: PPE and PPACrossbar switch design: “Smart” Memory

Arbiter designArbiter experimentsRAG: Round-robin Arbiter GeneratorX-Gt: Xbar GeneratorXbar experimentsConclusion

Page 13: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 13

History: Arbiter in PPECentralized Switch Arbiters:

Programmable Priority Encoder (PPE) implementing iterative round-robin algorithm (iSLIP)

P. Gupta and N. Mckeown, “Designing and Implementing a Fast Crossbar Scheduler,” IEEE Micro, 1999, pp. 20-28. N. Mckeown, P. Varaiya, and J. Warland, “The iSLIP Scheduling Algorithm for Input-Queued Switch,”IEEE Transaction on Networks, 1999, pp. 188-201.

tothermo

P_enc

log2 n

Req

n

Priority EncoderPriority Encoder_thermo

n new_Req

n

nn

any_Gnt_PE_thermo

n

Gnt_PE_thermo

Gnt_PE

n

Gnt

P_thermo

Page 14: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 14

History: Arbiter in PPADistributed Switch Arbiter:

Ping Pong Arbiter (PPA)H. J. Chao, C. H. Lam, and X. Guo, “A Fast Arbitration Scheme for Terabit Packet Switches,” Proceedings of IEEE Global Telecommunications Conference, 1999, pp. 1236-1243.

Comparison: our generated SA 2.3X faster than PPE and 1.8X faster than PPA

1613 14 15129 10 1185 6 741 2 31 2

external grant signals

layer 1

layer 2

layer 3

layer 4

root PPA intermediate PPA leaf PPA

r0r1Fi

Gg0

Gg1

g0 g1 Fo

2x2PPA

Q D

Clock

r0

r1

g0

g1

Fi Fo

Gg0 Gg1

Page 15: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 15

Why do we need an arbiter for an SoC?

Arbitration required by all busesOur arbiter applicable to anywhere requiring arbitrationThe generated arbiter utilized in our Xbar

Page 16: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 16

History: Crossbar Switch

Smart memory:Reconfigurable crossbar between bus masters (2 integer clusters and 1 floating point cluster) and slaves (SRAMs)

K. Mai, T. Paaske, N. Jayasena, R. Ho, W. Dally and M. Horowitz, “Smart Memories: A Modular Reconfigurable Architecture,” Proceedings of International Symposium on Computer Architecture (ISCA), June 2000, pp. 161-171.

Page 17: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 17

OutlineTerminologyOrigin and history of problems:

Arbiter design: PPE and PPACrossbar switch design: “Smart” Memory

Arbiter designArbiter experimentsRAG: Round-robin Arbiter GeneratorX-Gt: Xbar GeneratorXbar experimentsConclusion

Page 18: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 18

Bus Arbiter DesignImplemented based on ring counter for a token and “priority logic”Priority Logic for 4 inputs:

output[0] = EN•in[0]output[1] = EN•in[0]'•in[1]output[2] = EN•in[0]'•in[1]'•in[2]output[3] = EN•in[0]'•in[1]'•in[2]'•in[3]

000000001

100010001

0100X1001

0010XX101

0001XXX11

0000XXXX0

output [3]output [2]output [1]output [0]in [3]in [2]in [1]in [0]EN

Page 19: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 19

Previous Bus Arbiter Approach

FIFO ArbiterRound-robin arbiter with fixed prioritiesRound-robin arbiter with priority rotation

M-inputPriority Logic

M-request M-grant M-grantM-inputPriority Encoder

M-requestro

tatio

n

deco

der

Page 20: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 20

Example: Our Bus Arbiter

Condition:Token=4’b0100 → Processor 2 with the highest priorityProcessors 0 and 1 requesting a bus

Result:Only Priority Logic 2 enabledProcessor 0 granted due to negated request signals of the higher priority parties (processor 2 and processor 3) Token rotated to 4’b1000 after the ring counter receiving ack signal

Processor 2

req[0]

Processor 0 Processor 1 Processor 3

Memory

PL 2

token[2]

ring counter

grant[0]

4x4 BA

req[1]output[2]

ack

Page 21: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 21

Example: Our Bus Arbiter (Continued)

PriorityLogic 0

Ring Counter

toke

n [0

]

toke

n [1

]

toke

n [2

]

toke

n [3

]

PriorityLogic 2

PriorityLogic 3

PriorityLogic 1

grant[2]

grant[3]

4x4 BA

ack

reset

output[0]output[1]output[2]output[3]

req[0]req[1]req[2]req[3]

EN

EN

EN

in[0]in[1]in[2]in[3]

D-FFclock

grant[0]

grant[1]

grant[0]

PriorityLogic 2

toke

n [2

]to

ken

[2]

PriorityLogic 2

EN

Page 22: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 22

Hierarchical SA DesignA hierarchical SA consisting of small switch arbiter blocksSix types of switch arbiter blocks:

2x2 ack-req SA3x3 ack-req SA4x4 ack-req SA2x2 root SA3x3 root SA4x4 root SA

A root SA: placed on the top of a hierarchy

Page 23: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 23

Switch Arbiter Blocks

2x2Bus

Arbiter

ack

req0[0]req0[1]

grant0[1]

grant0[2]

req0

2x2 ack-reqSAclock

reset3x3Bus

Arbiter

ack grant0[0]grant0[1]

grant0[2]req0[0]req0[1]req0[2]

req0

3x3 ack-reqSAclockreset 4x4

BusArbiter

ack grant0[0]grant0[1]

grant0[2]grant0[3]req0[3]

req0[0]req0[1]req0[2]

req0

4x4 ack-reqSAclockreset

2x2BA

without D flip-flop

ack0

ack1

req0

req1

2x2 root SAclock

ring counterresettoken[1:0] 3x3

BAwithout

D flip-flop

ack0ack1

ack2

req0req1req2

3x3 root SAclock ring counterreset

4x4BA

without D flip-flop

ack0ack1ack2ack3

req0req1req2req3

4x4 root SAclock ring counterreset

Page 24: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

Example: 32x32

hierarch-ical SA

4x4ack-req

SAl0.sa0

req0[0]req0[1]req0[2]req0[3]

ack0[0]

4x4ack-req

SAl0.sa1

ack0[1]

4x4ack-req

SAl0.sa2

ack0[3]

4x4ack-req

SAl0.sa3

4x4ack-req

SAl1.sa0

req1[0]req1[1]req1[2]req1[3]

req2[0]req2[1]req2[2]req2[3]

req3[0]req3[1]req3[2]req3[3]

ack0[2]

4x4ack-req

SAl0.sa4

req4[0]req4[1]req4[2]req4[3]

ack1[0]

4x4ack-req

SAl0.sa5

ack1[1]

4x4ack-req

SAl0.sa6

ack1[3]

4x4ack-req

SAl0.sa7

4x4ack-req

SAl1.sa1

req5[0]req5[1]req5[2]req5[3]

req6[0]req6[1]req6[2]req6[3]

req7[0]req7[1]req7[2]req7[3]

ack1[2]

2x2rootSA

grant0[0]grant0[1]grant0[2]grant0[3]

grant1[0]grant1[1]grant1[2]grant1[3]

grant2[0]grant2[1]grant2[2]grant2[3]

grant3[0]grant3[1]grant3[2]grant3[3]

grant4[0]grant4[1]grant4[2]grant4[3]

grant5[0]grant5[1]grant5[2]grant5[3]

grant6[0]grant6[1]grant6[2]grant6[3]

grant7[0]grant7[1]grant7[2]grant7[3]

up_req[0]up_req[1]

req0

req1

req2 req3

req4req5

req6req7

up_ack0

up_ack1

clock

token = 2’b10

token = 4’b0010

token = 4’b0010

Page 25: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

0111

1100

1111

1101

0011

1111

0011

1111

4x4ack-req

SAl0.sa0

req0[0]req0[1]req0[2]req0[3]

ack0[0]

4x4ack-req

SAl0.sa1

ack0[1]

4x4ack-req

SAl0.sa2

ack0[3]

4x4ack-req

SAl0.sa3

4x4ack-req

SAl1.sa0

req1[0]req1[1]req1[2]req1[3]

req2[0]req2[1]req2[2]req2[3]

req3[0]req3[1]req3[2]req3[3]

ack0[2]

4x4ack-req

SAl0.sa4

req4[0]req4[1]req4[2]req4[3]

ack1[0]

4x4ack-req

SAl0.sa5

ack1[1]

4x4ack-req

SAl0.sa6

ack1[3]

4x4ack-req

SAl0.sa7

4x4ack-req

SAl1.sa1

req5[0]req5[1]req5[2]req5[3]

req6[0]req6[1]req6[2]req6[3]

req7[0]req7[1]req7[2]req7[3]

ack1[2]

2x2rootSA

grant0[0]grant0[1]grant0[2]grant0[3]

grant1[0]grant1[1]grant1[2]grant1[3]

grant2[0]grant2[1]grant2[2]grant2[3]

grant3[0]grant3[1]grant3[2]grant3[3]

grant4[0]grant4[1]grant4[2]grant4[3]

grant5[0]grant5[1]grant5[2]grant5[3]

grant6[0]grant6[1]grant6[2]grant6[3]

grant7[0]grant7[1]grant7[2]grant7[3]

up_req[0]up_req[1]

req0

req1

req2 req3

req4req5

req6req7

up_ack0

up_ack1

clock

up_req[1]

req5[1]

4x4ack-req

SAl0.sa5

req5

grant5[1]

Example: 32x32

hierarch-ical SA

2x2rootSA

4x4ack-req

SAl1.sa1

up_ack1

ack1[1]

ack1[1]

D

D

up_ack1

req5[1]

up_req[1]

req5

2x2rootSA

grant5[1]

up_ack1

req5[1]

4x4ack-req

SAl0.sa5

req5

ack1[1]

up_req[1]

2x2rootSA

up_ack1

grant5[1]

ack1[1]

grant5[1]

clock

2x2rootSA

up_req[0]

up_req[1]

req5[0]

req5[1]req5[2]

req5[3]

4x4BA

counterD

req4req5req6req7

4x4BA

counterD

up_ack0

up_ack1

l1.sa1.output[1]

l0.sa5.output[1]

32 x 32 SA Critical Path:•Travels through two 4-input OR gates in series•Then through a 2x2 root SA•Finally through two 2-input AND gates in series•Results in 0.94ns delay using a TSMC 0.25µ standard cell library from LEDA Systems.

ack1[1]

D

D

up_ack1

req5[1]

up_req[1]

req5

2x2rootSA

grant5[1]

up_ack1

ack signals look like feedback path through the same logic block. In fact there is no input to the same logic gates.

Page 26: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 26

Hierarchical BAExtra ‘ack’ input: indicating the completion of a single use of the bus Root token rotation:

By an ‘ack’ signal for a hierarchical BABy clock signal for a hierarchical SA

Possession of a bus for multiple cycles

req0[0]

req0[1]

req0[2]

req0[3]

ack0[0]

ack0[1]

req1[0]

req1[1]

req1[2]

req1[3]

2x2rootSA

grant0[0]

grant0[1]

grant0[2]

grant0[3]

grant1[0]

grant1[1]

grant1[2]

grant1[3]

req0req1

8x8 hierarchical BA

clock

4x4ack-req

BA 0

counterD

4x4ack-req

BA 1

counterD

ack

counterD

D

4x4Bus

Arbiter

D

D

D

ackclock

req0[3]

req0[1]

req0[2]

req0[0]

reset

4x4 ack-req BA

grant0[3]

grant0[1]

grant0[2]

grant0[0]

req0

D latches triggered the positive edge of ack

Page 27: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 27

OutlineTerminologyOrigin and history of problems:

Arbiter design: PPE and PPACrossbar switch design: “Smart” Memory

Arbiter designArbiter experimentsRAG: Round-robin Arbiter GeneratorX-Gt: Xbar GeneratorXbar experimentsConclusion

Page 28: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 28

Comparisons with PPE and PPA

Using TSMC 0.25µ std. cell library from LEDA SystemsSynthesized by Design Compiler from Synopsys

0

1000

2000

3000

4000

5000

6000

0 50 100 150

MxM arbiter

Are

a of

arb

iter i

n th

e nu

mbe

r of

inve

rter

eq

uiva

lent

s

SAPPEPPA

0

0.5

1

1.5

2

2.5

3

3.5

0 50 100 150

MxM arbiter

Del

ay in

arb

iter w

ith T

SMC

.25u

m

SAPPEPPA

1.81X1.85X

1.94X

2.31X 2.03X

2.38X

Page 29: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 29

Comparisons (Continued)

The shortest delay results fromLimiting the size of switch arbiter blocks to 2x2, 3x3 and 4x4 to reduce the critical path delayDue to the expansion of priority logic blocks compared with PPEReducing the number of levels in a hierarchy by preferring to use more 4x4 switch arbiter blocks compared with PPA

Page 30: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 30

Key Insights

Our hierarchical SA vs. PPE:

Much larger logic equations of PPEMore constrained to a specific structure in our SAA distributed “state” kept by token

16x16 Hierarchical SA

4x44x4

4x44x4

4x416 requests

4

4

4

4

16 grants

16x16 PPE

16-inputPE16 requests 16 grants

No token

token

token

Page 31: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 31

Key Insights (Continued)

Our hierarchical SA vs. PPA:

PPA with token passing approach.MxM PPA implemented by 2-input basic arbiterOur hierarchical SA preferring utilizes 4-input switch arbiter blocks.

16 grants

16x16 Hierarchical SA

4x44x4

4x44x4

4x416 requests

4

4

4

4

16 grants

16x16 PPA

2x2

2x22x2

2x2

2x22x2

2x2

2x2

2x22x2

2x2

2x22x2

2x2

2x216 requests

2

2x2PPA

Q D

Clock

r0

r1

g0

g1

Fi Fo

Gg0 Gg1

Page 32: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 32

Power Comparison

For dynamic power estimation, a Switching Activity Information File (SAIF) is extracted from Synopsys VCS.Assume congested network: all input requests are asserted.

RTL VerilogRTL Simulation

using of Synopsys VCS

Power Compiler

Back AnnotationSAIF file

PowerEstimation

TechnologyLibrary

Design Compiler

Page 33: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 33

Power Comparison

Using TSMC 0.25µ std. cell library from LEDA SystemsEstimated by Power Compiler from Synopsys

Static Power Dissipation with TSMC 0.25um

0

2000

4000

6000

8000

10000

0 20 40 60 80 100 120 140

MxM

pW

SAPPAPPE

Dynamic Power Dissipation with TSMC 0.25um

05

101520253035

0 50 100 150

MxMm

W

SAPPAPPE

Page 34: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 34

Power ComparisonTotal Power Dissipation

05

101520253035

0 50 100 150

Number of Requests

mW

SAPPAPPE

Page 35: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 35

OutlineTerminologyOrigin and history of problems:

Arbiter design: PPE and PPACrossbar switch design: “Smart” Memory

Arbiter designArbiter experimentsRAG: Round-robin Arbiter GeneratorX-Gt: Xbar GeneratorXbar experimentsConclusion

Page 36: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

RAGA user specification: an arbiter typeA user specification: the number of masters (M) to be arbitratedOutput of RAG: a synthesizable Verilog code for a Bus Arbiter or a Switch Arbiter at the RTL

User input:1. Type of the arbiter:3. Number of masters

Library2x2 ack-req SA3x3 ack-req SA4x4 ack-req SA2x2 ack-req BA3x3 ack-req BA4x4 ack-req BA2x2 root SA3x3 root SA4x4 root SA

integrate M x Mhierarchical switch arbiterinteg_arb();

integrate M x Mhierarchical bus arbiterinteg_bus_arb();

calculate the number oflevels and the number ofbasic arbiter blocks foreach levelInterpret_sa();

SABA

BA SA

number-of_levels ← 0;dividend←number_of_masters;remainder← (dividend modulo 4);

dividend=0?

No

dividend ←floor(dividend/4);number_of_levels ++;

dividend=1 and remainder=0 ?

dividend←number_of_masters;n← 0;

Yes

dividend>3?

remainder ← dividend modulo 4;dividend← (dividend/4);4by4_in_level[n]← dividend;

Yes

dividend modulo 3 =0 ?

No Yes

Yes

dividend modulo 4=0?

YesNo

remainder ← dividend modulo 3;dividend← (dividend/3);3by_in_level[n]← dividend;

remainder ← dividend modulo 4;dividend← floor (dividend/4);4by_in_level[n]← dividend;

No

remainder=3 ?

2by2_in_level[n]++; Yes

3by3_in_level[n]++;

No

dividend>2?

No

dividend← (dividend/3);3by3_in_level[n]← dividend;

Yes

dividend← floor (dividend/2);2by2-in_level[n]← dividend;

No

dividend = 4by4_in_level[n] + 3by3_in_level[n] + 2by2_in_level[n] ;

n = number_of_levels-1 ?

Yes

No

Generate Hierarchical Arbiter

Yes

Page 37: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

num_level ← 0dividend←num_mastersremainder← 0

dividend=0?

No

dividend ←(integer) (dividend/4)n ←num_levelnum_level ++

dividend=0 and remainder=0 ?

dividend←num_mastersn← 0

Yes

dividend>2?

remainder ← dividend mod 4dividend←(integer) (dividend/4)num_4by4_level[n]← dividend

remainder ← dividend mod 2dividend←(integer) (dividend/2)num_2by2_level[n]← dividend

No Yes

remainder=0 ?

remainder>2 ?

No

num_4by4_level[n]++num_2by2_level[n]++

YesNo

n++

remainder=0 ?

No

Yes

No Yes

dividend←num_4by4_level[n]+num_2by2_level[n]n<num_level?

No

Yes Hierarchical SA

dividend=32

dividend=32n=0

Yes

RAG (Continued)dividend=8n=0num_4by4_level(0)=0num_2by2_level(0)=0num_level=1

dividend=2n=1num_4by4_level(1)=0num_2by2_level(1)=0num_level=2

dividend=0n=2num_4by4_level(2)=0num_2by2_level(2)=0num_level=3

remainder=0dividend=8num_4by4(0)=8

4x4

SA

4x4

SA

4x4

SA

4x4

SA

4x4

SA

4x4

SA

4x4

SA

4x4

SAn=1

dividend=8

remainder=0dividend=2num_4by4(1)=2

4x4

SA

4x4

SA

n=2

dividend=2

remainder=0dividend=1num_2by2(2)=12x2

root

SA

n=3

dividend=1

Page 38: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 38

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

2x2

roo t

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

2x2

roo t

S A

RAG (Continued)

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

2x2

roo t

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

2x2

roo t

S A

4x4ack-req

SAl0.sa0

req0[0]req0[1]req0[2]req0[3]

req0[0]req0[1]req0[2]req0[3]

ack0[0]

4x4ack-req

SAl0.sa1

ack0[1]

4x4ack-req

SAl0.sa2

ack0[3]

4x4ack-req

SAl0.sa3

4x4ack-req

SAl1.sa0

req1[0]req1[1]req1[2]req1[3]

req1[0]req1[1]req1[2]req1[3]

req2[0]req2[1]req2[2]req2[3]

req2[0]req2[1]req2[2]req2[3]

req3[0]req3[1]req3[2]req3[3]

req3[0]req3[1]req3[2]req3[3]

ack0[2]

4x4ack-req

SAl0.sa4

req4[0]req4[1]req4[2]req4[3]

req4[0]req4[1]req4[2]req4[3]

ack1[0]

4x4ack-req

SAl0.sa5

ack1[1]

4x4ack-req

SAl0.sa6

ack1[3]

4x4ack-req

SAl0.sa7

4x4ack-req

SAl1.sa1

req5[0]req5[1]req5[2]req5[3]

req5[0]req5[1]req5[2]req5[3]

req6[0]req6[1]req6[2]req6[3]

req6[0]req6[1]req6[2]req6[3]

req7[0]req7[1]req7[2]req7[3]

req7[0]req7[1]req7[2]req7[3]

ack1[2]

2x2root

SA

grant0[0]grant0[1]grant0[2]grant0[3]

grant0[0]grant0[1]grant0[2]grant0[3]

grant1[0]grant1[1]grant1[2]grant1[3]

grant1[0]grant1[1]grant1[2]grant1[3]

grant2[0]grant2[1]grant2[2]grant2[3]

grant2[0]grant2[1]grant2[2]grant2[3]

grant3[0]grant3[1]grant3[2]grant3[3]

grant3[0]grant3[1]grant3[2]grant3[3]

grant4[0]grant4[1]grant4[2]grant4[3]

grant4[0]grant4[1]grant4[2]grant4[3]

grant5[0]grant5[1]grant5[2]grant5[3]

grant5[0]grant5[1]grant5[2]grant5[3]

grant6[0]grant6[1]grant6[2]grant6[3]

grant6[0]grant6[1]grant6[2]grant6[3]

grant7[0]grant7[1]grant7[2]grant7[3]

grant7[0]grant7[1]grant7[2]grant7[3]

up_req[0]up_req[1]

req0

req1

req2 req3

req4req5

req6req7

up_ack0

up_ack1

clock

User input:1. Type of the arbiter:3. Number of masters

Library2x2 ack-req SA3x3 ack-req SA4x4 ack-req SA2x2 ack-req BA3x3 ack-req BA4x4 ack-req BA2x2 root SA3x3 root SA4x4 root SA

integrate M x Mhierarchical switch arbiterinteg_arb();

integrate M x Mhierarchical bus arbiterinteg_bus_arb();

calculate the number oflevels and the number ofbasic arbiter blocks foreach levelInterpret_sa();

SABA

BA SA

integrate M x Mhierarchical switch arbiterinteg_arb();

Page 39: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 39

SpeedupThroughput comparison for 64-bit switching 32x32 network switch

The longest delay of 32x32 switch = 0.63ns in .25µTSMC → the maximum throughput of switch determined by arbitration delay

Experimental setup:Replacing SA in 32x32 network switch with our hierarchical SA, PPE and PPA

VO

Q C

ontro

llers

32 32x32 SAs

64-bit 32x32 Switch Fabric

322

64-b

it V

OQ

s

The floorplan of the 64-bit 32x32 switch fabric, VOQs, controllers and SAs

Area: 125.64mm2

Page 40: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 40

Speedups (Continued)Speedup in .25µ TSMC

Our hierarchical 32x32 SA: 64 [email protected] delay →2.18Tbps32x32 PPA: [email protected] delay → 1.20Tbps32x32 PPE: [email protected] delay → 0.94TbpsResults: The throughput achieved by our SA > 1.8X than PPA and > 2.3X than PPE

Network Switch (32x32)

CrossbarSwitch Fabric

(32x32)x32

32 (32x32 arbiter)s

… … …

VOQ(0,31)

VOQ(0,0)

.

.

.input port 0

VOQ(31,0)

.

.

.

VOQ(31,31)

input port 31

.

.

.

.

.

.

output port 31

output port 0

...

.

.

.

...

req(0, 0)

req(31, 31)

grant(0, 0-31) grant(31, 0-31)

Page 41: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 41

Perfectly Fair?

PriorityLogic 0

req[0]req[1]req[2]req[3]

Ring Counter

toke

n [0

]

toke

n [1

]

toke

n [2

]

toke

n [3

]

PriorityLogic 2

PriorityLogic 3

PriorityLogic 1

EN

EN

EN

EN

grant[0]

grant[1]

grant[2]

grant[3]

4x4 BA

ackreset

output[0]output[1]output[2]output[3]

in[0]in[1]in[2]in[3]

D-FFclock

EN

0110

# of grants

req #

11

10

# of grants

req #

PriorityLogic 0

EN

EN

PriorityLogic 1

ENEN

PriorityLogic 2

EN

11

20

# of grants

req #

PriorityLogic 3

EN11

30

# of grants

req #How about PPE and PPA?

Page 42: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 42

Fairness SimulationJoint work with Dr. Riley and Tom ChengSimulation with the 32x32 hierarchical SASeveral input patterns applied for uniform traffic simulationBursty and TCP traffics applied Traffic intensity, ρ: a measure of the demand on the switch for bursty and TCP traffics.

Page 43: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 43

Simulation ResultsUniform traffic: metrics are the number of grants

002499993

25000002500002

2500002500002500001

4999997490002500000

Grants,0, 1, 2 asserted

Grants,0 , 1 asserted

Grant, all asserted4x4 ack-req input

Run 3Run 2Run 1

Page 44: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 44

Simulation Results (Continued)

Bursty traffic: metrics are average delay

1.102.004.6316.4431.113

1.011.824.6117.1430.952

1.101.934.6015.5931.231

1.051.984.4616.6531.430

Delay,ρ = 0.1

Delay,ρ = 0.5

Delay,ρ = 0.9

Delay,ρ = 1.0

Delay,ρ = 2.0

4x4 ack-req input

Page 45: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 45

Simulation Results (Continued)

TCP traffic: metrics are average delay

1.101.94 12.13 17.54 19.22 3

1.10 1.93 10.69 19.71 20.44 2

1.10 1.95 11.2919.48 19.91 1

1.10 1.95 11.68 16.75 18.12 0

Delay,ρ = 0.1

Delay,ρ = 0.5

Delay,ρ = 0.9

Delay,ρ = 1.0

Delay,ρ = 2.0

4x4 ack-req input

Page 46: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 46

OutlineTerminologyOrigin and history of problems:

Arbiter design: PPE and PPACrossbar switch design: “Smart” Memory

Arbiter designArbiter experimentsRAG: Round-robin Arbiter GeneratorX-Gt: Xbar GeneratorXbar experimentsConclusion

Page 47: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 47

Why Xbar and X-Gt?

Xbar:Demand on multiple communication channelsConcurrent accesses to resources

X-Gt:Generation customized Xbar on-the-flyThe generated Xbar in RTL Verilog

Page 48: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 48

Xbar SwitchOne configuration for 4x4 caseEach 4x1 switches: Comparing physical addresses from PEs and judging if addresses belong to the address space of the attached memory blockMaximum of 4 concurrent memory access in one cycle

PE3

PE2

PE1

PE0

4x4 Xbar

4x1switch

34x1sw

itch2

mem0

mem1

mem3

mem2

4x1switch

04x1sw

itch1

Page 49: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 49

Xbar (Continued)An MxN switch consisting of N Mx1 switches, M = # of PEs & N = # of memory blocksAn assertion of ‘mem_req’ inside a Mx1 switch: depending on an ‘address’ input from a PEGrant: given by an arbiter by the assertion of ‘mem_on’ signalsArbitration in round-robin order: handling M requests from M processors

Page 50: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 50

4x1 Switch ExampleScenario:

Request from PE 0 and PE 3PE 3 granted

pe_req[0]

pe_req[3]

arbiter

comp...

pe_addr0

pe_addr3

addr bus

switch

mem_on[0]

mem_on[3]

mem_data

mem_req[3]

...

...

.

.

.

...

pe_data0

pe_data3

databus

switch

.

.

.

pe_re0

pe_re3

wireswitch

.

.

.

...

pe_we0

pe_we3

wireswitch

.

.

.

...

pe_ta0

pe_ta3

wire_taswitch

.

.

.

...

mem_we mem_ta

mem_req[0]

mem_addr mem_re

Page 51: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 51

Example of Xbar Configuration

4x1switch

74x1sw

itch4

mem0

mem1

mem2

mem3

mem7

mem6

mem5

mem4

4x8 Xbar:Supporting 4 PEs and 8 memory modulesConnecting a particular PE signals to a specific memory module by a 4x1 switch according to a physical address from a PE

4x1switch

04x1sw

itch1

4x1switch

24x1sw

itch3

4x1switch

64x1sw

itch5

Page 52: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 52

X-Gt (Xbar Generator)

Generation of customized Xbar in Verilog at the RTL

An arbiter generated by RAGParameterizable switch blocks in an Mx1 switchAll submodules connected by wire names

Page 53: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 53

X-Gt (Continued)Parameters:

The number of PEs that determines M in an MxN XbarThe number of memory blocks that determines N in an MxN XbarThe total global memory size that determines (pe_)address bus widthThe data bus width of each PE determined by PE typeThe (mem_)address bus width determined by the size of memory block

Page 54: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 54

parameters

m<M

pe_req[m]...

m++

gen_proc_wire(M)

n<N

mem_addrn...

n++

m<N

gen_mem_wire(M)

gen_addr_bus_switch(M)gen_data_bus_switch(M)

gen_wire_switch(M)gen_wre_ta_switch(M)

yes

yes

RAG generatingan arbiter

m++

gen_comp(M)

yesgen_Mx1(parameters)

MxN Xbar

Flowchart of X-Gt

Page 55: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 55

parameters

m<M

pe_req[m]...

m++

gen_proc_wire(M)

n<N

mem_addrn...

n++

m<N

gen_mem_wire(M)

gen_addr_bus_switch(M)gen_data_bus_switch(M)

gen_wire_switch(M)gen_wre_ta_switch(M)

yes

yes

RAG generatingan arbiter

n++

gen_comp(M)

yes

gen_Mx1(parameters)

4x4 Xbar

pe_req[0]pe_addr0pe_data0pe_read0pe_write0pe_ta0

m=0

M=4, N=4

pe_req[1]pe_addr1pe_data1pe_read1pe_write1pe_ta1

m=1pe_req[2]pe_addr2pe_data2pe_read2pe_write2pe_ta2

m=2pe_req[3]pe_addr3pe_data3pe_read3pe_write3pe_ta3

m=3

mem_addr0mem_data0mem_read0mem_write0mem_ta0

n=0mem_addr1mem_data1mem_read1mem_write1mem_ta1

n=1mem_addr2mem_data2mem_read2mem_write2mem_ta2

n=2mem_addr3mem_data3mem_read3mem_write3mem_ta3

n=3

4x1switch

0

m=1

4x1switch

1

m=2

4x1switch

2

m=3

4x1switch

3

m=4

4x1switch

34x1sw

itch2

4x1switch

04x1sw

itch1

Verilog in RTL

Page 56: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 56

OutlineTerminologyOrigin and history of problems:

Arbiter design: PPE and PPACrossbar switch design: “Smart” Memory

Arbiter designArbiter experimentsRAG: Round-robin Arbiter GeneratorX-Gt: Xbar GeneratorXbar experimentsConclusion

Page 57: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 57

Xbar Synthesis: Mx1 Gate Area

Using TSMC 0.25µ std. cell library from Artisan ComponentsEstimated by Design Compiler from Synopsys

0

1000

2000

3000

4000

5000

6000

2 3 4 5 6 7 8 9

Number of processors

Mx1

sw

itch

area

in th

e nu

mbe

r of

INVE

RTE

R e

quiv

alen

ts w

ith T

SMC

.2

5um

Page 58: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 58

Xbar Synthesis: MxM Area

Using TSMC 0.25µ std. cell library from Artisan ComponentsGate Area estimated by Design Compiler from SynopsysGate + Wire Area estimated by Silicon Ensemble from Cadence

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

2 4 6 8 10Number of processors and number of

memory blocks

MxM

Xba

r are

a in

squ

are

mm

with

TS

MC

.25u

m

Gate AreaGate + Wire Area

Page 59: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 59

Back Annotated Timing An RTL Verilog (output of X-Gt) synthesized by DC with TSMC .25µfrom Artisan ComponentsBack annotation of SDF and set_load files to logic synthesis stage after place and route using Synopsys Design Compiler and Cadence Silicon Ensemble

Logic Synthesis

Place & Route

ExtractRC values

Design Compiler

Silicon Ensemble

report timing

RTL VerilogX-Gt

Back annotate SDF and set load files

Page 60: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 60

MxM Xbar Delay

0

5

10

15

20

25

30

35

2 3 4 5 6 7 8 9

Number of processors and number of memory blocks

MxM

Xba

r del

ay in

ns

with

TSM

C

.25u

mw/o back annotation w/ back annotation

Page 61: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 61

ConclusionShowed how we design hierarchical SA and hierarchical BA.Demonstrated ≥ 1.8X speedups for our hierarchical SA over PPE and PPA.Illustrated how RAG generates a hierarchical SA.Demonstrated Xbar switch and showed how X-Gt generates an Xbar according to user specifications.Illustrated gap between delays from logic synthesis stage and delays from physical synthesis stage for MxM Xbars.

Page 62: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 62

PublicationsAs (co-) First Author:

K. Ryu, E. S. Shin, and V. Mooney, “A Comparison of Five DifferentMultiprocessor SoC Bus Architectures,” Proceedings of the 2001 Euromicro Symposium on Digital Systems Design (DSD’01), September 2001. E. S. Shin, V. Mooney, and G. Riley, “Round-robin Arbiter Design and Generation,” Proceedings of 15th International Symposium on System Synthesis (ISSS’02), October 2002. E. S. Shin, V. Mooney, and G. Riley, “Round-robin Arbiter Design and Generation,” Georgia Institute of Technology Technical Report, GIT-CC-02-38, Available HTTP: http://www.cc.gatech.edu/tech_reports/index.02.html M. Shalan, E. S. Shin and V. Mooney, “DX-Gt: Memory Management and Crossbar Switch Generator for Multriprocessor System-on-a-Chip,”Workshop on Synthesis And System Integration of Mixed Information technologies (SASIMI’03), April 2003. E. S. Shin, V. Mooney, and G. Riley, “Round-robin Arbiter Design and Generation,” submitted to IEEE Transactions on CAD.

Page 63: Automated Generation of Round-robin Arbitration and ... · Automated Generation of Round-robin Arbitration and Crossbar Switch Logic Eung S. Shin Advisor: Professor Vincent J Mooney

2/12/2004 63

Publications (Continued)

Other publications:P. Cheng, Eung S. Shin, G. R. Riley and V. J. Mooney III, “SASim: Switch Arbiter Simulator,” Georgia Institute of Technology Technical Report, GIT-CC-03-38, Available HTTP: http://www.cc.gatech.edu/tech_reports/index.03.html.A. Talpasanu , J. A. Davis, E. S. Shin and V. J. Mooney, “Crossbar Switch Interconnect Delay Calculation,” Georgia Institute of Technology Technical Report, GIT-CC-03-37, Available HTTP: http://www.cc.gatech.edu/tech_reports/index.03.html.

Provisional Patent:E. S. Shin and V. Mooney, “Fast Distributed Switch Arbiter Design and Generation.”