ROUTING ARCHITECTURE AND ALGORITHMS FOR A SUPERCONDUCTIVITY CIRCUITS-BASED COMPUTING HARDWARE

Farhad Mehdipour, Hiroaki Honda, Hiroshi Kataoka, Koji Inoue, Kazuaki Murakami

Kyushu University, Japan

CCECE 2011


CREST-JST (2006~): Low-power, high-performance, reconfigurable processor using single-flux quantum (SFQ) circuits

SFQ-LSRDP

• Kyushu Univ. (K. Murakami, K. Inoue, H. Honda, F. Mehdipour, H. Kataoka): Architecture, Compiler and Applications
• Superconducting Research Lab. (SRL) (S. Nagasawa et al.): SFQ process
• Yokohama National Univ. (N. Yoshikawa et al.): SFQ-FPU chip, cell library
• Nagoya Univ. (A. Fujimaki et al.): SFQ-RDP chip, cell library, and wiring
• Nagoya Univ. (N. Takagi (Leader) et al.): CAD for logic design and arithmetic circuits

Our mission: Architecture, compiler and application development

Outline of Large-Scale Reconfigurable Data-Path (LSRDP) Processor

[Figure: Single Flux Quantum in a superconductivity loop with a Josephson junction]

SFQ Features:

• High-speed switching and signal transmission
• Low power consumption
• Compact implementation (smaller area)
• Suitable for pipeline processing

[Figure: block diagram with Buffers, LSRDP, and Memory]

How it works

    inst;
    inst;
    …
    conf_LSRDP();
    Loop:
        rearrange_input_data();
        set_IO_info();
        run_LSRDP();
        inst;
        …
        sync_lsrdp();
        rearrange_output_data();
    End_Loop
    inst;
    …

[Figure: step-by-step view of the program, with the components labeled at each step]

• conf_LSRDP(): conf. bit-stream …
• rearrange_input_data(): GPP, Memory Controller
• set_IO_info(): Memory Controller
• run_LSRDP(); inst; sync_lsrdp(): GPP waiting for the LSRDP; LSRDP terminating the operation
• rearrange_output_data(): GPP
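The control flow on this slide can be mimicked in software. The sketch below is only an illustration: `MockLSRDP`, its method signatures, the return value of `sync_lsrdp`, and the "double every input" computation are assumptions made for the example and are not the project's real API. It shows the one-time configuration followed by the per-iteration set/run/sync sequence.

```python
class MockLSRDP:
    """Hypothetical stand-in for the accelerator; it simply doubles each input."""

    def __init__(self):
        self.configured = False
        self.io_info = None
        self.pending = None
        self.log = []  # records the order of calls

    def conf_LSRDP(self, bitstream):
        # Assumption: configuration is loaded once from a bit-stream.
        self.configured = True
        self.log.append("conf")

    def set_IO_info(self, n_in, n_out):
        self.io_info = (n_in, n_out)
        self.log.append("set_io")

    def run_LSRDP(self, inputs):
        if not self.configured or self.io_info is None:
            raise RuntimeError("configure and set I/O info before running")
        self.pending = [2 * x for x in inputs]  # placeholder computation
        self.log.append("run")

    def sync_lsrdp(self):
        # Assumption: sync blocks until completion and hands back the results.
        result, self.pending = self.pending, None
        self.log.append("sync")
        return result


def host_program(blocks):
    """Mirror the slide's loop: configure once, then set/run/sync per block."""
    lsrdp = MockLSRDP()
    lsrdp.conf_LSRDP(bitstream=b"")           # conf_LSRDP();
    outputs = []
    for block in blocks:                       # Loop:
        data = list(block)                     #   rearrange_input_data();
        lsrdp.set_IO_info(len(data), len(data))
        lsrdp.run_LSRDP(data)
        # The GPP could execute independent instructions here.
        result = lsrdp.sync_lsrdp()
        outputs.append(tuple(result))          #   rearrange_output_data();
    return outputs, lsrdp.log


outputs, log = host_program([(1, 2), (3, 4)])
```

The call log makes the ordering explicit: a single `conf` entry, then `set_io`, `run`, `sync` repeated once per loop iteration.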

Architecture Exploration

LSRDP Layouts

[Figure: three layouts, each a grid of PE rows with an ORN between consecutive rows]

• Layout-I: every PE is labeled ADD/SUB and MUL
• Layout-II: ADD/SUB PEs and MUL PEs alternate within a row
• Layout-III: rows of MUL PEs alternate with rows of ADD/SUB PEs

PE structures

[Figure: PEs built from FU and TU blocks]

• Basic PE arch.: 3-inps/2-outs
• PE arch. I: 4-inps/3-outs
• PE arch. II: 3-inps/3-outs

ORN structures

[Figure: three ORN structures, panels labeled MCL = 1, MCL = 1, and MCL = 2]

• Number of rows = 1.5×M, number of columns = 4×MCL
• Number of rows = 2×M, number of columns = 6×MCL+2
• Number of rows = 1.5×M, number of columns = 4×MCL+1

LSRDP Tool Chain

[Figure: tool chain flow chart; each arrow is labeled 1 or 2]

1: flow of the assembly code generation for GPP

• Application C code
• Modifying application code: inserting LSRDP instructions in the code
• Modified application code
• ISAcc or COINS compiler, with the LSRDP library file (function definitions & declarations)
• .asm code for MIPS-based GPP

2: flow of configuration bit-stream generation for the LSRDP

• Modified application code
• DFG Extraction
• Data flow graphs
• Placing and Routing Tool, with the LSRDP architecture description
• Configuration file + various text & schematic reports

Simulator: performance evaluation

Mapping DFGs onto LSRDP

[Figure: mapping flow]

• Inputs: DFG, LSRDP Architecture Description
• Placing Input Nodes
• Placing Operational & Output Nodes
• Routing Nets
• Routing IO Nets
• Final Map

Figure annotation: Longest connections

Global routing algorithms

Routing DFG connections between source and destination PEs

[Figure: two routing examples from a src PE to a dest PE; legend: vacant, fully-occupied]

• Exhaustive search-based: very time consuming
• Branch and bound alg.: very fast
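The slide does not give the cost function or the allowed moves of the branch and bound algorithm, so the following is a generic sketch under stated assumptions rather than the authors' implementation: PEs form a grid, a route descends one row per step and may shift by at most `max_shift` columns, fully-occupied cells cannot be crossed, and the cost is the total horizontal shift. The names `route_bnb`, `occupied`, and `max_shift` are hypothetical.

```python
def route_bnb(occupied, src, dst, max_shift=1):
    """Branch-and-bound search for a minimum-horizontal-shift route.

    occupied[r][c] is True when cell (r, c) is fully occupied.
    Returns (path, cost), or (None, inf) when no route exists.
    """
    cols = len(occupied[0])
    (sr, sc), (dr, dc) = src, dst
    best = {"cost": float("inf"), "path": None}

    def search(r, c, cost, path):
        remaining_rows = dr - r
        # Bound 1: the destination column cannot be reached in the rows left.
        if abs(dc - c) > remaining_rows * max_shift:
            return
        # Bound 2: even the straightest completion cannot beat the best route.
        if cost + abs(dc - c) >= best["cost"]:
            return
        if r == dr:  # bound 1 guarantees c == dc here
            best["cost"], best["path"] = cost, path
            return
        # Try small shifts first so that good routes are found early.
        for shift in sorted(range(-max_shift, max_shift + 1), key=abs):
            nc = c + shift
            if not 0 <= nc < cols:
                continue
            # Intermediate cells must be vacant; the destination is allowed.
            if (r + 1, nc) != (dr, dc) and occupied[r + 1][nc]:
                continue
            search(r + 1, nc, cost + abs(shift), path + [(r + 1, nc)])

    search(sr, sc, 0, [(sr, sc)])
    return best["path"], best["cost"]


# Hypothetical 4x4 grid with one fully-occupied cell directly below the source.
grid = [
    [False, False, False, False],
    [False, True,  False, False],
    [False, False, False, False],
    [False, False, False, False],
]
path, cost = route_bnb(grid, (0, 1), (3, 1))  # detour around (1, 1), cost 2
```

An exhaustive search would enumerate every descending path; the two bounds here discard partial routes as soon as they cannot reach the destination or cannot improve on the best route found so far.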


Micro-Routing: Problem Definition

• Inputs
  – LSRDP basic specifications
    • Layout, Width (W), MCL, PE arch., etc.
    • List of connections b/w consecutive rows
  – ORN structure including
    • The number of CBs and T2s in each row
    • The number of CB rows
    • Topology of connections among CBs
• Output
  – Detailed routes via cross-bar switches
    • The list of CBs used for routing each connection
    • Configuration of CBs

[Figure: ORN between the i-th row and the (i+1)-th row of PEs, each PE shown as FU and T]

A micro-routing algorithm has been implemented for the LSRDP with underlying layout II and PE arch. III
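The inputs and outputs listed in the problem definition map naturally onto a few record types. The sketch below is one possible encoding, not the tool's actual data model: every class name, field name, and the sample values are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LSRDPSpec:
    """LSRDP basic specifications (field names are hypothetical)."""
    layout: str
    width: int
    mcl: int
    pe_arch: str


@dataclass
class ORNStructure:
    """ORN structure between two consecutive PE rows."""
    cbs_per_row: int
    t2s_per_row: int
    cb_rows: int
    # Topology: (cb_row, cb_index, out_port) -> (cb_row, cb_index, in_port)
    links: dict = field(default_factory=dict)


@dataclass
class MicroRoutingProblem:
    spec: LSRDPSpec
    orn: ORNStructure
    connections: list  # (source PE, destination PE) pairs between rows


@dataclass
class MicroRoutingResult:
    routes: dict = field(default_factory=dict)     # connection -> list of CB ids
    cb_config: dict = field(default_factory=dict)  # CB id -> switch setting

    def is_complete(self, problem):
        """True when every requested connection has a detailed route."""
        return all(conn in self.routes for conn in problem.connections)


# Sample values are placeholders, not taken from the slides.
problem = MicroRoutingProblem(
    spec=LSRDPSpec(layout="II", width=8, mcl=1, pe_arch="III"),
    orn=ORNStructure(cbs_per_row=4, t2s_per_row=0, cb_rows=3),
    connections=[("PE1", "PE5"), ("PE2", "PE6")],
)
result = MicroRoutingResult()
result.routes[("PE1", "PE5")] = [(0, 0), (1, 0), (2, 0)]
partial = result.is_complete(problem)
result.routes[("PE2", "PE6")] = [(0, 1), (1, 1), (2, 1)]
complete = result.is_complete(problem)
```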

ORN Micro-routing

• CB: 2-input/2-output
• ½CB: 1-input/2-output

[Figure: CB and ½CB switches with settings labeled 00, 01, 10, 11]

Micro-nets

• PE1 → PE5
• PE2 → PE5, PE6, PE7
• PE3 → PE6, PE8
• PE4 → PE7, PE8

Example

[Figure: example ORN of ½CBs and CBs connecting PE1 to PE4 with PE5 to PE8]
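As a small worked example, the four micro-nets listed on the ORN micro-routing slide can be expanded into point-to-point connections to see how many routes the ORN must carry. The sketch assumes that the first PE in each group is the source and the remaining PEs are its destinations; the variable and function names are made up for the example.

```python
from collections import Counter

# Micro-nets from the slide: source PE -> destination PEs (assumed direction).
micro_nets = {
    "PE1": ["PE5"],
    "PE2": ["PE5", "PE6", "PE7"],
    "PE3": ["PE6", "PE8"],
    "PE4": ["PE7", "PE8"],
}


def expand(nets):
    """Split each multi-destination net into (source, destination) pairs."""
    return [(src, dst) for src, dsts in nets.items() for dst in dsts]


pairs = expand(micro_nets)
fan_out = {src: len(dsts) for src, dsts in micro_nets.items()}
fan_in = Counter(dst for _, dst in pairs)
```

Here `pairs` holds eight connections, and `fan_in` shows that each of PE5 to PE8 receives exactly two operands, while the fan-out ranges from one (PE1) to three (PE2).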

ORN Micro-Routing Example: Heat 8x2 - ORN b/w 3rd and 4th Rows

[Figure: ORN micro-routing between the PEs in the 3rd row and the PEs in the 4th row]

Specifications of Attempted DFGs

DFG            total # of nodes  # of inputs  # of outputs  # of ops
Heat-8x1       34                6            4             16
Heat-8x2       60                8            4             32
Heat-16x2      172               16           12            96
Poisson-3x3    62                18           1             33
Vibration-4x2  48                8            4             24
Vibration-8x2  136               16           12            72
Vibration-8x4  168               16           8             96
ERI-1          76                16           9             51
ERI-2          67                19           1             47


Example of a DFG Mapping: Vibration-8x2


Results of routing nets using the proposed algorithms

DFG            avg. hor. C.L.  avg./max. ver. C.L.  # of global/micro nets to route  Time to map (sec)
Heat-8x1       0.35            0.75/3               36/64                            0.015
Heat-8x2       0.44            1.32/5               68/114                           1.75
Heat-16x2      0.47            1.64/7               204/343                          1.05
Poisson-3x3    0.68            2.4/16               67/120                           2074.5
Vibration-4x2  0.46            1.58/9               50/88                            0.34
Vibration-8x2  0.42            2.15/10              154/332                          2.20
Vibration-8x4  2.48            3.72/16              348/610                          6721.3
ERI-1          0.75            2.21/9               111/374                          53.61
ERI-2          0.78            2.99/9               95/332                           0.327
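The figures in the results table can be cross-checked with a few lines of code. The sketch below copies the max. ver. C.L., net counts, and mapping times from the table; the dictionary layout and variable names are only an illustration. It computes the micro-to-global net ratio and picks out the slowest cases.

```python
# DFG -> (max. ver. C.L., global nets, micro nets, time to map in seconds),
# copied from the results table.
results = {
    "Heat-8x1":      (3,  36,  64,  0.015),
    "Heat-8x2":      (5,  68,  114, 1.75),
    "Heat-16x2":     (7,  204, 343, 1.05),
    "Poisson-3x3":   (16, 67,  120, 2074.5),
    "Vibration-4x2": (9,  50,  88,  0.34),
    "Vibration-8x2": (10, 154, 332, 2.20),
    "Vibration-8x4": (16, 348, 610, 6721.3),
    "ERI-1":         (9,  111, 374, 53.61),
    "ERI-2":         (9,  95,  332, 0.327),
}

# Micro nets per global net for each DFG.
ratio = {name: micro / glob for name, (_, glob, micro, _) in results.items()}

# DFGs ordered from slowest to fastest mapping time.
by_time = sorted(results, key=lambda name: results[name][3], reverse=True)
two_slowest = set(by_time[:2])

# DFGs whose maximum vertical C.L. reaches the largest value in the table.
largest_cl = max(row[0] for row in results.values())
longest = {name for name, row in results.items() if row[0] == largest_cl}
```

In this table the mapping time does not simply follow the number of nets: Heat-16x2 has 204 global nets and maps in 1.05 s, while Poisson-3x3 has 67 and takes 2074.5 s. The two slowest cases, Poisson-3x3 and Vibration-8x4, are also the only two whose maximum vertical C.L. is 16.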

Thank you for your attention!