towards a vertically integrated synthesis flow for ... · 0,8 enoc onoc (conservative) onoc...

23
Towards Towards a Vertically Integrated Synthesis Flow for Predictable Design of Wavelength-Routed Optical NoCs Davide Bertozzi MPSoC Research Group Acknowledgement: Luca Ramini, Marta Ortin, Anja Boos, Marco Balboni, Sandro Bartolini, Paolo Grani

Upload: others

Post on 12-Aug-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Towards a Vertically Integrated Synthesis Flow for ... · 0,8 ENoC ONoC (Conservative) ONoC (Aggressive) 3 bit 4 bit 21% 24% The trick: on average ONoC outperforms ENoC by about 18%

TowardsTowards a

Vertically Integrated Synthesis Flow

for Predictable Design

of Wavelength-Routed Optical NoCs

Davide BertozziDavide Bertozzi

MPSoC Research Group

Acknowledgement:

Luca Ramini, Marta Ortin, Anja Boos, Marco Balboni, Sandro Bartolini, Paolo Grani

Page 2: Towards a Vertically Integrated Synthesis Flow for ... · 0,8 ENoC ONoC (Conservative) ONoC (Aggressive) 3 bit 4 bit 21% 24% The trick: on average ONoC outperforms ENoC by about 18%

Optics into Smaller-Scale SystemsOptics has made a long way from long-haul telecommunication

networks to data centers and multi-chip systems.

Multi-chip

Silicon chip ???

• The early days of ONoCs remind me of the early days of electrical

interconnection networks!

Datacenter

Krishnamoorthy, Optoelectronics Letters, May 2006

Page 3: Towards a Vertically Integrated Synthesis Flow for ... · 0,8 ENoC ONoC (Conservative) ONoC (Aggressive) 3 bit 4 bit 21% 24% The trick: on average ONoC outperforms ENoC by about 18%

Behind the scene....• Commercial exploitation of NoCs started from contradicting numberscontradicting numbers

Frequency (MHz)

Netlist FloorplanPost-P&R

drop

AMBA Multilayer 480 400 16.7%

NoC/21 bits 910 885 2.7%

NoC/38 bits 910 885 2.7%

Higher clock speed, higher predictability

Area (mm2)Overall

FloorplanFabric +

slack

AMBA Multilayer 35 5

NoC/21 bits 45 15

NoC/38 bits 45 15

Higher area though!Higher clock speed, higher predictability

Bandwidth (GB/s) Overall bandwidth

AMBA Multilayer 26.5

NoC/21 bits 100

NoC/38 bits 180

Higher bandwidth, sometimes better latency!

Power (mW) SequentialCombination

alOverall Seq. ratio

AMBA Multilayer 6 66 72 18.5%

NoC/21 bits 296 81 377 78.5%

NoC/38 bits 416 85 501 83.0%

Higher area though!

Higher power consumption though!

Energy (mJ)Benchmark run time

Fabric only 1W system 5W system

AMBA Multilayer 1 ms 0.072 1.07 5.60

NoC/21 bits 0.9 ms 0.339 1.34 5.32

NoC/38 bits 0.85 ms 0.426 1.37 5.13

Page 4: Towards a Vertically Integrated Synthesis Flow for ... · 0,8 ENoC ONoC (Conservative) ONoC (Aggressive) 3 bit 4 bit 21% 24% The trick: on average ONoC outperforms ENoC by about 18%

Behind the scene....• Commercial exploitation of NoCs started from contradicting numberscontradicting numbers

Not everything should be

“better” to bring a new

technology on the stage

The performance speedup property was key.

System energy savings were a byproduct.

Page 5: Towards a Vertically Integrated Synthesis Flow for ... · 0,8 ENoC ONoC (Conservative) ONoC (Aggressive) 3 bit 4 bit 21% 24% The trick: on average ONoC outperforms ENoC by about 18%

The Boosting Factor• Initially, the NoC IP portfolio was the “business card” of NoC vendors

• Very soon it became clear that the real business card was the

availability of toolflows to bring designers’ productivity to a new level

NoCNoC vendors then started to deliver what designers actually needed:vendors then started to deliver what designers actually needed:

�� Fast and automated design space exploration Fast and automated design space exploration

�� FloorplanningFloorplanning constraints in the early design stages for faster and quicker convergenceconstraints in the early design stages for faster and quicker convergence

�� not just IP models, but also technology modelsnot just IP models, but also technology models

�� NoCNoC customization was the main goalcustomization was the main goal

� IP portfolio: of course you should have it!!

Page 6: Towards a Vertically Integrated Synthesis Flow for ... · 0,8 ENoC ONoC (Conservative) ONoC (Aggressive) 3 bit 4 bit 21% 24% The trick: on average ONoC outperforms ENoC by about 18%

The ONOC Business Card

� The Electronic Tile-based CMP

Architecture burns on average

15 Watts on target applicative

loads

Normalized System Energy

1

1,2

Interconnect Energy

(aggressive optical technology)

This won’t This won’t

be enough

for

industrial

prime time!

What about

the

boosting

factor?

ONoC makes the system more energy efficient, although the interconnect itself does not

achieve the energy break-even point with the electrical NoC

0

0,2

0,4

0,6

0,8

1

ENoC ONoC (Conservative) ONoC (Aggressive)

3 bit

4 bit

21% 24%

The trick: on average ONoC outperforms ENoC by about 18% @ 3bit and 23% @ 4bit.

factor?

Page 7: Towards a Vertically Integrated Synthesis Flow for ... · 0,8 ENoC ONoC (Conservative) ONoC (Aggressive) 3 bit 4 bit 21% 24% The trick: on average ONoC outperforms ENoC by about 18%

Pathfinding

There is currently a huge gap between

Technology Developers & System Level Designers

I have a great ..Oh Cool,

I have a great

device; it works!

..Oh Cool,

but I can’t do

design with it!

Are we ready to bridge this gap?

� Descriptive information at different abstraction layers are mixed and

hardwired in the same design description.

Logically speaking:

It separates same-source multi-wavelength signals

and recombines separated components into multi-

wavelength and multi-source signals

(L1Si,....LnSi) (L1Si,L2Sj, L3Sk,....)

Physically speaking:Physically speaking:

When we change P&R constraints,

its layout becomes crossing-

dominated

Page 8: Towards a Vertically Integrated Synthesis Flow for ... · 0,8 ENoC ONoC (Conservative) ONoC (Aggressive) 3 bit 4 bit 21% 24% The trick: on average ONoC outperforms ENoC by about 18%

Pathfinding

There is currently a huge gap between

Technology Developers & System Level Designers

I have a great ..Oh Cool,

I have a great

device; it works!

..Oh Cool,

but I can’t do

design with it!

Are we ready to bridge this gap?

� Descriptive information at different abstraction layers are mixed and

hardwired in the same design description.

Logically speaking:

It separates same-source multi-wavelength signals

and recombines separated components into multi-

wavelength and multi-source signals

(L1Si,....LnSi) (L1Si,L2Sj, L3Sk,....)

Physically speaking:Physically speaking:

When we change P&R constraints,

its layout becomes crossing-

dominated

Logically speaking:Logically speaking:

This topology does the same

Question:

Its interIts inter--switch crossings switch crossings

should be viewed from a should be viewed from a

logical (irrelevant) or logical (irrelevant) or

physical viewpoint?physical viewpoint?

?

Page 9: Towards a Vertically Integrated Synthesis Flow for ... · 0,8 ENoC ONoC (Conservative) ONoC (Aggressive) 3 bit 4 bit 21% 24% The trick: on average ONoC outperforms ENoC by about 18%

Pathfinding

There is currently a huge gap between

Technology Developers & System Level Designers

I have a great

device; it works!

..Oh Cool,

but I can’t do

design with it!

Are we ready to bridge this gap?

� Designs are difficult to compare with one another

Claim: “GWOR uses a lower number of MRRs than lambda-router (e.g., in a 4x4 ONoC)Yes, since GWOR does not support selfYes, since GWOR does not support self--communication!communication!

Page 10: Towards a Vertically Integrated Synthesis Flow for ... · 0,8 ENoC ONoC (Conservative) ONoC (Aggressive) 3 bit 4 bit 21% 24% The trick: on average ONoC outperforms ENoC by about 18%

Pathfinding

There is currently a huge gap between

Technology Developers & System Level Designers

I have a great

device; it works!

..Oh Cool,

but I can’t do

design with it!

Are we ready to bridge this gap?

� The application of well-known interconnection network techniques is more difficult.

=

� Flow control should be

end-to-end and topology-

independent�Receiver buffer size should be

topology-specific, since it should

cover the round-trip latency for

max. throughput operation

Page 11: Towards a Vertically Integrated Synthesis Flow for ... · 0,8 ENoC ONoC (Conservative) ONoC (Aggressive) 3 bit 4 bit 21% 24% The trick: on average ONoC outperforms ENoC by about 18%

Design Methodology and Synthesis Flow

Only a cross-layer design methodology and automated synthesis toolflow can

accelerate or even determine the evolution of the ONoC concept into an

industry-relevant and viable interconnect technology

-Specification of abstraction layers for ONoC design.

Learning from the past:

�� Fast and automated design space exploration Fast and automated design space exploration

�� FloorplanningFloorplanning constraints in the early design stages for faster and quicker convergenceconstraints in the early design stages for faster and quicker convergence

�� not just IP models, but also technology modelsnot just IP models, but also technology models

�� NoCNoC customization should be the main goal (highcustomization should be the main goal (high--end embedded computing)end embedded computing)

� IP portfolio: of course you should have it!!

Page 12: Towards a Vertically Integrated Synthesis Flow for ... · 0,8 ENoC ONoC (Conservative) ONoC (Aggressive) 3 bit 4 bit 21% 24% The trick: on average ONoC outperforms ENoC by about 18%

Early signs of a top-down

design methodology• Let us keep the focus on Wavelength-Routed Optical Networks-on-Chip

• Contention-free and performance-guaranteed communication

I1 T1

T2

T3

λ1

λ2

λ3

λ4

λ3

λ4 A

B

A

B

How to synthesize the topology?

T3

T4

λ4

I2 λ2

λ1C

D

C

D

Separation of chromatic signals:

each wavelength component

should be separated to reach a

different destination

(1)A

(2,3,4)A

(1,2,3,4)A1

(1,2,3,4)A I1

Key abstract operator: add-drop filter

Page 13: Towards a Vertically Integrated Synthesis Flow for ... · 0,8 ENoC ONoC (Conservative) ONoC (Aggressive) 3 bit 4 bit 21% 24% The trick: on average ONoC outperforms ENoC by about 18%

A

B

(1)A

(2,3,4)A

(1,2,3,4)A

(2)A

(3,4)A (3)A

1

2

3

Wavelength Separation Graph

I

N

I

TARGET#A

ASSUMPTIONS• 4 INITIATORS AND 4 TARGETS (i.e., A,B,C D).

• INITIATORS USE THE SAME 4 WAVELENGTHS (i.e., 1,2,3,4).

• UTILIZATION OF 1X2 LOGICAL FILTERING OPERATORS

Path #1

Path #2

Path #3

Path #4

TARGET#B TARGET#C

C

D

(4)A

3(1)B

(2,3,4)B

(1,2,3,4)B

(3)B

(2,4)B (4)B

(2)B

1

3

4(1)C

(2,3,4)C

(1,2,3,4)C

(3)C

(2,4)C(4)C

(2)C

1

3

4(1)D

(2,3,4)D

(1,2,3,4)D

(2)D

(3,4)D (3)D

(4)D

1

2

3

I

T

I

A

T

O

R

S

Overall, 12 1x2 drop filters are needed

to realize 16 contention-free optical paths!!!

TARGET#D

Page 14: Towards a Vertically Integrated Synthesis Flow for ... · 0,8 ENoC ONoC (Conservative) ONoC (Aggressive) 3 bit 4 bit 21% 24% The trick: on average ONoC outperforms ENoC by about 18%

Covering the Separation Graph

A

B

(1)A

(2,3,4)A

(1,2,3,4)A

(2)A

(3,4)A (3)A

(4)A

1

2

3(1)B

(2,3,4)B

(1,2,3,4)B

(3)B

(2,4)B (4)B

1

3

Let us “cover” the wavelength graph with higher-order logic filters (e.g., 2x2)

in order to obtain logic topologies

λ-ROUTER topologyGraph Covering

C

D

(2,4)B (4)B

(2)B

4(1)C

(2,3,4)C

(1,2,3,4)C

(3)C

(2,4)C(4)C

(2)C

1

3

4(1)D

(2,3,4)D

(1,2,3,4)D

(2)D

(3,4)D (3)D

(4)D

1

2

3

vv

There are precise covering rules for the functional correctness of the topologyThere are precise covering rules for the functional correctness of the topology

(e.g., never recombine split signals; never mix wavelength-homogeneous signals)

All known WRONoC topologies can be materialized this way!All known WRONoC topologies can be materialized this way!

What about unknown topologies?

Page 15: Towards a Vertically Integrated Synthesis Flow for ... · 0,8 ENoC ONoC (Conservative) ONoC (Aggressive) 3 bit 4 bit 21% 24% The trick: on average ONoC outperforms ENoC by about 18%

New Topology: Equalized Lambda-Router

2

3

2 2 2 2

3

2 2 2 2

3 3

2 2 2

1

0

1 1 1 1

0

1 1 1 1

0 0

1 1 1

0

0,5

1

1,5

2

2,5

3

3,5

A/A

A/B

A/C

A/D

B/A

B/B

B/C

B/D

C/A

C/B

C/C

C/D

D/A

D/B

D/C

D/D

drops

crossings

2

3

2

3

1

2

3

2 2

3

2

1

3

2

3

2

1

1

1

0

1

1

0

1 1

0

1

1

0

1

1

1

0

0,5

1

1,5

2

2,5

3

3,5

4

4,5

A/A A/B A/C A/D B/A B/B B/C B/D C/A C/B C/C C/D D/A D/B D/C D/D

drops

crossings

lambda_Router

Assumption:

do not count inter-switch crossings

-- No drops on the critical path, No drops on the critical path,

-- more balanced optical pathsmore balanced optical paths

Page 16: Towards a Vertically Integrated Synthesis Flow for ... · 0,8 ENoC ONoC (Conservative) ONoC (Aggressive) 3 bit 4 bit 21% 24% The trick: on average ONoC outperforms ENoC by about 18%

New Topology: GWOR With Self-Communication

3

1

2

3 3

2

3

1 1

3 3

2 2

3

1

3

1

1

1

0 0

1

1

1 1

1

0

1 1

0

1

1

0

0,5

1

1,5

2

2,5

3

3,5

4

4,5

A/A

A/B

A/C

A/D

B/A

B/B

B/C

B/D

C/A

C/B

C/C

C/D

D/A

D/B

D/C

D/D

drops

crossings

2

3

2

3

1

2

3

2 2

3

2

1

3

2

3

2

1

1

1

0

1

1

0

1 1

0

1

1

0

1

1

1

0

0,5

1

1,5

2

2,5

3

3,5

4

4,5

A/A A/B A/C A/D B/A B/B B/C B/D C/A C/B C/C C/D D/A D/B D/C D/D

drops

crossings

lambda_Router

-- This covering increased the number This covering increased the number

of overly short and overly long optical of overly short and overly long optical

pathspaths

Page 17: Towards a Vertically Integrated Synthesis Flow for ... · 0,8 ENoC ONoC (Conservative) ONoC (Aggressive) 3 bit 4 bit 21% 24% The trick: on average ONoC outperforms ENoC by about 18%

New Topology: Random

3

1

3 3 3 3

1

4

1

0

2

3

1

3 3

2

0

1

1 1 1

0

1

1

1

1

1

0

1

1

0

1

0

1

2

3

4

5

6

A/A

A/B

A/C

A/D

B/A

B/B

B/C

B/D

C/A

C/B

C/C

C/D

D/A

D/B

D/C

D/D

drops

crossings

2

3

2

3

1

2

3

2 2

3

2

1

3

2

3

2

1

1

1

0

1

1

0

1 1

0

1

1

0

1

1

1

0

0,5

1

1,5

2

2,5

3

3,5

4

4,5

A/A A/B A/C A/D B/A B/B B/C B/D C/A C/B C/C C/D D/A D/B D/C D/D

drops

crossings

lambda_Router

-- Extreme path differentitationExtreme path differentitation

Page 18: Towards a Vertically Integrated Synthesis Flow for ... · 0,8 ENoC ONoC (Conservative) ONoC (Aggressive) 3 bit 4 bit 21% 24% The trick: on average ONoC outperforms ENoC by about 18%

� First case assumptions:

� Floorplan area: 8mmx8mm

� Hub size: 1mmx1mm

Wavelength Separation Graph� Second case assumptions:

� Floorplan area: 2.95mmx2.95mm

� Hub size: 1mmx1mm

13,66

16

� The Proton P&R tool for ONoCs is used to obtain the Physical layout and the maximum insertion loss.

� Proton can be instructed to pursue different primary design goals (or a mix thereof):

� Minimize_propagation_loss.

� Minimize_crossing_loss.

25

Max_insertion_loss

9,58

11,2210,311

13,66

8,197,099

7,76

5,2346

0

2

4

6

8

10

12

14

Balanced Lambda

Router

Gener.

GWOR

Random

min_prop_loss

min_cross_loss

dB

15,5817,31

9,8

19,63

7,33 6,65

9 8,98

0

5

10

15

20

Balanced Lambda

Router

Gener.

GWOR

Random

min_prop_loss

min_cross_lossdB

The random topology is the

most power-efficient one

The lambda router is the

most power-efficient topology

Max_insertion_loss

We start to have a design space to explore here! The common design abstraction is there!

The pruning method to some extent depends on th P&R algorithm!

Page 19: Towards a Vertically Integrated Synthesis Flow for ... · 0,8 ENoC ONoC (Conservative) ONoC (Aggressive) 3 bit 4 bit 21% 24% The trick: on average ONoC outperforms ENoC by about 18%

Lesson learned

Design

space

exploration

of logic

topologies

DSE Logic

Topoloigies

Pre-

placement

Logic Topology Logic Topology

Connectivity

requirements

Connectivity

requirements

Searching for

more design predictability?

Placement

and

Routing

Compute-

Intensive

Placement

and Routing

Logic Topology

Physical Topology

Logic Topology

Physical Topology

more design predictability?

Page 20: Towards a Vertically Integrated Synthesis Flow for ... · 0,8 ENoC ONoC (Conservative) ONoC (Aggressive) 3 bit 4 bit 21% 24% The trick: on average ONoC outperforms ENoC by about 18%

Augmenting the Flow

DSE Logic

Topoloigies

Pre-

placement

Logic Topology

Connectivity

requirements

Specification of

communication

Requirements

per flow

(bandwidth,

latency)

I1 T1Λ1.1, Λ1.2, Λ1.3

Λ2.1

Customization and/or hybridization

Compute-

Intensive

Placement

and Routing

Logic Topology

Physical Topology

T2

T3

T4

Λ2.1

I2

To Arbitrated ONoC

(e.g., MWSR)

To Electronic

NoC

Page 21: Towards a Vertically Integrated Synthesis Flow for ... · 0,8 ENoC ONoC (Conservative) ONoC (Aggressive) 3 bit 4 bit 21% 24% The trick: on average ONoC outperforms ENoC by about 18%

Augmenting the Flow

DSE Logic

Topoloigies

Pre-

placement

Logic Topology

Comm. requirements

Design iterations may be motivated by early-phase analysis

of metrics pertinent to the physical layout

There is no “clean” (inverse) correlation between

maximum insertion loss and worst OSNR in a topology

The critical path with respect to

insertion loss is not

the path with the worst OSNR

Automation of the flow will help designers

capture subtle effects

Compute-

Intensive

Placement

and Routing

Logic Topology

Physical Topology

Page 22: Towards a Vertically Integrated Synthesis Flow for ... · 0,8 ENoC ONoC (Conservative) ONoC (Aggressive) 3 bit 4 bit 21% 24% The trick: on average ONoC outperforms ENoC by about 18%

Augmenting the Flow

DSE Logic

Topoloigies

Pre-

placement

Logic Topology

Comm. requirements

Design iterations may be motivated by early-phase analysis

of metrics pertinent to the physical layout

There is no “clean” (inverse) correlation between

maximum insertion loss and worst OSNR in a topology

Resonator-less

crossings

The critical path with respect to

insertion loss is not

the path with the worst OSNR

Compute-

Intensive

Placement

and Routing

Logic Topology

Physical Topology

OSNR analysis should be performed on the layout as well,

which requires extensive CAD tool support

Not all crossings are

noise contributors

Page 23: Towards a Vertically Integrated Synthesis Flow for ... · 0,8 ENoC ONoC (Conservative) ONoC (Aggressive) 3 bit 4 bit 21% 24% The trick: on average ONoC outperforms ENoC by about 18%

Conclusions

• Optical NoCs have been demonstrated to enable system-level performance speedups and energy savings in academia.

• However, it is the availability of design methodologies and synthesis toolflows that makes the real difference when it comes to industrial exploitation.

• Clearly identifying abstraction layers in ONoC design is the ideal stepping stone to kick-off this process. Nonetheless, cross-layer stepping stone to kick-off this process. Nonetheless, cross-layer optimizations are fundamental for predictable design.

• Customization should again drive ONoC design, especially in the embedded computing domain

• In a sense, the history of electronic NoCs is repeating itself. However, there will be a fundamental difference: the cross-layer integration issue of optics with electronics.

• It’s time to start bridging the EDA gap!