physical implementation of the dspin network-on-chip in

Post on 12-Feb-2022

2 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Physical Implementation of the DSPIN Network-on-Chip in the

FAUST Architecture

Ivan Miro-Panades1,2,3, Fabien Clermidy3, Pascal Vivet3, Alain Greiner1

1 The University of Pierre et Marie Curie, Paris, France2 STMicroelectronics, Crolles, France3 CEA-Leti, MINATEC, Grenoble, France

Ivan MIRO PANADES – NOCS 20081

Outline

MotivationFAUST architectureMigration of DSPIN into FAUSTDSPIN implementationNetworks comparison

Ivan MIRO PANADES – NOCS 20082

Motivation

Physically implement the DSPIN NoC into the FAUST application platform

- DSPIN is a NoC developed between Lip6 and ST

- FAUST is a stream-oriented application platform for 4G telecom applications, based on ANOC, developed by CEA-Leti.

Compare the performances between ANOC and DSPIN on a real application and traffic

Ivan MIRO PANADES – NOCS 20083

FAUST architecture

RAM IF58 Pads

ETHERNET IF17 Pads

Async/Sync IF

Async node

NOC2 IF

83 Pads

RX units

TX units

AHB units

OFDMMOD.

ALAM.MOD.

CDMAMOD. MAPP. BIT

INTER.TURBOCODER

RAM ARM946 RAMEXT.

RAMCTRL

AHB

ROTOR EQUAL. CHAN.EST.

CONV.DEC.

ETHERNET

FRAMESYNC.

ODFMDEM.

CDMADEM.

DE-MAPP.

DE-INTER.

EXP

SPort

APort

NOC1 IF

84 PadsSPort

APort

RAC

NoCPerf.

EXP

CONV.CODER

Clk & Reset CTRLJTAG Clk, Rst

DART

23 computation units

Asynchronous NoC(ANOC)

20 ANOC routers

GALS conception

24 independent Clks

Ethernet port

Internal/External RAM

CPU ARM946ES

Cache 4KB-I 4KB-D

Hardware OFDM modulation/demodulation

Ivan MIRO PANADES – NOCS 20084

ANOC architectureAsynchronous NoC

- Asynchronous send/accept handshake protocol

- QDI 4-phase/4-rail asynchronous logic

- QoS with two Virtual Channels (Best Effort, Guaranteed Service)

NoC Routers- 5 ports router- Source routing- Wormhole packet switching- 32 bits payload

FIFO based GALS interfacesMapped onto libraries

- CMOS ST 130nm- TIMA TAL130 library

CrossbarCrossbar

CrossbarCrossbar

NICGALS

interfac

eIP

Asynchronous handshake protocol

Ivan MIRO PANADES – NOCS 20085

FAUST floor-plan with ANOC 4.5 M Gates276 chip pinsChip area 80 mm²ST CMOS 130nm166 MHz (worst-case)

NoC implementation :

DART

RAC

ARM946

RAM1RAM2 Ext.

RAM Ctrl.

OFDM mod.

Rotor

Frame sync.

OFDM demod.

CDMA Dem.

Channel Est. Ethernet

Deinter.Demapp.

Conv. Dec.Equal.

Mapp.CDMA Mod.N

P1

NP2

Ala.Bit.

Inter.Turbo Dec.

Conv. Codec.

Exp.

Exp.

CLK

Res

et

- Uses ANOC router hard-macro- Perform buffering and routing

of ANOC links- No spaghetti routing at top

level !

ANOC router (0.211 mm²)

WestPort

EastPort

NorthPort Unit Port

South Port

Ivan MIRO PANADES – NOCS 20086

Outline

MotivationFAUST architectureMigration of DSPIN into FAUSTDSPIN implementationNetworks comparison

Ivan MIRO PANADES – NOCS 20087

Ivan MIRO PANADES – NOCS 20088

Cluster (Y,X)Cluster (Y,X)

Cluster (YCluster (Y--1,X)1,X)

(Y,X+1)(Y,X+1)(Y,X(Y,X--1)1)

(Y(Y--1,X1,X--1)1) (Y(Y--1,X+1)1,X+1)

Cluster (Y+1,X)Cluster (Y+1,X)(Y+1,X(Y+1,X--1)1) (Y+1,X+1)(Y+1,X+1)

Local Local portport

in

in

in

in

in

in

in

in

in

DSPIN architecture

CK0CK0CK1CK1

CK4CK4CK5CK5

CK3CK3

CK6CK6 CK7CK7 CK8CK8

CKCK22

Packet baseDistributed router architectureSuited to GALS approachMesochronous links between routersSynthesizable with standard cellsNeither asynchronous nor custom cellsMetastability resolved by “bi-synchronous FIFO” More details in:

"A Low Cost Network-on-Chip with Guaranteed Service Well Suited to the GALS Approach“ NanoNet’06

NoC architectures comparisonANOC DSPIN

Router arity 5 port router 5 port router

Topology Irregular 2D mesh Regular 2D mesh

Routing technique Source routing (9 hops) Address-based (8 bits) X-First algorithm

Switching technique Wormhole Wormhole

Flit size (payload) 34 bits (32 bits) Generic: 34 bits (32 bits)

Flow control bits on the flit Begin of packet (BOP)End of packet (EOP)

Begin of packet (BOP)End of packet (EOP)

Virtual channels Best effortGuaranteed service

Best effortGuaranteed service

Programming model Message passing Shared memory (2 routers per cluster)Message passing (1 router per cluster)

Clocking scheme Fully asynchronous (QDI) with GALS interfaces

Multi-synchronous with mesochronousinterfaces

Flow control protocol Send/accept asynchronous handshake FIFO protocol (Write and WriteOk)

Clock tree None One per router

Physical implementation Hard macro Soft macro distributed on five modules

Long wires Inter-router wires Intra-cluster wires

Ivan MIRO PANADES – NOCS 20089

Packet format

Y X

4 bits

H0H1H2...H8

2 bits 4 bits2 bits

34 bits18 bits

34 bits (generic)

First flit

Following flits

2 bits

8 bits

DSPIN packetANOC packet

Similar packet format and control bitsANOC uses Source-routing (18 bits) allowing 9 hopsDSPIN uses Address-based (8 bits)Packet conversion module required:

- Design of Protocol_conversion module

Ivan MIRO PANADES – NOCS 200810

FAUST integration

GALS interface

SynchronousSEND/ACCEPT

AsynchronousSEND/ACCEPT

IPNIC

ANOC router

AsynchronousSEND/ACCEPT

CLK_IP

ANOC IP template

Protocol_conversion

SynchronousREAD/WRITE

SynchronousSEND/ACCEPT

IPNIC

LUT

DSPINrouter

MesochronousREAD/WRITE

CLK_NoC

CLK_IP

DSPIN IP template

Protocol_conversion module:Translates the routing algorithm using a LUTAdapts the flow control signals:

- ANOC: Send/Accept - DSPIN: FIFO protocol

Implements two bi-synchronous FIFOs

ANOC GALS Interface:Implements 4 FIFOssynchronous↔asynchronous

FAUST IPs and NICs are not modified

Ivan MIRO PANADES – NOCS 200811

Outline

MotivationFAUST architectureMigration of DSPIN into FAUSTDSPIN implementationNetworks comparison

Ivan MIRO PANADES – NOCS 200812

DSPIN implementation

Ivan MIRO PANADES – NOCS 200813

SynthesisHierarchical synthesis

Low Power CMOS ST 130nm technology

Standard cells

Without asynchronous nor custom cellsFloorplanning

Place

Optimize placement

Timing constraints file:

- Muti-cycle path (mesochronous interfaces)

- False path (asynchronous interfaces)

Clock-tree

RouteGALS compatible

Clock gating

Implemented in 4 stepsOptimize

DSPIN clock-treeMesochronous links

GALS compatibleBi-synchronous FIFO [NOCS 2007]

Max skew 50% clock periodTiming constraints:set_multi_cycle_path

Clk

Clk’Clk Clk’

Ivan MIRO PANADES – NOCS 200814

Router (0,0) Router (0,1) Router (0,2)

180° phase shift and 30% skew between routers5% skew

within the router

2nd, 3th Step5% skew

( bottom tree)

1st Step

4th Step30% skew(top tree)

Clk_NoC

Clock-tree implementation1. Add buffers/inverters2. Built bottom clock tree

(5% skew)3. Characterize bottom

clock-tree4. Build top clock-tree

(30% skew)

FAUST floor-plan with DSPINDistributed router implementationSoft macro approachHigher floor-plan flexibilityNoC adapts to the SoCLong wires are routed in a tree mannerDifferent router configurations are possible

WE

S

L N L

ES

W W

N L

S

E

N

WLS

E

N

WL S E

N

LW S E

N

W E

E

N

W

LS E

N

WL S

E

N

WL S E

L

N

W S

E

N

W

L

E

S

N

WL

S E WL

N

S E

N

EW S

L

N

WS

L ES W E

L

N

W S L E

N

W S

L E

N

S

WL

E

N

N

LRAC OFDM mod.

CDMA Mod.NP1

Ala. Bit. Inter.

Turbo Dec.

Conv. Codec.

CLK

ARM946

RAM1

RAM2Ext. RAM Ctrl.

Rotor

Channel Est. EthernetConv. Dec.

Equal.

Frame sync. OFDM demod.

CDMA Dem. Deinter.Demapp.

DART

N

W

LS E

DSPIN routerPlacement density: 60-70 %

(reserved area)

Ivan MIRO PANADES – NOCS 200815

FAUST floor-planM

app.

NP

2

NP

1

NP

2

Exp

.

Exp.

CLK

Res

et

FAUST with ANOC FAUST with DSPIN

Ivan MIRO PANADES – NOCS 200816

Outline

MotivationFAUST architectureMigration of DSPIN into FAUSTDSPIN implementationNetworks comparison

Ivan MIRO PANADES – NOCS 200817

NoC Area

ANOC DSPINRouter 0.211 mm² 0.161 mm²Interface GALS 0.070 mm² 0.024 mm²Clock tree 0.000 mm² 0.0016 mm²Total 0.281 mm² 0.187 mm²CMOS 130nm

ANOC is implemented as a hard macroDSPIN is implemented as a soft macroDSPIN is 33% smaller than ANOC

Ivan MIRO PANADES – NOCS 200818

NoC Throughput

ANOC DSPIN

Throughput on worst-caseconditions ~ 160Mflit/s ≤ 289Mflit/s

Throughput on nominalconditions ~ 220Mflit/s ≤ 408Mflit/s

DSPIN throughput is deterministic with respect to the clock frequency (one flit per clock cycle)Long wire latency penalty on throughput:

- DSPIN: critical path crosses one time the long wires- ANOC: critical path crosses 4 times the long wires, 4-phase protocol

• ANOC link pipelining is feasible

In a commercial circuit, DSPIN will be clocked not far away fromworst-case (289 MHz) to improve the fabrication yield

Ivan MIRO PANADES – NOCS 200819

Packet latency (1)

Compute the packet latency through many routersDSPIN has deterministicpacket latency with respect to clock frequencyDSPIN has lower First and Last packet latenciesANOC intermediate latencyis lower than DSPIN. DSPIN resynchronize the data on each router

ANOC DSPIN

F = 150 MHz

Intermediate router latency 6.80 ns 16.66 ns

First + Last latency 60.00 ns 56.66 ns

ANOC DSPIN

F = 250 MHz

6.80 ns 10.00 ns

47.00 ns 34.00 ns

Ivan MIRO PANADES – NOCS 200820

Packet latency (2)

ANOC DSPIN ANOC DSPIN

F = 150 MHz F = 250 MHz

Latency for 5 hops path 80.00 ns 106.66 ns 68.00 ns 64.00 ns

Latency for 9 hops path 106.66 ns 173.30 ns 96.00 ns 104.00 ns

The packet latency are similar for clock frequencies >250 MHzIntermediate packet latency is important but the application should be mapped in order to optimize the data locality (try to communicate with neighbor IPs rather than with faraway IPs)

Ivan MIRO PANADES – NOCS 200821

Power consumption

DSPIN ANOC

F = 150 MHz F = 250 MHz

Router 2.07 mW 2.89 mW 4.85 mW

GALS interface 1.62 mW 0.56 mW 0.81 mW

Clock tree 0.00 mW 2.44 mW 4.73 mW

Total 3.69 mW 5.89 mW 10.39 mW

Power extraction after P&RFunctional packet traffic (OFDM demodulation)Power consumption majorly dominated by FIFO data registersThe DSPIN clock-gating reduced the power consumption by 67%DSPIN clock-tree consumes as much power as the router itself

- Needs to improve DSPIN clock-gating- GALS clock-tree consumes only 2.5% of total clock-tree power

Ivan MIRO PANADES – NOCS 200822

ConclusionPhysical implementation of the DSPIN Network-on-Chip on FAUST platform

- Comparison between ANOC and DSPIN at architecture level - Adaptation of DSPIN architecture to manage stream-oriented

communications- Implementation up to layout of DSPIN network on FAUST platform=> DSPIN mesochronous links fully implemented with standard tools

Comparison between DSPIN and ANOC NoCs (STMicroelectronics 130nm)

- Area of DSPIN is 33% smaller than ANOC one- Maximum sustained throughput of DSPIN is 31% higher than ANOC- ANOC has lower packet latency - DSPIN power consumption 1.5 to 3 times higher than ANOC

ANOC is a good candidate for low latency and low power applications, while DSPIN is more suited to low area and high performance applications

Ivan MIRO PANADES – NOCS 200823

See you at my PhD defense on May 20th

Ivan MIRO PANADES – NOCS 200824

top related