A Modular Synchronizing FIFO for NoCs - Vainbaum Yuri

Post on 19-Dec-2015


1

A Modular Synchronizing FIFO for NoCs

Vainbaum Yuri

2

A Modular Synchronizing FIFO for NoCs

•Paper presented in NOC-2009

Authors:
•Tarik Ono - Sun Microsystems
•Mark Greenstreet - University of British Columbia

3

Motivation & Purpose of Synchronizing FIFO

[Figure: Timing Domains 1, 2 and 3 each connect to the Network-on-Chip through a Synchronizing FIFO]

•Multiple clock domains in NoC require many FIFOs

4

Synchronizing FIFO Targets

•Design targets for the FIFO:
  •FIFO can be built using standard cells
  •Easy integration into CAD flow
  •Modular FIFO design with choice of clockless or clocked interfaces
  •Modular, simple architecture reduces NoC design time

5

Talk Outline

• FIFO Overview
• FIFO Blocks
  • Clockless Put and Get Interface
  • Clocked Put and Get Interface
  • Full-Empty Control and Data Store
• FIFO Latency and Throughput
• Implementation Results

6

FIFO Overview: Operation

[Figure: a Sender in Timing Domain A writes through the Put Interface into stages 1 to 3; a Receiver in Timing Domain B reads through the Get Interface]

•FIFO consists of a number of stages
•Sender communicates with Put Interface, Receiver with Get Interface
•Tokens determine the FIFO stage for the next put and get operation
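The token scheme on this slide can be sketched as a small behavioural model. This is only an illustration under stated assumptions: the class name `TokenFifo`, its methods, and the choice to pass each token to the next stage in ring order are hypothetical and not taken from the paper.

```python
class TokenFifo:
    """Behavioural sketch: a put token and a get token select the stage
    used by the next put and get operation (names are illustrative)."""

    def __init__(self, n_stages=3):
        self.data = [None] * n_stages    # data store of each stage
        self.full = [False] * n_stages   # full-empty state of each stage
        self.put_token = 0               # stage that accepts the next put
        self.get_token = 0               # stage that serves the next get

    def put(self, item):
        i = self.put_token
        if self.full[i]:
            return False                 # stage holding the put token is still full
        self.data[i] = item
        self.full[i] = True
        self.put_token = (i + 1) % len(self.data)  # pass token to next stage
        return True

    def get(self):
        i = self.get_token
        if not self.full[i]:
            return None                  # stage holding the get token is empty
        item = self.data[i]
        self.full[i] = False
        self.get_token = (i + 1) % len(self.data)  # pass token to next stage
        return item
```

Because data stays in the stage where it was written, changing the number of stages only changes how far the tokens travel, not the behaviour of an individual stage.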

7

FIFO Overview: Structure

[Figure: stages 1 to 3 between the Sender (Timing Domain A) and the Receiver (Timing Domain B); each stage contains a Put Interface Cell, a Get Interface Cell, a Full-Empty Control and a Data Store]

•Each FIFO stage has a
  •Put Interface Cell
  •Get Interface Cell
  •Full-Empty Control
  •Data Store

8

FIFO Overview: Modular Design

[Figure: stages 1 to 3, each with a Put Interface Cell, Get Interface Cell, Full-Empty Control and Data Store; the Sender in Clocked Domain A uses a CLOCKED PUT INTERFACE and the Receiver in the Clockless NoC uses a CLOCKLESS GET INTERFACE]

• Mix-and-Match Interfaces

9

FIFO Overview: Modular Design

[Figure: stages 1 to 3 with a CLOCKED PUT INTERFACE for the Sender in Fast Clocked Domain A and a CLOCKED GET INTERFACE for the Receiver in Slow Clocked Domain B; one interface is labelled "1 flop synchronizer" and the other "3 flop synchronizer"]

•Mix-and-Match Interfaces
•Can use different synchronization time lengths, depending on clock frequency
•Changing FIFO size doesn't affect individual FIFO stage

10

Full Empty Control and Data store

•Data Store consists of latches
  •enabled when write is high
•Same blocks for clocked or clockless interfaces
•Full-Empty Control consists of an SR latch
  •on write, set output (full signal) high
  •on read, set output low
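The behaviour of these two blocks can be sketched in a few lines. This is a minimal level-based model, assuming that write and read are never high at the same time; the class name `StageStorage` and the `step` method are hypothetical names chosen for illustration.

```python
class StageStorage:
    """Sketch of one stage's Data Store (latches) and Full-Empty Control (SR latch)."""

    def __init__(self):
        self.full = False   # SR-latch output (full signal)
        self.data = None    # value held by the data latches

    def step(self, write, read, data_in):
        # Assumption: set and reset of the SR latch are mutually exclusive.
        if write and read:
            raise ValueError("write and read must not be high together")
        if write:
            self.data = data_in   # latches are enabled while write is high
            self.full = True      # on write, full signal goes high
        elif read:
            self.full = False     # on read, full signal goes low
        return self.full, self.data
```

The same model applies whether the write and read signals come from clocked or clockless interface cells, which is the point of keeping these blocks interface-independent.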

11

asP* FIFO

•asP*: Asynchronous Symmetric Persistent Pulse Protocol
  •Standard cells
  •Good performance
  •Doesn’t require C-elements
•asP* handshaking protocol is chosen as baseline for FIFO design

12

asP* FIFO -simulation

•Initial state
  •SR latches keep track of empty/full status
  •AND gates coordinate data transfer between stages

13

asP* FIFO -simulation

•Data arrives, req rises
  •SR latch EFi is set to indicate that latch Li holds valid data

14

asP* FIFO -simulation

•Data arrives, req rises
  •SR latch EFi is set to indicate that latch Li holds valid data

15

asP* FIFO -simulation

•Data propagates through L1

16

asP* FIFO -simulation

•SR latch EF1 is set

17

asP* FIFO -simulation

•Enabling L2 latch
  •When stage i-1 is full and stage i is empty, the AND gate goes high, loading data into Li

18

asP* FIFO -simulation

•Clearing EF1 latch
  •When stage i-1 is full and stage i is empty, the AND gate goes high, loading data into Li
  •SR latch EFi-1 is cleared to indicate that latch Li-1 is now empty
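The transfer rule shown on these slides (stage i loads when stage i-1 is full and stage i is empty, then stage i-1 is marked empty) can be sketched as a discrete-step model. The function name `asp_step` and the list-based state are assumptions for illustration; real asP* stages react to pulses rather than to a global step.

```python
def asp_step(full, data):
    """Apply one round of stage-to-stage transfers, scanning from the output end.

    full[i] models SR latch EF of stage i, data[i] models data latch L of stage i.
    """
    for i in range(len(full) - 1, 0, -1):
        if full[i - 1] and not full[i]:   # AND gate of stage i goes high
            data[i] = data[i - 1]         # latch of stage i loads the data
            full[i] = True                # EF of stage i is set
            full[i - 1] = False           # EF of stage i-1 is cleared
    return full, data
```

Starting from `full = [True, False, False]`, repeated calls move the item one stage per step until it reaches the last stage, where it waits to be read.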

19

asP* FIFO -simulation


20

asP* FIFO -simulation

•Data available at output data_R
  •Req_R goes high as data arrives at the last stage

21

asP* FIFO -simulation


22

asP* FIFO -simulation

•Next data enters FIFO
  •It can actually enter as soon as ack_L falls, indicating that the first data item has been written

23

asP* FIFO -simulation


24

asP* FIFO -simulation


25

asP* FIFO -simulation


26

asP* FIFO -simulation

•Next data enters FIFO

27

asP* FIFO -simulation


28

asP* FIFO -simulation


29

asP* FIFO -simulation


30

asP* FIFO -simulation

•FIFO FULL!
•No acknowledge until next read out

31

asP* FIFO -simulation

•Ack_R rises, data read out

32

asP* FIFO -simulation


33

asP* FIFO -simulation

•Data propagates to empty space

34

asP* FIFO -simulation


35

asP* Put Interface Cell

•Data propagates to empty space

36

asP* FIFO -simulation


37

asP* FIFO -simulation


38

asP* FIFO -simulation

•Now D3 can enter FIFO

39

asP* FIFO -simulation


40

asP* FIFO -simulation

•Sender lowers Req_L

41

asP* FIFO - Timing Issue

T[En->Q] < T[S->Q] + T_AND

42

asP* FIFO - Timing Issue

MinResetPulseWidth[R->Q] < T[S->Q] + T_AND
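The two timing constraints on slides 41 and 42 can be checked mechanically once cell delays are known. The sketch below is only an illustration: the function name and parameter names are hypothetical, and any delay values passed in would have to come from the target cell library.

```python
def asp_timing_ok(t_en_q, t_s_q, t_and, min_reset_pw):
    """Check the two asP* FIFO timing constraints (all values in the same unit).

    t_en_q       : data latch enable-to-output delay, T[En->Q]
    t_s_q        : SR latch set-to-output delay, T[S->Q]
    t_and        : AND gate delay, T_AND
    min_reset_pw : minimum reset pulse width of the SR latch
    """
    latch_ok = t_en_q < t_s_q + t_and         # T[En->Q] < T[S->Q] + T_AND
    reset_ok = min_reset_pw < t_s_q + t_and   # MinResetPulseWidth < T[S->Q] + T_AND
    return latch_ok and reset_ok
```

For example, with made-up delays in picoseconds, `asp_timing_ok(50, 40, 30, 60)` passes both checks, while raising `t_en_q` to 80 violates the first one.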

43

3-stage clockless FIFO

Write Port
•Write request
•Rises if write succeeded

Read Port
•Rises if data available at output
•Receiver acknowledges receipt of data

44

Stage of clockless FIFO

Latches to load data

Written when cell is empty

Tri-state buffer

Transfers tokens

45

asP* Put Interface Cell

Signal from Sender (fanout to all stages)

46

asP* Put Interface Cell

Signal to Sender (fanin from all stages)

47

asP* Put Interface Cell

Signal to Data Store and Full-Empty Control

48

asP* Put Interface Cell

Signal from Full-Empty Control

49

asP* Put Interface Cell

Signal from previous stage

Signal to next stage

50

asP* Put Interface Cell

Sets in all but one cell to low

51

asP* Put Interface Cell

52

asP* Put Interface Cell

53

asP* Put Interface Cell

54

asP* Put Interface Cell

55

asP* Put Interface Cell

56

asP* Put Interface Cell

57

asP* Put Interface Cell

58

asP* Get Interface Cell

Signal from Receiver

59

asP* Get Interface Cell

Signal to Receiver

60

asP* Get Interface Cell

Signal to Data Store and Full-Empty Control

61

asP* Get Interface Cell

Signal from Full-Empty Control

62

asP* Get Interface Cell

Signal to all stages

63

asP* Get Interface Cell -simulation


64

Full-empty cell

•Keeps track of whether cell is empty or full
•Set by write operation from put interface
•Reset by read operation from get interface
•AND gate ensures MUTEX on Set and Reset
  •Avoids races
  •Simplifies timing

65

Timing requirements for FIFO

The minimum low time for req_put must be at least as large as the minimum clock pulse width for the FFs in the put interfaces.

The minimum high time for req_put must be at least as large as the minimum pulse width for the set signal of the SR latch in the empty/full controller.

The minimum high time for got_data must be at least as large as the minimum pulse width for the set signal of the SR latch.

66

Protocol converters

•asP* simple and efficient

•But: timing constraints make it unsuitable for long interconnect

•LEDR is delay insensitive and better suited for long interconnect

•Other converters possible

67

LEDR protocol - brief overview

•Dual-rail encoding: two wires per bit, delay-insensitive
•"Level-encoding":
  •Data rail: holds actual data value
  •Parity rail: holds parity value
•Alternating-phase protocol:
  •Encoding parity alternates between odd and even

LEDR Encoding

Phase  Bit value  data rail  parity rail
Even   0          0          0
Even   1          1          1
Odd    0          0          1
Odd    1          1          0

68

LEDR signaling

[Figure: LEDR waveform of the data and parity rails for the data sequence 0 1 0 0 1 1 1, with the phase alternating between even and odd]

•Data rail: carries bit value in both phases
•Parity rail: phase alternates with each data item
•Exactly one wire transition for each new data item
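The LEDR encoding and its one-transition property can be sketched as follows. The function names are hypothetical, and the sketch assumes the first data item is sent in the even phase; the rule that the parity rail equals the data rail in the even phase and its complement in the odd phase follows from the LEDR encoding.

```python
def ledr_encode(bits):
    """Return a list of (data_rail, parity_rail) levels, one pair per data item.

    Assumption: the first item is sent in the even phase.
    """
    rails = []
    for k, bit in enumerate(bits):
        odd = k % 2                     # phase alternates even, odd, even, ...
        rails.append((bit, bit ^ odd))  # even: parity == data, odd: parity != data
    return rails


def wire_transitions(prev, cur):
    """Count how many of the two rails changed level between two items."""
    return int(prev[0] != cur[0]) + int(prev[1] != cur[1])


seq = ledr_encode([0, 1, 0, 0, 1, 1, 1])
changes = [wire_transitions(a, b) for a, b in zip(seq, seq[1:])]
```

For this sequence every entry of `changes` is 1: when the bit value changes only the data rail toggles, and when it repeats only the parity rail toggles.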

69

LEDR - completion detector

[Figure: 1-bit LEDR completion detector with inputs LEDR data<0> and LEDR parity<0> and output comp]

[Figure: N-bit LEDR completion detector combining LEDR data<0>/parity<0> through LEDR data<N-1>/parity<N-1> with C-elements into the output phase]
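A behavioural sketch of completion detection is given below. It assumes that the per-bit detector computes the phase as the XOR of the data and parity rails (which follows from the LEDR encoding) and, as a simplification, models the tree of C-elements as a single N-input C-element. All names are hypothetical.

```python
class CElement:
    """Muller C-element: output follows the inputs when they all agree, else holds."""

    def __init__(self, init=0):
        self.out = init

    def update(self, inputs):
        if all(v == inputs[0] for v in inputs):
            self.out = inputs[0]
        return self.out


def bit_phase(data, parity):
    """Per-bit completion signal: 0 for even phase, 1 for odd phase."""
    return data ^ parity


def word_phase(c_element, rails):
    """Phase of an N-bit LEDR word; changes only after every bit has changed phase."""
    return c_element.update([bit_phase(d, p) for d, p in rails])
```

The held output is what makes the detector tolerant of skew between bits: a word is reported complete only when the slowest bit has arrived.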

70

LEDR-to-asP* converter

•Completion detector per bit
•Even parity detector
•Odd parity detector
•Store data when all data [1:n] bits have changed

71

LEDR-to-asP* converter

•In this example: assume even parity phase

72

LEDR-to-asP* converter


73

asP*-to-LEDR converter


74

asP*-to-LEDR converter


75

Clocked FIFOs

• Design goal is to provide all flavors of synchronization converters
  • Synchronous-to-Asynchronous
  • Asynchronous-to-Synchronous
  • Synchronous-to-Synchronous
• Async-to-Sync and Sync-to-Async are obtained by combining an async put interface with a sync get interface and vice versa
• Synchronous-to-Synchronous will be detailed in the next slides

76

3-Stage Clocked FIFO

Indicates that data can be put into FIFO

Ensures fully sync behavior

77

FIFO stage with clocked RX and TX

78

Clocked Put Interface Cell

Signal to sender

Signals from sender

Synchronizer

●State (full or empty) of FIFO stage is synchronized
●One 1-bit synchronizer per FIFO stage interface
●Asymmetric delay
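A 1-bit synchronizer of configurable length can be modelled as a chain of flip-flops. The sketch below is a simplified cycle-level model, with a hypothetical class name; it ignores metastability and the asymmetric delay mentioned above, and only shows how the number of flops sets the synchronization latency.

```python
class FlopSynchronizer:
    """Cycle-level sketch of an n-flop synchronizer for one full/empty bit."""

    def __init__(self, n_flops=2):
        self.flops = [0] * n_flops

    def clock(self, async_in):
        # On each clock edge the sampled value moves one flop further.
        self.flops = [async_in] + self.flops[:-1]
        return self.flops[-1]   # synchronized output seen by the interface
```

With `n_flops=1` a change is visible after one clock edge, and with `n_flops=3` after three, which corresponds to choosing a shorter or longer synchronization time per interface.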

79

Clocked Put Interface Cell

80

Clocked Put Interface Cell

81

Clocked Put Interface Cell

82

Clocked Put Interface Cell

83

Clocked Put Interface Cell

84

Clocked Put Interface Cell

85

Clocked Put Interface Cell

86

Clocked Put Interface Cell

87

Clocked Put Interface Cell

88

Clocked Put Interface Cell

Clocked get interface cell is analogous

89

Example of 1.5 cycle Synchronizer

[Figure: 1.5-cycle synchronizer with input IN and outputs OUT and Async_OUT]

90

Synchronizer

•MTBF for different synchronizers and clock speeds, 90 nm technology

τ: metastability resolving constant
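The role of τ can be illustrated with the commonly used synchronizer MTBF model, MTBF = exp(t_r / τ) / (T_w · f_clk · f_data). This formula is the standard textbook model and is an assumption here, since the slide does not state which expression was used; the function and parameter names are hypothetical.

```python
import math


def synchronizer_mtbf(t_resolve, tau, t_window, f_clk, f_data):
    """Mean time between failures in seconds (standard exponential model).

    t_resolve : time allowed for metastability to resolve (s)
    tau       : metastability resolving constant (s)
    t_window  : metastability window of the flip-flop (s)
    f_clk     : sampling clock frequency (Hz)
    f_data    : rate of asynchronous input transitions (Hz)
    """
    return math.exp(t_resolve / tau) / (t_window * f_clk * f_data)
```

In this model each additional τ of resolving time multiplies the MTBF by e, which is why a faster clock (less resolving time per flop) generally calls for a longer synchronizer.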

91

FIFO latency and throughput

• Latency
  • minimum time data spends in FIFO
  • independent of FIFO length
• Throughput
  • maximum number of data transfers per unit time
  • depends on FIFO length

92

FIFO throughput

• Throughput is limited by slower of put and get interfaces

• Put interface delay: minimum time between two successive FIFO writes

• Get interface delay: minimum time between two successive FIFO reads
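The relationship between the two interface delays and the throughput bound can be written down directly. This is a minimal sketch with hypothetical names; it gives only the upper bound set by the interfaces and ignores any limit from the FIFO length.

```python
def max_throughput(put_delay_s, get_delay_s):
    """Upper bound on transfers per second, set by the slower interface."""
    return 1.0 / max(put_delay_s, get_delay_s)
```

For instance, a put delay of 400 ps and a get delay of 500 ps (made-up values) bound the throughput at about 2 G transfers per second.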

93

Clocked FIFO throughput simulation

•Simulation scenario
  •2-cycle synchronizer
  •Same put and get frequency with zero phase shift
•Throughput results
  •Does not allow a write in every clock cycle
  •Need to increase FIFO to 6 stages
  •A FIFO with equal put and get frequencies and an n-cycle synchronizer needs 2*(n+1) stages to support maximum throughput
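The stage-count rule from this slide is easy to capture as a helper. The function name is hypothetical, and the rule applies only to the case stated on the slide: equal put and get frequencies with an n-cycle synchronizer.

```python
def min_stages_for_full_throughput(sync_cycles):
    """Stages needed to sustain one transfer per cycle: 2 * (n + 1)."""
    if sync_cycles < 0:
        raise ValueError("sync_cycles must be non-negative")
    return 2 * (sync_cycles + 1)
```

With the 2-cycle synchronizer of the simulation scenario this gives 6 stages, matching the result quoted above.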

94

asP* FIFO latency

[Figure: write latency, read latency and receiver latency, shown relative to the Full-Empty Control]

95

asP* FIFO latency –clockless

•Latency is the time from req_put rising to data_valid rising (220 ps) plus the time from got_data rising to empty cell status (140 ps), 360 ps in total
•Throughput is limited by the slower of the get and put interfaces; the evaluated maximum is 1.95 GHz
•Power: 5.27 mW at 1.95 GHz

96

asP* FIFO latency –clocked

•Latency measured from rising clk_put to rising clk_get with valid data (doesn’t depend on FIFO length) + tsync (173 ps)
•Throughput gain when using a 6-stage FIFO is 2 times
•A 6-stage FIFO running at 1.28 GHz consumes 4.91 mW

97

Clocked FIFO latency

• Measured from the clk_put edge that latches data in the FIFO until the clk_get edge that notifies the receiver of available data

98

Clocked FIFO throughput

• Throughput determined by slower of put and get interfaces
• There is a minimum required FIFO length to support maximum throughput
• Minimum FIFO length depends on
  • synchronization latencies
  • ratio of put and get clock speeds
  • phase relationship of put and get clocks

99

Conclusions

• Presented a synchronizing FIFO that
  • can be built using standard cells
  • has a modular design; the following properties can be chosen independently:
    • type of put and get interface
    • synchronization time length
    • FIFO size
  • has simple interfaces

100

References

•T. Ono and M. Greenstreet. A modular synchronizing FIFO for NoCs. In Proceedings of the 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip.

•M. E. Dean, T. E. Williams, and D. L. Dill. Efficient self-timing with level-encoded 2-phase dual-rail (LEDR). In ARVLSI, pages 55–70. MIT Press, 1991.

•C. E. Molnar, I. W. Jones, W. S. Coates, and J. K. Lexau. A FIFO ring performance experiment. In Proceedings of the Third International Symposium on Advanced Research in Asynchronous Circuits and Systems, pages 279–289, Eindhoven, Apr. 1997.

•I. E. Sutherland. Micropipelines. Commun. ACM, 32(6):720–738, June 1989. Turing Award lecture.
