Post on 19-Dec-2015
1
A Modular Synchronizing FIFO for NoCs
Vainbaum Yuri
2
A Modular Synchronizing FIFO for NoCs
• Paper presented at NOCS 2009
• Authors:
  - Tarik Ono, Sun Microsystems
  - Mark Greenstreet, University of British Columbia
3
Motivation & Purpose of Synchronizing FIFO
[Figure: a Network-on-Chip connecting Timing Domain 1, Timing Domain 2 and Timing Domain 3, each through its own Synchronizing FIFO]
• Multiple clock domains in a NoC require many FIFOs
4
Synchronizing FIFO Targets
• Design targets for the FIFO:
  - FIFO can be built using standard cells
  - Easy integration into the CAD flow
  - Modular FIFO design with a choice of clockless or clocked interfaces
  - Modular, simple architecture reduces NoC design time
5
Talk Outline
• FIFO Overview
• FIFO Blocks
  - Clockless Put and Get Interface
  - Clocked Put and Get Interface
  - Full-Empty Control and Data Store
• FIFO Latency and Throughput
• Implementation Results
6
FIFO Overview: Operation
[Figure: a FIFO with stage 1, stage 2 and stage 3; the Sender in Timing Domain A drives the Put Interface, and the Receiver in Timing Domain B is served by the Get Interface]
• The FIFO consists of a number of stages
• The Sender communicates with the Put Interface, the Receiver with the Get Interface
• Tokens determine the FIFO stage for the next put and get operation
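The token mechanism can be sketched behaviorally as follows. This is only a minimal software model, assuming one put token and one get token that each advance by one stage after a successful operation; the class and method names are hypothetical, and the real FIFO is a circuit rather than a program.

```python
class TokenFifo:
    """Behavioral sketch: a put token and a get token select the active stage."""

    def __init__(self, n_stages):
        self.data = [None] * n_stages    # one data store per stage
        self.full = [False] * n_stages   # one full/empty flag per stage
        self.put_token = 0               # stage that accepts the next put
        self.get_token = 0               # stage that serves the next get

    def put(self, value):
        """Write into the stage holding the put token; fail if it is full."""
        i = self.put_token
        if self.full[i]:
            return False
        self.data[i] = value
        self.full[i] = True
        self.put_token = (i + 1) % len(self.data)   # pass the token on
        return True

    def get(self):
        """Read from the stage holding the get token; None if it is empty."""
        i = self.get_token
        if not self.full[i]:
            return None
        self.full[i] = False
        self.get_token = (i + 1) % len(self.data)   # pass the token on
        return self.data[i]
```

In this sketch a three-stage FIFO accepts three puts, refuses the fourth, and accepts it again once one item has been read.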
7
FIFO Overview: Structure
[Figure: stage 1, stage 2 and stage 3, each containing a Put Interface Cell, a Get Interface Cell, a Full-Empty Control and a Data Store; the Sender is in Timing Domain A and the Receiver in Timing Domain B]
• Each FIFO stage has a:
  - Put Interface Cell
  - Get Interface Cell
  - Full-Empty Control
  - Data Store
8
FIFO Overview: Modular Design
[Figure: three-stage FIFO with a CLOCKED PUT INTERFACE facing the Sender in Clocked Domain A and a CLOCKLESS GET INTERFACE facing the Receiver in a clockless NoC; each stage keeps its Full-Empty Control and Data Store]
• Mix-and-match interfaces
9
FIFO Overview: Modular Design
[Figure: three-stage FIFO with a CLOCKED PUT INTERFACE facing the Sender in Fast Clocked Domain A and a CLOCKED GET INTERFACE facing the Receiver in Slow Clocked Domain B; a 1-flop synchronizer and a 3-flop synchronizer are shown on the interfaces]
• Mix-and-match interfaces
• Different synchronization time lengths can be used, depending on clock frequency
• Changing the FIFO size doesn't affect the individual FIFO stage
10
Full Empty Control and Data store
• The Data Store consists of latches, enabled when write is high
• The same blocks are used for clocked or clockless interfaces
• The Full-Empty Control consists of an SR latch:
  - on write, set the output (full signal) high
  - on read, set the output low
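The behavior of these two blocks can be sketched in a few lines. This is a minimal model, assuming the write and read signals are never high at the same time; `sr_latch` and `StageStore` are hypothetical names chosen for illustration.

```python
def sr_latch(q, set_, reset):
    """Next output of an SR latch; set_ and reset must not be high together."""
    if set_ and reset:
        raise ValueError("set and reset must not be high together")
    if set_:
        return 1
    if reset:
        return 0
    return q  # hold the previous value


class StageStore:
    """Data Store (latches) plus Full-Empty Control (SR latch) of one stage."""

    def __init__(self):
        self.full = 0      # SR latch output: 1 = full, 0 = empty
        self.data = None   # contents of the data latches

    def apply(self, write, read, data_in=None):
        if write:
            self.data = data_in              # latches are enabled while write is high
        self.full = sr_latch(self.full, write, read)
        return self.full
```

Note that a read only clears the full flag in this sketch; the data latches keep their old value until the next write.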
11
asP* FIFO
• asP*: Asynchronous Symmetric Persistent Pulse Protocol
  - Standard cells
  - Good performance
  - Doesn't require C-elements
• The asP* handshaking protocol is chosen as the baseline for the FIFO design
12
asP* FIFO -simulation
• Initial state
  - SR latches keep track of the empty/full status
  - AND gates coordinate data transfer between stages
13
asP* FIFO -simulation
• Data arrives, req rises
  - SR latch EFi is set to indicate that latch Li holds valid data
14
asP* FIFO -simulation
• Data arrives, req rises
  - SR latch EFi is set to indicate that latch Li holds valid data
15
asP* FIFO -simulation
• Data propagates through L1
16
asP* FIFO -simulation
• SR latch EF1 is set
17
asP* FIFO -simulation
• Enabling the L2 latch
  - When stage i-1 is full and stage i is empty, the AND gate goes high, loading data into Li
18
asP* FIFO -simulation
• Clearing the EF1 latch
  - When stage i-1 is full and stage i is empty, the AND gate goes high, loading data into Li
  - SR latch EFi-1 is cleared to indicate that latch Li-1 is now empty
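The transfer rule used throughout this simulation (stage i loads from stage i-1 when i-1 is full and i is empty) can be sketched as a state update. This is a behavioral approximation that ignores pulse timing; the function name `step` and the list-based state are assumptions made for illustration.

```python
def step(full, data):
    """Apply one round of transfers to a linear asP*-style FIFO.

    full[i] models SR latch EFi and data[i] models latch Li.
    All AND gates are evaluated on the state before the round.
    """
    new_full, new_data = list(full), list(data)
    for i in range(1, len(full)):
        if full[i - 1] and not full[i]:   # AND gate of stage i goes high
            new_data[i] = data[i - 1]     # latch Li loads the data
            new_full[i] = True            # EFi is set
            new_full[i - 1] = False       # EFi-1 is cleared
    return new_full, new_data
```

Starting from a single item in the first stage, repeated calls move it one stage per round until it reaches the last stage, where it stays until it is read out.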
19
asP* FIFO -simulation
20
asP* FIFO -simulation
• Data available at output data_R
  - Req_R goes high as data arrives at the last stage
21
asP* FIFO -simulation
22
asP* FIFO -simulation
• Next data item enters the FIFO
  - It can actually enter as soon as ack_L falls, indicating that the first data item has been written
23
asP* FIFO -simulation
24
asP* FIFO -simulation
25
asP* FIFO -simulation
26
asP* FIFO -simulation
• Next data item enters the FIFO
27
asP* FIFO -simulation
28
asP* FIFO -simulation
29
asP* FIFO -simulation
30
asP* FIFO -simulation
• FIFO full!
• No acknowledge until the next read-out
31
asP* FIFO -simulation
• Ack_R rises, data is read out
32
asP* FIFO -simulation
33
asP* FIFO -simulation
• Data propagates into the empty space
34
asP* FIFO -simulation
35
asP* FIFO -simulation
• Data propagates into the empty space
36
asP* FIFO -simulation
37
asP* FIFO -simulation
38
asP* FIFO -simulation
• Now D3 can enter the FIFO
39
asP* FIFO -simulation
40
asP* FIFO -simulation
• Sender lowers Req_L
41
asP* FIFO - Timing Issue
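• Timing constraint: T[En->Q] < T[S->Q] + T_AND
  (the latch enable-to-output delay must be shorter than the SR latch set-to-output delay plus the AND gate delay)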
42
asP* FIFO - Timing Issue
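• Timing constraint: MinResetPulseWidth[R->Q] < T[S->Q] + T_AND
  (the minimum reset pulse width of the SR latch must be shorter than the SR latch set-to-output delay plus the AND gate delay)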
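The two inequalities from the timing-issue slides can be checked mechanically for a given cell library. The sketch below assumes all delays are expressed in the same unit; the function names and the example numbers used with them are hypothetical, not values from the talk.

```python
def asp_timing_slack(t_en_to_q, t_s_to_q, t_and, min_reset_pulse):
    """Return the slack of both asP* constraints (positive means satisfied)."""
    budget = t_s_to_q + t_and                       # right-hand side of both inequalities
    return {
        "latch_enable": budget - t_en_to_q,         # T[En->Q] < T[S->Q] + T_AND
        "reset_pulse": budget - min_reset_pulse,    # MinResetPulseWidth[R->Q] < T[S->Q] + T_AND
    }


def asp_timing_ok(**delays):
    """True when both constraints hold strictly."""
    return all(slack > 0 for slack in asp_timing_slack(**delays).values())
```

For example, with made-up delays of 60, 50, 30 and 70 units for `t_en_to_q`, `t_s_to_q`, `t_and` and `min_reset_pulse`, both slacks are positive and the check passes.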
43
3-stage clockless FIFO
[Figure: 3-stage clockless FIFO with a Write Port and a Read Port; signal callouts:]
• Write request
• Rises if write succeeded
• Rises if data available at output
• Receiver acknowledges receipt of data
44
Stage of clockless FIFO
[Figure callouts:]
• Latches to load data; written when the cell is empty
• Tri-state buffer
• Transfers tokens
45
asP* Put Interface Cell
Signal from Sender (fanout to all stages)
46
asP* Put Interface Cell
Signal to Sender (fanin from all stages)
47
asP* Put Interface Cell
Signal to Data Store and Full-Empty Control
48
asP* Put Interface Cell
Signal from Full-Empty Control
49
asP* Put Interface Cell
Signal from previous stage
Signal to next stage
50
asP* Put Interface Cell
Set to low in all but one cell
51
asP* Put Interface Cell
52
asP* Put Interface Cell
53
asP* Put Interface Cell
54
asP* Put Interface Cell
55
asP* Put Interface Cell
56
asP* Put Interface Cell
57
asP* Put Interface Cell
58
asP* Get Interface Cell
Signal from Receiver
59
asP* Get Interface Cell
Signal to Receiver
60
asP* Get Interface Cell
Signal to Data Store and Full-Empty Control
61
asP* Get Interface Cell
Signal from Full-Empty Control
62
asP* Get Interface Cell
Signal to all stages
63
asP* Get Interface Cell -simulation
64
Full-empty cell
• Keeps track of whether the cell is empty or full
• Set by a write operation from the put interface
• Reset by a read operation from the get interface
• AND gate ensures mutual exclusion (MUTEX) on Set and Reset
  - Avoids races
  - Simplifies timing
65
Timing requirements for FIFO
The minimum low time for req_put must be at least as large as the minimum clock pulse width for the FFs in the put interfaces.
The minimum high time for req_put must be at least as large as the minimum pulse width for the set signal of the SR latch in the empty/full controller.
The minimum high time for got_data must be at least as large as the minimum pulse width for the set signal of the SR latch.
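The three requirements above can be expressed as a small checker. This is a sketch that follows the wording of the slide literally; the function and parameter names are hypothetical, and all times are assumed to be in the same unit.

```python
def check_fifo_pulse_widths(req_put_low, req_put_high, got_data_high,
                            ff_min_clock_pulse, sr_min_set_pulse):
    """Return the names of the violated requirements (empty list if none)."""
    violations = []
    if req_put_low < ff_min_clock_pulse:     # requirement 1: req_put low time
        violations.append("req_put low time")
    if req_put_high < sr_min_set_pulse:      # requirement 2: req_put high time
        violations.append("req_put high time")
    if got_data_high < sr_min_set_pulse:     # requirement 3: got_data high time
        violations.append("got_data high time")
    return violations
```

Returning the list of violations rather than a single boolean makes it easier to see which pulse has to be stretched.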
66
Protocol converters
• asP* is simple and efficient
• But its timing constraints make it unsuitable for long interconnect
• LEDR is delay-insensitive and better suited for long interconnect
•Other converters possible
67
LEDR protocol: brief overview
• Dual-rail encoding: two wires per bit, delay-insensitive
• "Level-encoding":
  - Data rail: holds the actual data value
  - Parity rail: holds the parity value
• Alternating-phase protocol:
  - Encoding parity alternates between odd and even

LEDR encoding
Phase   Bit value   data rail   parity rail
Even    0           0           0
Even    1           1           1
Odd     0           0           1
Odd     1           1           0
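The LEDR encoding table above can be written compactly: the data rail carries the bit, and the parity rail is the bit XOR the phase. The sketch below assumes phase 0 means even and phase 1 means odd; the function names are hypothetical.

```python
def ledr_encode(bit, phase):
    """Return (data_rail, parity_rail) for a bit; phase 0 = even, 1 = odd."""
    return bit, bit ^ phase


def ledr_phase(data, parity):
    """Recover the phase of a codeword: 0 = even, 1 = odd."""
    return data ^ parity


def encode_stream(bits):
    """Encode successive bits with alternating phase, starting with even."""
    return [ledr_encode(b, i % 2) for i, b in enumerate(bits)]
```

Because the phase flips for every item, consecutive codewords produced by `encode_stream` always differ on exactly one of the two rails.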
68
LEDR signaling
[Figure: waveforms of the data and parity rails for the bit sequence 0 1 0 0 1 1 1, with the phase alternating between even and odd]
• Data rail: carries the bit value in both phases
• Parity rail: phase alternates with each data item
• Exactly one wire transition for each new data item
69
LEDR - completion detector
[Figure: 1-bit LEDR completion detector with inputs LEDR data<0> and LEDR parity<0> and output comp]
[Figure: N-bit LEDR completion detector with inputs LEDR data<0>, LEDR parity<0> through LEDR data<N-1>, LEDR parity<N-1>, combined by C-elements (C) into the output phase]
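A behavioral sketch of completion detection is given below. It assumes that the per-bit detector computes the phase as the XOR of the data and parity rails, and it models the C-elements as a single N-input C-element; both are simplifying assumptions, and the class names are hypothetical.

```python
class CElement:
    """Output goes to 1 when all inputs are 1, to 0 when all are 0, else holds."""

    def __init__(self, init=0):
        self.out = init

    def update(self, inputs):
        if all(v == 1 for v in inputs):
            self.out = 1
        elif all(v == 0 for v in inputs):
            self.out = 0
        return self.out


class CompletionDetector:
    """N-bit detector: reports a new phase only once every bit has arrived."""

    def __init__(self):
        self.c = CElement(0)   # assume the link starts in the even phase

    def update(self, word):
        """word is a list of (data_rail, parity_rail) pairs, one per bit."""
        return self.c.update([d ^ p for d, p in word])
```

While only some bits of a word have switched to the new phase, the output holds the old phase; it changes once the last bit arrives.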
70
LEDR-to-asP* converter
[Figure: LEDR to asP* converter; callouts:]
• Completion detector per bit
• Even parity detector
• Odd parity detector
• Store data when all data [1:n] bits have changed
71
LEDR-to-asP* converter
• In this example, assume the even parity phase
72
LEDR-to-asP* converter
73
asP*-to-LEDR converter
74
asP*-to-LEDR converter
75
Clocked FIFOs
• Design goal is to provide all flavors of synchronization converters:
  - Synchronous-to-Asynchronous
  - Asynchronous-to-Synchronous
  - Synchronous-to-Synchronous
• Async-to-Sync and Sync-to-Async are obtained by combining an async put interface with a sync get interface, and vice versa
• Synchronous-to-Synchronous is detailed in the next slides
76
3-Stage Clocked FIFO
[Figure callouts:]
• Indicates that data can be put into the FIFO
• Ensures fully synchronous behavior
77
FIFO stage with clocked RX and TX
78
Clocked Put Interface Cell
[Figure callouts: signal to sender; signals from sender]
Synchronizer
• State (full or empty) of the FIFO stage is synchronized
• One 1-bit synchronizer per FIFO stage interface
• Asymmetric delay
79
Clocked Put Interface Cell
80
Clocked Put Interface Cell
81
Clocked Put Interface Cell
82
Clocked Put Interface Cell
83
Clocked Put Interface Cell
84
Clocked Put Interface Cell
85
Clocked Put Interface Cell
86
Clocked Put Interface Cell
87
Clocked Put Interface Cell
88
Clocked Put Interface Cell
Clocked get interface cell is analogous
89
Example of 1.5 cycle Synchronizer
[Figure: 1.5-cycle synchronizer with input IN and outputs OUT and Async_OUT]
90
Synchronizer
• MTBF for different synchronizers and clock speeds in 90nm technology
• τ: metastability resolving constant
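The slide does not state the formula behind its MTBF figures. A commonly used textbook model is MTBF = exp(t_r / τ) / (T_w · f_clk · f_data), where t_r is the time allowed for metastability to resolve. The sketch below assumes that model; apart from τ, every symbol and parameter name is an assumption introduced here for illustration.

```python
import math


def mtbf_seconds(t_resolve, tau, t_window, f_clk, f_data):
    """Mean time between synchronizer failures under the textbook model.

    t_resolve: time available for resolution (s)
    tau:       metastability resolving constant (s)
    t_window:  metastability window of the flip-flop (s), assumed parameter
    f_clk:     sampling clock frequency (Hz)
    f_data:    rate of asynchronous input transitions (Hz)
    """
    return math.exp(t_resolve / tau) / (t_window * f_clk * f_data)
```

Under this model each extra τ of resolution time multiplies the MTBF by e, which is why adding synchronizer flops or lowering the clock frequency improves reliability so quickly.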
91
FIFO latency and throughput
• Latency
  - Minimum time data spends in the FIFO
  - Independent of FIFO length
• Throughput
  - Maximum number of data transfers per unit time
  - Depends on FIFO length
92
FIFO throughput
• Throughput is limited by slower of put and get interfaces
• Put interface delay: minimum time between two successive FIFO writes
• Get interface delay: minimum time between two successive FIFO reads
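The throughput bound described above amounts to taking the slower of the two interfaces. A minimal sketch, assuming both delays are given in seconds; the function name and the example delays are hypothetical.

```python
def max_throughput(put_delay, get_delay):
    """Upper bound on transfers per second.

    put_delay: minimum time between two successive FIFO writes (s)
    get_delay: minimum time between two successive FIFO reads (s)
    """
    return 1.0 / max(put_delay, get_delay)
```

For instance, with a hypothetical put delay of 1 ns and get delay of 2 ns, the get interface dominates and the bound is about 500 million transfers per second.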
93
Clocked FIFO throughput simulation
• Simulation scenario
  - 2-cycle synchronizer
  - Same put and get frequency with zero phase shift
• Throughput results
  - Writing every clock cycle is not possible
  - The FIFO needs to be increased to 6 stages
  - A FIFO with equal put and get frequencies and an n-cycle synchronizer needs 2*(n+1) stages to support maximum throughput
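The stage-count rule from this slide is simple enough to encode directly. The sketch below applies only to the case stated on the slide (equal put and get frequencies); the function name is hypothetical.

```python
def min_stages_for_full_throughput(n_sync_cycles):
    """Stages needed for maximum throughput with an n-cycle synchronizer,
    assuming equal put and get clock frequencies: 2 * (n + 1)."""
    if n_sync_cycles < 0:
        raise ValueError("synchronizer length must be non-negative")
    return 2 * (n_sync_cycles + 1)
```

With the 2-cycle synchronizer of the simulation scenario this gives 2 * (2 + 1) = 6 stages, matching the 6-stage FIFO mentioned in the results.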
94
asP* FIFO latency
[Figure: write latency, read latency and receiver latency, shown relative to the Full-Empty Control]
95
asP* FIFO latency –clockless
• Latency: measured from req_put rising to data_valid rising (220ps), plus got_data rising to empty cell status (140ps), for a total of 360ps
• Throughput is limited by the slower of the get and put interfaces; the evaluated maximum is 1.95GHz
• Power: 5.27mW at 1.95GHz
96
asP* FIFO latency –clocked
• Latency: measured from rising clk_put to rising clk_get with valid data (doesn't depend on FIFO length) + tsync (173ps)
• Throughput gain when using a 6-stage FIFO is 2 times
• A 6-stage FIFO running at 1.28GHz consumes 4.91mW
97
Clocked FIFO latency
• Measured from the clk_put edge that latches data in the FIFO until the clk_get edge that notifies the receiver of available data
98
Clocked FIFO throughput
• Throughput is determined by the slower of the put and get interfaces
• There is a minimum required FIFO length to support maximum throughput
• Minimum FIFO length depends on:
  - synchronization latencies
  - ratio of put and get clock speeds
  - phase relationship of put and get clocks
99
Conclusions
• Presented a synchronizing FIFO that:
  - can be built using standard cells
  - has a modular design; the following properties can be chosen independently:
    type of put and get interface
    synchronization time length
    FIFO size
  - has simple interfaces
100
References
• T. Ono and M. Greenstreet. A modular synchronizing FIFO for NoCs. In Proceedings of the 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip, 2009.
• M. E. Dean, T. E. Williams, and D. L. Dill. Efficient self-timing with level-encoded 2-phase dual-rail (LEDR). In ARVLSI, pages 55–70. MIT Press, 1991.
• C. E. Molnar, I. W. Jones, W. S. Coates, and J. K. Lexau. A FIFO ring performance experiment. In Proceedings of the Third International Symposium on Advanced Research in Asynchronous Circuits and Systems, pages 279–289, Eindhoven, Apr. 1997.
• I. E. Sutherland. Micropipelines. Commun. ACM, 32(6):720–738, June 1989. Turing Award lecture.