Async. Seminar
FOX
Fast On-Chip Interconnect
Reuven Dobkin
Technion – Israel Institute of Technology
Electrical Engineering Department – VLSI Lab
April 18, 2010
2 Async. Seminar, Technion, April 18, 2010
Presentation Outline
• SoC Communication Challenges
• Serial Link Benefits
• Parallel vs. Serial Link Analysis
• Fast Asynchronous Serial Link
  • Transmitter, Fast LEDR Encoder
  • Receiver, Fast Toggle Circuit
  • Channel
    • Voltage Mode Async Signaling
    • Current Mode Async Signaling
• FOX Test Chip
• Future Research Topics
System-on-Chip (SoC) Challenges
• Non-scalable global interconnect
  • High power and latency
• Large number of modules
  • Congested inter-modular communication
• Multiple clock and voltage domains
  • Dynamic power-reduction techniques
  • Complex inter-module timing relations
• Growing bandwidth requirements
  • Unsupported by conventional buses
Parallel NoC Links and SoC Buses
• Constructed of multiple wires and repeaters
• Incur high leakage power (line drivers and repeaters)
• Often have low utilization (leakage again)
• Occupy large chip area (routing difficulty)
• Incur cross-coupling (delay matching & skew problems)
• Present a significant capacitive load
[Figure: parallel link of N wires]
Bit-Serial Interconnect
• Fewer lines, fewer line drivers and fewer repeaters
• Reduced leakage power
• Reduced chip area
• Better routability
• Reduced coupling
• Should work N times faster to match the N-wire parallel link: F_SER = N · F_PAR
Fast Serial Interconnect
• Synchronous approach:
  • Source-synchronous, with Clock Data Recovery (CDR)
  • Problem: CDR power and area
• Asynchronous approach:
  • Normal methods are too slow
  • We consider a faster one
[Figure: synchronous serial link with DLL- or PLL-based clock recovery]
Seeking Highest Speed
• No “spacers” (NRZ)
• One transition per bit
• Fast signaling
  • Targeted speed: one gate delay between bits
  • For 65nm technology this results in 67 Gbps
• One gate delay = FO4 inverter delay
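The arithmetic behind the 67 Gbps figure can be sketched as follows. This is a minimal illustration, not part of the talk: it assumes an FO4 inverter delay of about 15 ps at 65nm (the data cycle quoted later in the talk for 67 Gbps) and a hypothetical 16-wire parallel link for comparison. The function names are illustrative only.

```python
def bit_rate_gbps(bit_period_ps):
    """Bit rate in Gbps when one bit is sent every bit_period_ps picoseconds."""
    return 1e3 / bit_period_ps


def parallel_clock_ghz(serial_rate_gbps, n_wires):
    """Clock of an N-wire parallel link with equal throughput (F_SER = N * F_PAR)."""
    return serial_rate_gbps / n_wires


FO4_DELAY_PS = 15.0  # assumption: FO4 inverter delay at the 65nm node
N_WIRES = 16         # assumption: width of the parallel link being replaced

serial_rate = bit_rate_gbps(FO4_DELAY_PS)  # one bit per gate delay
parallel_clock = parallel_clock_ghz(serial_rate, N_WIRES)
print(f"serial link: {serial_rate:.1f} Gbps")
print(f"equivalent {N_WIRES}-wire parallel clock: {parallel_clock:.2f} GHz")
```

Under these assumptions the serial rate rounds to 67 Gbps, which a 16-wire parallel link would match at a clock of a little over 4 GHz.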
Serial Link – Top Structure
• Transition signaling instead of sampling: two-phase NRZ Level Encoded Dual Rail (LEDR) asynchronous protocol, a.k.a. data-strobe (DS)
• Acknowledge per word instead of per bit
• Wave-pipelining over channel
• Differential encoding (DS-DE, IEEE1355-95)
• Low-latency synchronizers
[Figure: sender, synchronizer, serializer & LEDR encoder, bit-serial channel (P and S), deserializer & LEDR decoder, synchronizer, receiver; word acknowledge from the receiving side back to the sending side]
Encoding – Two-Phase NRZ LEDR
• Two-Phase Non-Return-to-Zero Level Encoded Dual Rail
• “Delta” encoding (one transition per bit)
\[ S(i) = B(i), \quad \forall i \]
\[ P(i) = \begin{cases} \overline{B(i)}, & i \text{ odd} \\ B(i), & i \text{ even} \end{cases} \]
[Figure: waveforms of the uncoded data (B), the state bit (S) and the phase bit (P) for the bit sequence 0 0 1 1 0 0 0 0 1 0]
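The one-transition-per-bit property of LEDR can be checked with a short behavioral sketch. It is an illustration under stated assumptions rather than the circuit from the talk: bits are indexed from 0, the state rail carries the data bit, the phase rail equals the data bit on even indices and its complement on odd indices, and the link is assumed to idle in an odd-phase state (S, P) = (0, 1). All names are hypothetical.

```python
def ledr_encode(bits):
    """Return (S, P) symbols: S(i) = B(i); P(i) = B(i) for even i, NOT B(i) for odd i."""
    return [(b, b ^ (i & 1)) for i, b in enumerate(bits)]


def ledr_decode(symbols):
    """The data bit is simply the state rail."""
    return [s for s, _ in symbols]


def rail_toggles(symbols, idle=(0, 1)):
    """Number of rails that change at each symbol, starting from the assumed idle state."""
    toggles = []
    prev = idle
    for sym in symbols:
        toggles.append(int(prev[0] != sym[0]) + int(prev[1] != sym[1]))
        prev = sym
    return toggles


bits = [0, 0, 1, 1, 0, 0, 0, 0, 1, 0]  # the sequence shown on the slide
symbols = ledr_encode(bits)
print(symbols)
print(rail_toggles(symbols))  # every entry is 1: exactly one rail toggles per bit
```

Because exactly one of the two rails toggles for every bit, the receiver can recover bit timing from the transitions themselves, without a separate clock.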
Serial Link Applications
• P2P long-range global interconnect
• Long-range NoC links
• Pin-limited on-chip module interfaces
  • Presently chips are pin-limited, and that limitation will migrate inside the chip
• Cross-bar
  • Simpler routing and less congestion
• Communications inside many-core CMPs
Related Work

Name        Data cycle (# of FO4 inv. delays)   Bit rate
WAFT        7*                                  3 Gbps at 180nm
IIP         3*                                  5 Gbps at 250nm
3-Levels    3*                                  1 Gbps at 1200nm
Pulsed CM   2                                   8 Gbps at 180nm
This Work   1                                   67 Gbps at 65nm-HP
Parallel vs. Serial Link Analysis
Serial Link Benefits: Analysis Goal
• To show that the novel serial link outperforms the parallel one for:
  • Long ranges
  • Advanced technology nodes
[Figure: two plots against technology node [nm] (180, 130, 90, 65, 30, 15). The first, with link length [mm] on the vertical axis, shows the regions where the parallel link or the serial link dissipates less power; the second shows the regions where the parallel link or the serial link requires less area.]
Method
• Choose:
  • Parallel link implementation representatives
  • Serial link implementation representatives
• Compare the parallel and serial link approaches in terms of:
  • Area
  • Power
  • Latency
  • Technology scaling
"Register-Pipelined" Parallel Link
• Fully synchronous
• Interconnect as combinational logic between registers
• Source-synchronous or global clock
• High cost for high bit rates!
[Figure: N-wire link with register stages between the transmitter clock (CLKT) and the receiver clock (CLKR)]
"Wave-Pipelined" Parallel Link
• Bit rate is limited by the relative skew of the link wires
[Figure: N-wire link with repeater stages between CLKT and CLKR, carrying the successive words WORD_N(i) and WORD_N(i+1)]
Crosstalk Mitigation and Power Reduction
• Shielding / Spacing
• Staggered repeaters
• Interleaved bi-directional lines
• Asynchronous signaling
• Data encoding
• Data pattern recognition with special worst-case handling
• This work analyzes the two extremes of shielding:
  • Unshielded wires (a)
  • Fully-shielded wires (b)
[Figure: wire arrangements (a) and (b)]
Analytical Models
Parallel and Serial Link Bit Rates
Please refer to the paper below for details on the exact analytical models employed in the work:
R. Dobkin, A. Morgenshtein, A. Kolodny, R. Ginosar, "Parallel vs. Serial On-Chip Communication," SLIP, pp. 43-50, 2008
Parallel Link Bit Rate Limitations (1)
A. Fastest available clock (d4 = FO4 inverter delay)
  • Ring oscillator limitation: 8d4
  • Fast processors: 11d4 (e.g. CELL)
  • Standard SoC/ASIC: 100-400d4
B. Synchronization latency
  • May take several clocks in case of an asynchronous clock relation
C. Clock uncertainty
  • Extended critical path
[Figure labels: RESET, CLK, DOUT, SOURCE, LEAF]
Parallel Link Bit Rate Limitations (2)
D. Delay uncertainty
  • The skew and jitter of the clock
  • Repeater delay variations
  • Wire delay variations: mostly metal thickness variations
  • Via variations
  • Cross-coupling (crosstalk)
  • Geometry: outcome of routing congestion and multi-layer structure
N.S. Nagaraj, DAC 2005 / L. Scheffer, SLIP 2006
Parallel Link Minimal Clock Cycle (1)
\[ T_{CLK} \ge 2\,(d_{MAX} - d_{MIN}) + T_{SU} + T_H + 4\Delta_{CLK} \]
[Figure: timing diagram along the link length showing the maximal data delay d_MAX, the minimal data delay d_MIN, the clock uncertainty Δ_CLK at CLKT, the latest and earliest data clocking, and the windows T_SU + 2Δ_CLK and T_H + 2Δ_CLK]
Notations from W.P. Burleson, et al., Wave-Pipelining: A Tutorial and Research Survey, TVLSI, 1998
Impact of Process Variations in Repeaters on Multi-Wire Delay Uncertainty
• Variation types
  • Random variations: closely placed devices
  • "Systematic" variations: location on the die
• Relative skew (dMAX–dMIN)
  • Repeaters in the same stage are highly correlated
  • Random variations are averaged out thanks to large repeater sizing
  • Systematic inter-stage variations are averaged out along the link
• Relative skew among the lines due to variations in repeaters is small, so multi-wire delay uncertainty is dominated by cross-coupling
[Figure: N-wire link with repeater stages driven from CLKT; random variations inside a repeater stage are averaged out]
Parallel Link Minimal Clock Cycle (2)
• Minimal clock cycle:
\[ T_{CLK} \ge 2\,[d_{MAX}(L) - d_{MIN}(L)] + T_{SU} + T_H + 4\Delta_{CLK} \]
  • The first term is the worst-case skew between two lines (cross-coupling and wire variations), which depends on L
• System clock limitation:
\[ T^{PAR}_{CLK} = \max\{\, 2\,[d_{MAX}(L) - d_{MIN}(L)] + T_{SU} + T_H + 4\Delta_{CLK},\; T_{SYSTEM\ CLOCK} \,\} \]
• Register-pipelined link:
\[ T^{PAR}_{CLK} = T_{SYSTEM\ CLOCK} \]
  • Distance between successive pipeline stages is affected by delay uncertainty
• 65nm example: the rate is bounded by the clock cycle rather than by the delay uncertainty
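The clock-cycle bound for the parallel link, twice the relative skew plus setup, hold and four times the clock uncertainty, limited from below by the system clock, can be evaluated with a small sketch. The numeric values are hypothetical, chosen only to illustrate a case where the system clock dominates; they are expressed in FO4 delays and are not taken from the talk.

```python
def min_clock_cycle(d_max, d_min, t_su, t_h, delta_clk):
    """Skew-limited bound: 2*(d_MAX - d_MIN) + T_SU + T_H + 4*Delta_CLK."""
    return 2 * (d_max - d_min) + t_su + t_h + 4 * delta_clk


def parallel_clock_cycle(d_max, d_min, t_su, t_h, delta_clk, t_system_clock):
    """Parallel link cycle: the larger of the skew-limited bound and the system clock."""
    return max(min_clock_cycle(d_max, d_min, t_su, t_h, delta_clk), t_system_clock)


# Hypothetical values in FO4 delays (assumptions for illustration only).
skew_bound = min_clock_cycle(d_max=12, d_min=10, t_su=1, t_h=1, delta_clk=0.5)
t_par = parallel_clock_cycle(12, 10, 1, 1, 0.5, t_system_clock=10)
print(skew_bound, t_par)
```

With these example numbers the skew-limited bound is 8 FO4 delays while the system clock is 10, so the link rate is set by the clock rather than by the delay uncertainty, which is the situation the 65nm example describes.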
Serial Link Bit Rate
• Skew due to transistor variations is neglected
  • Much smaller than in the parallel link
• Coupling factor is always known
  • LEDR encoding: there is only one transition per transmitted bit
  • The skew is not affected by cross-coupling: link delay is similar for all symbols
• Bit rate:
\[ B_{SER} = 1 / d_4 \]
Scalability
• Number of repeaters (per millimeter) grows for more advanced technology nodes
• Active area and leakage: minimal link length for serial link employment decreases with technology
• Dynamic power: minimal link length for serial link employment decreases with technology
• Interconnect area: serial link is always preferable
• Equal-throughput parallel and serial links are assumed
[Figure: panels "Repeaters", "Area and Leakage" and "Power" against technology node [nm] (180, 130, 90, 65, 45, 32); axes: repeaters per millimeter (k = K/mm) and minimal link length Lmin [mm]]
Y.I. Ismail, et al., Repeater Insertion in RLC Lines for Minimum Propagation Delay, ISCAS99
65 nm Test Case Summary
Minimal length above which the serial link is preferred

                              Wave-pipelined vs. serial      Register-pipelined vs. serial
Shielding                     Fully shielded   Unshielded    Fully shielded      Unshielded
Length of parallel link       unlimited        up to 6mm     unlimited           unlimited
Clock cycle of parallel link  8d4              8d4           10d4      130d4     10d4      130d4
                                                             (fast)    (slow)    (fast)    (slow)

To minimize the following, choose a serial link for links longer than:
Area                          Always           Always        Always              Always
Power                         2mm              4mm           3mm       3mm       1mm       3mm
Latency                       2mm              Never*        4mm       12mm      2mm       9mm
Single Gate Delay Serial Link Architecture
Single Gate-Delay Serial Link
• Transition signaling instead of sampling: two-phase NRZ Level Encoded Dual Rail (LEDR) asynchronous protocol, a.k.a. data-strobe (DS)
• Acknowledge per word instead of per bit
• Wave-pipelining over channel
• Differential encoding (DS-DE, IEEE1355-95)
• Low-latency synchronizers
[Figure: sender, synchronizer, serializer & LEDR encoder, bit-serial channel (P and S), deserializer & LEDR decoder, synchronizer, receiver; word acknowledge from the receiving side back to the sending side]
Transmitter – Fast SR Approach
• Targeted speed: one gate delay between bits
[Figure: transmitter block diagram with a parallel load interface (Load Enable, Uncoded Data), the shift register SR(B) with outputs Beven and Bodd, a transition generator with outputs T0 and T90, and the LEDR encoder producing P and S; further labels: OT0, OT90]
Multi-Phase Transition Generation
[Figure: (a) circuit with SA blocks, clock phases clk[0] to clk[7], signals A, B and RESET; (b) waveforms of C1, C2, C3, C4, C11, C22 and T; outputs T0 and T90]
• C11 = C1 xor C3
• C22 = C2 xor C4
• T = C11 xor C22
The circuit is based on the multi-phase clock generator from M.J.E. Lee, "An Efficient I/O and Clock Recovery for TERABIT Integrated Circuits Design," PhD Thesis, Stanford Univ., 2001.
Fast Asynchronous Shift Register
[Figure: chain of stages XL[W-2] ... XL[1], XL[0] driven by T, with a parallel load interface (Data0 ... Data(W-1)), one input marked Not Connected, and the serial output OUT]
Wave-pipelined Control Characteristics
• The highest speed (the single gate-delay cycle) relates to the pole of the Bode diagram
• This operating point results in signal degradation along the inverter chain
[Figure: three plots. BIAS: bias gain [dB] vs. frequency [Hz] (1e9 to 1e11) for ΔVb = 0 to 0.3 in steps of 0.05. SWING: voltage swing gain [dB] vs. frequency [Hz] (1e9 to 1e11) for 0.5 full swing to full swing. Cut-off frequency [GHz] vs. inverter chain length N (0 to 30). The single gate delay rate is marked.]
Splitter Architecture
• The shift-register is partitioned into M shift-registers, giving M times slower operation in each shift-register
• Signal is no longer degraded
• Single gate-delay operation is localized to the output (input) stage only
[Figure: transmitter with parallel load into shift-registers for odd and even bits followed by a merge stage; receiver with a split stage followed by shift-registers for odd and even bits with parallel read]
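At the bit-stream level, partitioning into M shift-registers amounts to interleaving: register k holds every M-th bit starting at position k, and the merge stage restores the original order. The sketch below is a purely behavioral illustration of that idea (it models no timing or circuit detail); the function names and the example word are hypothetical.

```python
def split(bits, m=2):
    """Distribute a serial bit stream over m interleaved streams (stream k gets bits k, k+m, ...)."""
    return [bits[k::m] for k in range(m)]


def merge(streams):
    """Interleave m streams back into one serial stream."""
    m = len(streams)
    out = [None] * sum(len(s) for s in streams)
    for k, stream in enumerate(streams):
        out[k::m] = stream
    return out


word = [1, 0, 0, 0, 1, 0, 1, 0, 0]
even_bits, odd_bits = split(word, m=2)
print(even_bits, odd_bits)
print(merge([even_bits, odd_bits]) == word)
```

Each of the M streams advances once per M bit periods, so only the split and merge stages have to operate at the full single gate-delay rate.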
Splitter Architecture – Performance
• Splitter architecture employment
  • Better resiliency to process variation thanks to localized high-speed operation
  • Power reduction:
\[ P^{NON\text{-}SPLIT}_{WP\text{-}CTRL} = N \cdot C_{BUF} \cdot V^2 \cdot F_{SER} \]
\[ P^{SPLITTER}_{WP\text{-}CTRL} = M \cdot \frac{N}{M} \cdot C_{BUF} \cdot V^2 \cdot \frac{F_{SER}}{M} = \frac{1}{M}\, P^{NON\text{-}SPLIT}_{WP\text{-}CTRL} \]
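The linear power reduction of the splitter can be checked numerically: M shift-registers of N/M stages, each running at F_SER/M, dissipate 1/M of the non-split control power. The sketch below assumes the usual C·V²·F form of dynamic power; the buffer capacitance and stage count are hypothetical values chosen only for illustration.

```python
import math


def wp_ctrl_power(n_stages, c_buf, vdd, f_ser, m=1):
    """Dynamic power of the wave-pipelined control: m registers of n/m stages at f_ser/m."""
    return m * (n_stages / m) * c_buf * vdd ** 2 * (f_ser / m)


N_STAGES = 16   # assumption: one stage per bit of a 16-bit word
C_BUF = 1e-15   # assumption: 1 fF per control buffer
VDD = 1.2       # volts
F_SER = 67e9    # serial bit rate in Hz

p_non_split = wp_ctrl_power(N_STAGES, C_BUF, VDD, F_SER)
for m in (2, 4):
    ratio = wp_ctrl_power(N_STAGES, C_BUF, VDD, F_SER, m) / p_non_split
    print(f"M={m}: power ratio {ratio:.2f}")
```

The ratio depends only on M, so the saving holds regardless of the assumed capacitance and supply voltage.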
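Transmitter Splitter Architecture
[Figure: block diagram with shift registers SREVEN and SRODD, stages XLEVEN[N/2] and XLODD[N/2], clocks C and C90, MergeEVEN and MergeODD, signals BEVEN and BODD, an XOR, and the LEDR encoder producing P and S]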
Transmitter – SPICE Simulation (65nm node)
[Figure: SPICE waveforms of C0, C90, BEVEN, BODD, C, P and S with 30 ps and 60 ps timing marks; START BIT = 1; annotated bit sequences: 1 0 0 0 1 0 1 0 / 0 0 1 0 1 0 1 Dummy / 1 0 0 1 0 1 0 1 1 0 0 0 1 0 1 0]
Simulations done at
Receiver
Receiver Splitter Architecture
[Figure: block diagram with the TOGGLE element (input T, outputs A and B), signals C and S, the SPLIT stage producing SEVEN and SODD, stages XLEVEN[1] and XLODD[1], and shift registers SREVEN and SRODD]
Toggle Circuit
• Straightforward implementation (fundamental asynchronous state machine) is too slow (supports only ~1.5 gate delay cycle)
• Novel toggle:
  • Single gate delay operation support
  • Internal and output latches
[Figure: toggle element with input T and outputs A and B]
Channel
• Four transmission lines (DS-DE)
• High metal layers utilization
  • Metals 5-8 of 65nm process
• RLC modeled
• Careful layout
  • Small crosstalk
  • Small relative variations
Channel Issues
• The wires should be designed as transmission lines, enabling multiple traveling signals (a 1mm wire may carry at least two successive bits simultaneously)
• Wire Bandwidth (metal layer/width/frequency)
• Careful Layout
• Modeling (R, L, C)
• Current/Voltage Signaling
• Repeater Insertion / Differential Repeaters
• Shielding
• Termination
• Low-Swing (Speed & Power)
LEDR Interconnect Layout
[Figure, shown in three build-up steps: interconnect layout with wires labeled S and P]
Differential Channel Driver and Receiver: “Voltage Mode”
• Current mode differential low-swing signaling
• Currents in opposite directions
• Controllable current return path
• Voltage sensing at the RX side
[Figure: differential driver and receiver connected by the P / S wire pair with resistors R; labels: a, i, b, o, z, SA]
Channel Characteristic Impedance
\[ Z_0 = \sqrt{\frac{R + j\omega L}{j\omega C}} \]
where R is expressed through R_DC and δ.
• Z depends on F
• Voltage changes with F
• Fast changes cause voltage drifts
• The drifts bound the operating speed
[Figure: Z vs. F for the S wires. Based on data from BPTM. Drawn for constant R, L, C]
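The frequency dependence of the characteristic impedance can be illustrated with the standard lossy-line expression for constant R, L and C, the square root of (R + jωL)/(jωC). The per-unit-length values in this sketch are hypothetical round numbers, not the BPTM data used in the talk.

```python
import cmath
import math


def z0(freq_hz, r_per_m, l_per_m, c_per_m):
    """Characteristic impedance sqrt((R + jwL) / (jwC)) of a line with constant R, L, C."""
    w = 2 * math.pi * freq_hz
    return cmath.sqrt((r_per_m + 1j * w * l_per_m) / (1j * w * c_per_m))


# Hypothetical per-unit-length parameters (assumptions for illustration only).
R = 20e3    # ohm/m  (20 ohm/mm)
L = 0.5e-6  # H/m    (0.5 nH/mm)
C = 0.2e-9  # F/m    (0.2 pF/mm)

for f in (1e9, 1e10, 1e11):
    print(f"{f:.0e} Hz: |Z0| = {abs(z0(f, R, L, C)):.1f} ohm")
```

At low frequencies the resistive term dominates and the magnitude of Z0 is well above its high-frequency limit sqrt(L/C), which is 50 ohm for these values. A driver sized for fast data therefore sees a different line impedance during long runs without transitions, which is the kind of variation an adaptive control scheme is meant to compensate.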
Channel Driver with Adaptive Control
• Compensates for Z changes
• Turned on for low frequencies
[Figure: channel driver from IN to OUT with blocks labeled Adaptive Control and Inertial Delay]
Adaptive Control – Simulation Example
• SPICE simulation setup:
  • 65nm technology, 4mm range, 67Gbps data rate
  • RLC modeled channel (using Raphael-like three-dimensional field solver)
  • Adaptive control is turned on only for low frequencies
[Figure: waveforms of the data, the adaptive control signal and the currents; low frequency turns adaptive control on]
Channel Receiver Amplifier
[Figure: receiver amplifier schematic; labels: IN, OUT, B, R, RB]
Voltage Mode Drawbacks
• Voltage signal is degraded by the high capacitance of long interconnects
• Fast full-swing voltage transitions result in high dynamic currents
  • High power
  • Cross-talk noise
• Current signaling offers a way of avoiding high voltage swings
• Standard “Current-mode” techniques still measure voltage at the receiver side
Current Mode: Goals
• Full Current Mode:
• Send current from TX
• Measure current at RX
• Minimal voltage swing over channel
• Current to voltage conversion at RX
• Long range repeater-less channel
• Targeted data cycle 15 ps (67 Gbps) @ 65nm
• Low power
Current Mode: Full Current-Mode Circuit
[Figure: transmitter (inputs a, an) driving the channel wires chan and chann with current ITX; a receiver on each wire with currents IRX and IFBR, followed by the output stage (q, qn). Annotations: ITX > IRX; ITX > IS; Vq_swing = IS·R; IRX + IS; FS]
Current Mode: Fast Feedback
[Figure: transmitter (inputs a, an), channel wires chan and chann, and receivers (outputs q, qn); currents ITX, IRX and the feedback current IFBR]
Current Mode: Adaptive Control
• OFF during fast operation
• ON during slow operation
[Figure: adaptive control circuit with transistors M1 and M2; labels: a, an, o, on]
Current Mode: SPICE Simulations
• 7 mm link (almost twice as long as the VM link)
• Full RLC model for interconnect
• Interconnect: 2 signal wires + 2 shields. 2 µm partitioned wires, 5 µm spacing
[Figure: wire cross-section with wires S, DN, D and SN, showing in-pair spacing and cross-pair spacing]
Current Mode: Simulations (1)
[Figure: three panels (a)-(c): input data, RX currents and TX currents]
Current Mode: Simulations (2)
[Figure: three panels (a)-(c)]
• Channel voltage swings: input 150mV, output 50mV
• RX currents: 1mA
• RX output voltage: 200mV
Current Mode: Power
7 mm link, random pattern, 67Gbps max

                                         RX-TX   Output Stage   Total
IAV dynamic (mA)                         9.7     10.3           20.0
I leakage (mA)                           0.06    0.07           0.12
I peak (mA)                              9.94    13.75          23.7
Dynamic Power (mW, Vdd=1.2V)             11.7    12.3           24.0
Leakage Power (mW, Vdd=1.2V)             0.07    0.08           0.15
Peak Power (mW)                          11.93   16.50
Total Power with 20% Utilization (mW)    2.4     2.5            4.9
Total Link Power with 20% Utilization
(mW), including SerDes and two
Current-Mode Links                       -       -              35
Voltage Mode vs. Current Mode
• 16-bit word 67Gbps serial link
• Maximal repeater-less range:
  • Voltage mode: 4mm
  • Current mode: 7mm
• Dynamic power (w/o SerDes circuits, 100% util.):
  • 4mm Voltage mode: 18mW
  • 7mm Current mode: 24mW
• 33% increase in power for 75% increase in length
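The closing comparison follows directly from the quoted figures (18 mW over 4 mm for voltage mode, 24 mW over 7 mm for current mode). The short sketch below reproduces the percentages and adds a power-per-millimeter figure as a derived illustration; that last metric is computed here and is not quoted in the talk.

```python
def percent_increase(old, new):
    """Relative increase from old to new, in percent."""
    return 100.0 * (new - old) / old


VM_POWER_MW, VM_LENGTH_MM = 18.0, 4.0  # voltage mode, from the comparison above
CM_POWER_MW, CM_LENGTH_MM = 24.0, 7.0  # current mode, from the comparison above

power_increase = percent_increase(VM_POWER_MW, CM_POWER_MW)
length_increase = percent_increase(VM_LENGTH_MM, CM_LENGTH_MM)
print(f"power: +{power_increase:.0f}%, length: +{length_increase:.0f}%")
print(f"VM: {VM_POWER_MW / VM_LENGTH_MM:.2f} mW/mm, CM: {CM_POWER_MW / CM_LENGTH_MM:.2f} mW/mm")
```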
In-Die Variations
• Splitter architecture
  • High-speed operation localized to input and output stages
• High-speed components design and verification
  • Monte-Carlo simulations (>5)
  • 26 PVT Corners
  • Iterative design with legging and sizing for sensitive transistors
• Asynchronous structure
  • Supports any slow down
  • Minimal time separation between successive bits must be provided!
Serial Link Performance Limitations
• Skew between subsequent transitions
• Channel deviation (e.g. short pulses degradation)
• Systematic On-Die Variations (TX vs. RX)
FOX Test Chip
• Proof of concept of the serial link and its components on a real process
• Performance mapping:
  • Operation mode (16-bit/64-bit words)
  • Speed (ability to change the link speed in the range 0.1-30 Gbps)
  • Link length (0-7mm)
  • Power
  • Channel layout geometry
  • Metal layer
  • Noise rejection
Link Architecture (only one link is active at a time)
[Figure: test controller (Test Ctrl) with test parameters (e.g. Speed, K, etc.) and signals START, RESET, CLK, DIN, WREN, RDEN, LATCH_EN, DOUT and ENDF; K transmitters FOX-TX(1) ... FOX-TX(K) controlled by Send Word[0:K-1] and load_en[0:K-1] and fed with W-bit TX_DATA; each transmitter connects over WIRES to a receiver FOX-RX(1) ... FOX-RX(K), which outputs W-bit RX_DATA]
• IBM 65nm 10 LPe process
• 8 Metal layers
• Expecting 30 Gbps throughput over a single link
  • 1 / (30ps gate delay)
FOX chip structure
[Figure: chip layout showing the digital controller, testing circuits, TX, RX and the power grid]
Conclusions
• Novel high-speed serial links outperform parallel links for long range communication
• The serial link is more attractive for shorter ranges in future technologies
• Future large SoCs and NoCs should employ serial links to mitigate:
  • Area
  • Routing Congestion
  • Power
  • Latency
Publications
1. R. Dobkin, R. Ginosar, C. Sotiriou, "Data Synchronization Issues in GALS SoCs," Proc. of ASYNC, pp. 170-179, 2004.
2. R. Dobkin, V. Vishnyakov, E. Friedman and R. Ginosar, "An Asynchronous Router for Multiple Service Levels Networks on Chip," Proc. of ASYNC, pp. 44-53, 2005.
3. R. Dobkin, R. Ginosar, A. Kolodny, "High-Speed Serial Interconnect for NoC," NoC Workshop, DATE, 2006.
4. R. Dobkin, R. Ginosar, A. Kolodny, "Fast Asynchronous Shift Register for Bit-Serial Communication," Proc. of ASYNC, pp. 117-126, 2006.
5. R. Dobkin, R. Ginosar and C. P. Sotiriou, "High Rate Data Synchronization in GALS SoCs," IEEE Transactions on VLSI, 14(10), pp. 1063-1074, 2006.
6. R. Dobkin, Y. Perelman, T. Liran, R. Ginosar, A. Kolodny, "High Rate Wave-Pipelined Asynchronous On-Chip Bit-Serial Data Link," Proc. of ASYNC, pp. 3-14, 2007.
7. R. Dobkin, R. Ginosar, I. Cidon, "QNoC Asynchronous Router with Dynamic Virtual Channel Allocation," First ACM/IEEE Int. Symp. on Networks on Chip (NOCS), p. 218, 2007.
8. R. Dobkin, R. Ginosar, A. Kolodny, "QNoC Asynchronous Router", VLSI Integration Journal, 2008, doi:10.1016/j.vlsi.2008.03.001
9. R. Dobkin, A. Morgenshtein, A. Kolodny, R. Ginosar, "Parallel vs. Serial On-Chip Communication," SLIP, pp. 43-50, 2008.
10. R. Dobkin, R. Ginosar, "Fast Universal Synchronizers," PATMOS, 2008.
11. R. Dobkin, T. Kapshitz, S. Flur, R. Ginosar, "Assertion Based Functional Verification of Multiple-Clock GALS Systems," VLSI-SOC, 2008.
12. R. Dobkin, R. Ginosar, "Two Phase Synchronization with Sub-cycle Latency," In press, VLSI Integration Journal, 2008.
Future Research
Future SoC: Three-Level Communication
[Figure: SoC with three communication levels (neighbour, local, global); legend: SoC module, wrapper, router]
Future SoC (Zoom In)
[Figure: zoom into the SoC showing a synchronous module, its wrapper (synchronization), a NoC router, a NoC link and a dedicated PTP link]
Future Research (1) – Networking
• Design a 2-level NoC with power, latency and bandwidth parameters for each link
• Create algorithms or analysis that help decide for each route whether to send the message on the highway (part of the way) or only over the slow parallel links
• Consider a ring architecture, where each high-speed message can either pass through or be "dropped" to the slow NoC, similar to optical ring switches
• Analyze an N-level NoC where each communication level has its own speed tag
  • Combine with / compare to the QNoC approach
• Is there any benefit in sending part of the packet through the highway and the other part through slow parallel links?
• Broadcast / Multicast / Backpressure (Tokens) through the highway?
Future Research (2) – Architecture
• Design a 5-port router for highway links:
  • Does each router need to parallelize each incoming word before it can be serialized again and sent on the next link?
  • Maybe create a switch using X-latches? Consider wave-pipelining inside the router.
• Design parallel NoC link/NoC architecture with wave pipelining
  • Rely on guaranteed latency credit mechanism
• Design upper level architecture for the serial link
  • Word level acknowledge technique
  • What should be done for NACK?
    • Encoding for error correction?
    • Retransmit?
  • Out of order word (flit) handling?
• Design a low-power N-bit high-speed shift-register
  • Explore the linear power reduction of the Splitter architecture
Future Research (3) – Circuit
• High-speed Current-Mode circuits
  • New approach for lower supply voltage / longer links
  • Improve Output Stage stability, resiliency to variations, power
  • Analyze the power/speed trade-off for CM-based links
• Design an efficient power down/up technique for CM
  • Switch off the link when there is no data
  • Turn on the link quickly when the data is ready
• Adopt the idea of CM for off-chip links
  • Design a new circuit matching off-chip requirements
  • Compare to known approaches
• Further investigation of asynchronous (variable rate) signaling
• Explore interconnect architectures for CM signaling
  • Layout, shielding, return current path, noise
The End
• Thank you