async. seminar fox fast on-chip interconnect reuven dobkin technion – israel institute of...

76
Async. Async. Seminar Seminar FOX FOX Fast On-Chip Interconnect Fast On-Chip Interconnect Reuven Dobkin Reuven Dobkin Technion – Israel Institute of Technology Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab Electrical Engineering Department – VLSI Lab April 18, April 18, 2010 2010

Post on 19-Dec-2015

213 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

Async. SeminarAsync. Seminar

FOXFOXFast On-Chip InterconnectFast On-Chip Interconnect

Reuven DobkinReuven Dobkin

Technion – Israel Institute of TechnologyTechnion – Israel Institute of Technology

Electrical Engineering Department – VLSI LabElectrical Engineering Department – VLSI Lab

April 18, 2010April 18, 2010

Page 2: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

2 Async. Seminar, Technion, April 18, 2010

Presentation Outline

• SoC Communication Challenges• Serial Link Benefits

• Parallel vs. Serial Link Analysis• Fast Asynchronous Serial Link

• Transmitter, Fast LEDR Encoder• Receiver, Fast Toggle Circuit• Channel

• Voltage Mode Async Signaling• Current Mode Async Signaling

• FOX Test Chip• Future Research Topics

Page 3: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

3 Async. Seminar, Technion, April 18, 2010

System-on-Chip (SoC) Challenges

• Non-scalable global interconnect• High power and latency

• Multiple number of modules• Congested inter-modular communication

• Multiple clock and voltage domains

• Power-reduction dynamic techniques• Complex inter-module timing relations

• Growing bandwidth requirements• Unsupported by conventional buses

Page 4: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

4 Async. Seminar, Technion, April 18, 2010

• Constructed of multiple wires and repeaters• Incur high leakage power (line drivers and repeaters)• Often have low utilization (leakage again)• Occupy large chip area (routing difficulty)• Incur cross-coupling (delay matching & skew problems)• Present a significant capacitive load

N

Parallel NoC Links and SoC Buses

Page 5: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

5 Async. Seminar, Technion, April 18, 2010

Bit-Serial Interconnect

• Fewer lines, fewer line drivers and fewer repeaters

• Reduced leakage power

• Reduced chip area

• Better routability

• Reduced coupling

• Should work N times faster!

FSER=NFPAR

Page 6: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

6 Async. Seminar, Technion, April 18, 2010

Fast Serial Interconnect

• Synchronous Approach:• Source-synchronous, with Clock Data Recovery (CDR)

• Problem: CDR power and area

• Asynchronous Approach: • Normal methods are too slow

• We consider a faster one

+-

+-

+-

+-

DLL- or PLL-Based Clock

Recovery

Page 7: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

7 Async. Seminar, Technion, April 18, 2010

Seeking Highest Speed

• No “spacers” (NRZ)

• One transition per bit

• Fast signaling • Targeted Speed: One gate delay between bits

• For 65nm technology this results in 67 Gbps

One Gate Delay =

FO4 INV Delay

Page 8: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

8 Async. Seminar, Technion, April 18, 2010

Serial Link – Top Structure

• Transition signaling instead of sampling: two-phase NRZ Level Encoded Dual Rail (LEDR) asynchronous protocol, a.k.a. data-strobe (DS)

• Acknowledge per word instead of per bit• Wave-pipelining over channel• Differential encoding (DS-DE, IEEE1355-95)• Low-latency synchronizers

Sender Receiver

Word Ack

Bit-Serial ChannelSynchr. Synchr.Serializer

& LEDREncoder

DeSerializer& LEDRDecoder

P

S

Page 9: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

9 Async. Seminar, Technion, April 18, 2010

Encoding –Two Phase NRZ LEDR

• Two Phase Non-Return-to-Zero Level Encoded Dual Rail • “delta” encoding (one transition per bit)

Uncoded (B)

State bit (S)

Phase bit (P)

0 0 1 1 0 0 0 0 1 0

( ),( )

( ),

B i i oddP i

B i i even

( ) ( )S i B i i

Page 10: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

10 Async. Seminar, Technion, April 18, 2010

Serial Link Applications

• P2P long-range global interconnect

• Long range NoC links

• Pin-limited on-chip module interfaces• Presently chips are pin-limited, and that will migrate

inside

• Cross-bar • Simpler routing and congestion

• Communications inside many-core CMPs

Page 11: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

11 Async. Seminar, Technion, April 18, 2010

Name Data Cycle(# of FO4 Inv. Delays)

WAFT 7* (3 Gbps at 180nm)

IIP 3* (5 Gbps at 250nm)

3-Levels 3* (1 Gbps at 1200nm)

Pulsed CM 2 (8 Gbps at 180nm)

This Work 1 (67 Gbps at 65nm-HP)

Related Work

Page 12: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

Async. SeminarAsync. Seminar

Parallel vs. Serial Link Analysis

Page 13: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

13 Async. Seminar, Technion, April 18, 2010

0.0

1.0

2.0

3.0

4.0

5.0

6.0

7.0

8.0

9.0

180 130 90 65 30 15

Parallel Link dissipates less power

Serial Link dissipates less power

Lin

k L

engt

h [m

m]

Serial Link BenefitsAnalysis goal

• To show that novel serial link outperforms the parallel one for:• Long ranges

• Advanced technology nodes

Technology Node [nm]

0.0

0.2

0.4

0.6

0.8

1.0

1.2

1.4

180 130 90 65 30 15

Parallel Link requires less area

Serial Link requires less area

Technology Node [nm]

Page 14: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

14 Async. Seminar, Technion, April 18, 2010

Method

• Choose • Parallel link implementation representatives

• Serial link implementation representatives

• Compare the parallel and serial link approaches in terms of:• Area

• Power

• Latency

• Technology scaling

Page 15: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

15 Async. Seminar, Technion, April 18, 2010

"Register-Pipelined" Parallel Link

1

2

N

CLKT

1

2

N

CLKR

1

2

N

• Fully synchronous• Interconnect as combinational logic between registers• Source synchronous or global clock

High cost for high bit rates!

Page 16: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

16 Async. Seminar, Technion, April 18, 2010

"Wave-Pipelined" Parallel Link

1

2

N

CLKRRepeater Stage

1

2

N

CLKT

WORDN(i+1) WORDN(i)

Bit rate is limited by relative skew of the link wires

Page 17: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

17 Async. Seminar, Technion, April 18, 2010

Crosstalk Mitigation andPower Reduction

• Shielding / Spacing• Staggered repeaters• Interleaved bi-directional lines• Asynchronous signaling• Data encoding • Data pattern recognition with special worst-case

handling

• This work analyzes the two extremes of shielding:• Unshielded wires (a)• Fully-shielded wires (b) (a) (b)

Page 18: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

18 Async. Seminar, Technion, April 18, 2010

Single Gate-Delay Serial Link (reminder)

• Transition signaling instead of sampling: two-phase NRZ Level Encoded Dual Rail (LEDR) asynchronous protocol, a.k.a. data-strobe (DS)

• Acknowledge per word instead of per bit• Wave-pipelining over channel• Differential encoding (DS-DE, IEEE1355-95)• Low-latency synchronizers

Sender Receiver

Word Ack

Bit-Serial ChannelSynchr. Synchr.Serializer

& LEDREncoder

DeSerializer& LEDRDecoder

P

S

Page 19: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

19 Async. Seminar, Technion, April 18, 2010

Analytical Models

Parallel and Serial Link Bit Rates

Please refer to the paper below for details on the exact analytical models employed in the work:

R. Dobkin, A. Morgenshtein, A. Kolodny, R. Ginosar, "Parallel vs. Serial On-Chip Communication," SLIP, pp. 43-50, 2008

Page 20: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

20 Async. Seminar, Technion, April 18, 2010

Parallel Link Bit Rate Limitations (1)

A. Fastest available clock• Ring oscillator limitation: 8d4

• Fast processors: 11d4 (e.g. CELL)• Standard SoC/ASIC: 100-400d4

B. Synchronization Latency• May take several clocks in case of asynchronous

clock relation

C. Clock uncertainty• Extended critical path

RESET

CLK

DOUT

SOURCE

LEAF

Page 21: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

21 Async. Seminar, Technion, April 18, 2010

Parallel Link Bit Rate Limitations (2)

D. Delay Uncertainty• The skew and jitter of the clock• Repeater delay variations • Wire delay variations

mostly metal thickness variations

• Via variations• Cross-Coupling (Crosstalk)• Geometry

Outcome of routing congestion and multi-layer structure

N.S. Nagaraj DAC 2005 / L. Scheffer, SLIP 2006

Page 22: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

22 Async. Seminar, Technion, April 18, 2010

Parallel Link Minimal Clock Cycle (1)

LinkLength

TSU+2DCLK TH+2DCLK

Maximal Data Delay, dMAX

Minimal Data Delay, dMIN

CLKT

DCLK

Notations from W.P. Burleson, et al., Wave-Pipelining: A Tutorial and Research Survey, TVLSI, 1998

2 ( ) 4CLK MAX MIN CLK SU HT T Td d D

Latest data clocking

Earliest data clocking

Clock Uncertainty

Page 23: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

23 Async. Seminar, Technion, April 18, 2010

Impact of Process Variations in Repeaters on Multi-Wire Delay Uncertainty

• Variation types• Random variations

closely placed devices

• "Systematic" variations location on the die

• Relative skew (dMAX–dMIN)• Repeaters in the same stage are highly

correlated• Random variations are averaged out

thanks to large repeater sizing• Systematic inter-stage variations are

averaged out along the link

Repeater Stage

1

2

N

CLKT

Random variations inside Repeater Stage

are averaged out

Relative skew among the lines due to variations in repeaters is small Multi-wire delay uncertainty is dominated by Cross-Coupling

Page 24: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

24 Async. Seminar, Technion, April 18, 2010

Parallel Link Minimal Clock Cycle (2)

• Minimal clock cycle:

• System clock limitation:

• Register-pipelined link:

• Distance between successive pipeline stages is affected by Delay Uncertainty

2 ( ) 4CLK CLK SU HT L T T D

65nm example

The rate is bounded by clock cycle rather than the

delay uncertainty

The rate is bounded by clock cycle rather than by

the delay uncertainty

max{2 ( ) 4

,

}

PARCLK CLK

SU H

SYSTEM CLOCK

T L

T T

T

D

PARCLK SYSTEM CLOCKT T

Worst case skew between two lines: Cross-coupling

and wire variations

Page 25: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

25 Async. Seminar, Technion, April 18, 2010

Serial Link Bit Rate

• Skew due to transistor variations is neglected• much smaller than in parallel link

• Coupling factor is always known• LEDR encoding: there is only one transition

per each transmitted bit• The skew is not affected by cross-coupling

link delay is similar for all symbols

• Bit rate:

41 /SERB d

Page 26: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

26 Async. Seminar, Technion, April 18, 2010

Scalability

Number of repeaters (per millimeter) grows for more advanced technology nodes

Active area and leakage: Minimal link length for serial link employment decreases with technology

Dynamic power: Minimal link length for serial link employment decreases with technology

Interconnect area: Serial link is always preferable

0

0.5

1

1.5

2

180 130 90 65 45 32

Technology Node [nm]

k=

K/m

m

0.0

2.0

4.0

6.0

8.0

10.0

Lm

in [

mm

]

Y.I. Ismail, et al., Repeater Insertion in RLC Lines for Minimum Propagation Delay, ISCAS99

Power

Repeaters

Number of repeaters grows with technology

node scaling

Area and Leakage

Equal throughput Parallel and Serial links

are assumed

Page 27: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

27 Async. Seminar, Technion, April 18, 2010

65 nm Test Case Summary

Wave-Pipeline vs. Serial Register-pipelined vs. Serial

Shielding Fully Shielded Unshielded Fully Shielded Unshielded

Length of parallel link

unlimited up to 6mm unlimited unlimited

Clock cycle of parallel link

8d4 8d4

10d4

(fast)

130d4

(slow)

10d4

(fast)

130d4

(slow)

To minimize the following:

choose a serial link for links longer than:

Area Always Always Always Always

Power 2 mm 4mm 3mm 3mm 1mm 3mm

Latency 2 mm Never* 4mm 12mm 2mm 9mm

Minimal length above which the serial link is preferred

Page 28: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

Async. SeminarAsync. Seminar

Single Gate Delay Serial Link

Architecture

Page 29: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

29 Async. Seminar, Technion, April 18, 2010

Single Gate-Delay Serial Link

• Transition signaling instead of sampling: two-phase NRZ Level Encoded Dual Rail (LEDR) asynchronous protocol, a.k.a. data-strobe (DS)

• Acknowledge per word instead of per bit• Wave-pipelining over channel• Differential encoding (DS-DE, IEEE1355-95)• Low-latency synchronizers

Sender Receiver

Word Ack

Bit-Serial ChannelSynchr. Synchr.Serializer

& LEDREncoder

DeSerializer& LEDRDecoder

P

S

Page 30: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

30 Async. Seminar, Technion, April 18, 2010

Transmitter – Fast SR Approach

Transition

Generator

P

P

Parallel Load Interface

Load Enable

Uncoded Data

Shift-Register, SR(B) Beven

T0T0

Bodd

S

S

LEDREncoder

P

P

S

S

T90T90

Bodd

Beven

OT0

OT0

OT90

OT90

• Targeted Speed: One gate delay between bits

Page 31: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

31 Async. Seminar, Technion, April 18, 2010

Multi-Phase Transition Generation

SA SA SA SA

clk ]3[ clk ]7[ clk ]2[ clk ]6[ clk ]1[ clk ]5[ clk ]0[ clk ]4[

A

B

RESET

RESET

C1

C2

C3

C4

C11=C1 xor C3

C22=C2 xor C4

T=C11 xor C22

C1

C2

C3

C4

C11

C22

T

(a) (b)

T0

T90

The circuit is based on multi-phase clock generator from M.J.E. Lee, "An Efficient I/O and Clock Recovery for TERABIT Integrated Circuits Design," PhD Thesis, Stanford Univ., 2001.

Page 32: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

Async. Seminar, Technion, 18 April 201032

T

XL[1] XL[0]

Data-0Data10

XL[W-2]

Data(W-1)(W-2)

Parallel Load Interface

NotConnected

OUT

Fast Asynchronous Shift Register

Page 33: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

33 Async. Seminar, Technion, April 18, 2010

-25

-20

-15

-10

-5

0

5

1.00E+09 1.00E+10 1.00E+11Frequency [Hz]

DVb=0DVb=0.05DVb=0.1DVb=0.15DVb=0.2DVb=0.25DVb=0.3

-8

-6

-4

-2

0

2

4

6

8

1.00E+09 1.00E+10 1.00E+11Frequency [Hz]

0.5 Full Swing0.6 Full Swing0.7 Full Swing0.8 Full Swing0.9 Full SwingFull Swing

BiasGain[dB]

Voltage SwingGain[dB]

Wave-pipelined Control Characteristics

• The highest speed (the single gate-delay cycle) relates to the pole of the Bode diagram

• This operating point results in signal degradation along the inverter chain

Single Gate Delay Rate

BIAS

SWING

25

30

35

40

45

50

55

60

65

0 5 10 15 20 25 30

Inverter Chain Length, N

Cut-off Frequency[GHz]

Page 34: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

34 Async. Seminar, Technion, April 18, 2010

Splitter Architecture

• The shift-register is partitioned into M shift-registers M slower operation in each shift-register

• Signal is no longer degraded • Single gate-delay operation is localized to output (input) stage only

Shift-Register for Odd Bits

Shift-Register for Even BitsMerge

PARALLEL LOAD

Shift-Register for Odd Bits

Shift-Register for Even BitsSplit

PARALLEL LOAD PARALLEL READ

PARALLEL READ

Transmitter Receiver

Page 35: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

35 Async. Seminar, Technion, April 18, 2010

Splitter Architecture – Performance

• Splitter architecture employment• Better resiliency to process variation thanks to

localized high-speed operation• Power Reduction:

2

2 1

NON SPLITWP CTRL BUF SER

SPLITTER NON SPLITSERWP CTRL BUF WP CTRL

P N C V F

FNP M C V P

M M M

Page 36: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

36

Transmitter Splitter Architecture

XLEVEN[N/2]

C90

XLODD[N/2]

C

XOR

P

BODD

BEVEN

P

S

S

BODD

BEVEN

LEDR ENCODER

MergeEVEN

MergeODD

SREVEN

SRODD

Page 37: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

37 Async. Seminar, Technion, April 18, 2010

Transmitter – SPICE Simulation (65nm node)

30 ps

C0

C90

BEVEN

BODD

C

P

S

60 ps

START BIT=1

1 0 0 0 1 0 1 0

0 0 1 0 1 0 1 Dummy

1 0 0 1 0 1 0 11 0 0 0 1 0 1 0

Simulations done at

Page 38: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

Async. Seminar, Technion, 18 April 201038

Receiver

Page 39: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

39

Receiver Splitter Architecture

XLEVEN[1]

XLODD[1]

SREVEN

SRODD

TOGGLEC

S

A

B

T

SEVEN

SODD

SPLIT

Page 40: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

40 Async. Seminar, Technion, April 18, 2010

Toggle Circuit

• Straightforward implementation (fundamental asynchronous state machine) is too slow (supports only ~1.5 gate delay cycle)

• Novel toggle: • Single gate delay operation support• Internal and output latches

T

A

B

01010 1

Page 41: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

41 Async. Seminar, Technion, April 18, 2010

Channel

• Four transmission lines (DS-DE)

• High metal layers utilization• Metals 5-8 of 65nm process

• RLC modeled

• Careful layout• Small crosstalk

• Small relative variations

Page 42: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

42 Async. Seminar, Technion, April 18, 2010

Channel Issues

• The wires should be designed as transmission lines, enabling multiple traveling signals (a 1mm wire may carry at least two successive bits simultaneously)

• Wire Bandwidth (metal layer/width/frequency)

• Careful Layout

• Modeling (R, L, C)

• Current/Voltage Signaling

• Repeater Insertion / Differential Repeaters

• Shielding

• Termination

• Low-Swing (Speed & Power)

Page 43: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

43 Async. Seminar, Technion, April 18, 2010

S S

LEDR Interconnect Layout

Page 44: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

44 Async. Seminar, Technion, April 18, 2010

S SP P

LEDR Interconnect Layout

Page 45: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

45 Async. Seminar, Technion, April 18, 2010

S SP P SP

LEDR Interconnect Layout

Page 46: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

46 Async. Seminar, Technion, April 18, 2010

Differential Channel Driver and Receiver“Voltage Mode”

• Current mode differential low-swing signaling

• Currents in opposite directions

• Controllable current return path

a

a

Driver

SA

i

i

b

o

o

z

z

Receiver

R

RP / S P / S

Voltage Sensing at the RX side

Page 47: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

47 Async. Seminar, Technion, April 18, 2010

Channel Characteristic Impedance

0 (1 )DCR j L

Z R R jj C

d

Based on data from BPTM. Drawn for constant R, L, C

• Z depends on F

• Voltage changes with F

• Fast changes voltage drifts

• The drifts bound the operating speed

F

Z

S

S

Page 48: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

48 Async. Seminar, Technion, April 18, 2010

Channel Driver with Adaptive Control

IN

OUT

• Compensates for Z changes• Turned on for low frequencies

Adaptive Control

Inertial Delay

Page 49: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

49 Async. Seminar, Technion, April 18, 2010

Adaptive Control – Simulation Example

• SPICE simulation setup:• 65nm technology, 4mm range, 67Gbps data rate

• RLC modeled channel (using Raphael-like three-dimensional field solver)

• Adaptive control is turned on only for low frequencies

Data

Adaptive Control

Currents

Low FrequencyTurns Adaptive Control On

Page 50: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

50 Async. Seminar, Technion, April 18, 2010

Channel Receiver Amplifier

B

IN

OUT

R

RB

Page 51: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

51 Async. Seminar, Technion, April 18, 2010

Voltage Mode Drawbacks

• Voltage signal is degraded by high-capacitance of long interconnects

• Fast full-swing voltage transitions result in high dynamic currents• High power • Cross-talk noise

• Current signaling offers a way of avoiding high voltage swings

• Standard “Current-mode” techniques still measure voltage at the receiver side

Page 52: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

52 Async. Seminar, Technion, April 18, 2010

Current Mode:Goals

• Full Current Mode:

• Send current from TX

• Measure current at RX

• Minimal voltage swing over channel

• Current to voltage conversion at RX

• Long range repeater-less channel

• Targeted data cycle 15 ps (67 Gbps) @ 65nm

• Low power

Page 53: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

53 Async. Seminar, Technion, April 18, 2010

Current Mode:Full Current-Mode Circuit

a

an

Transmitter

Receiver

q

chan

Receiverqn

chann

ITX

IFBR

IRX

IRX

ITX

ITX > IRX

ITX > IS

Vq_swing=ISRIRX+IS

OutputStage

FS

Page 54: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

54 Async. Seminar, Technion, April 18, 2010

Current Mode:Fast Feedback

a

an

Transmitter

Receiver

q

chan

Receiverqn

chann

ITX

IFBR

IRX

Page 55: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

55 Async. Seminar, Technion, April 18, 2010

Current Mode:Adaptive Control

M1

M2

o

ona

an

OFF During Fast Operation

ON During Slow Operation

Page 56: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

56 Async. Seminar, Technion, April 18, 2010

Current Mode:SPICE Simulations

• 7 mm link (note almost twice longer than VM)

• Full RLC model for interconnect

• Interconnect: 2 signal wires + 2 shields. 2m partitioned wires, 5m spacing

S

DN

D

SN

In-pairspacing

In-pairspacing

Cross-pairspacing

Page 57: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

57 Async. Seminar, Technion, April 18, 2010

(a)

(b)

(c)

0.0

1.0

4.0

0.0

0.0

2.0

2.0

Current Mode:Simulations (1)

Input Data

RLX Currents

TX Currents

Page 58: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

58 Async. Seminar, Technion, April 18, 2010

(a)

(b)

(c)

0.2

0.4

0.6

2.0

6.0

0.7

1.0

0.9

0.8

0.8

1.0

Current Mode:Simulations (2)

Channel Voltage SwingsInput: 150mV, Output: 50mV

RX Currents: 1mA

RX Output Voltage: 200mV

Page 59: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

59 Async. Seminar, Technion, April 18, 2010

Current Mode:Power

7 mm Link, Random Pattern ,67Gbps-max RX-TX

OutputStage Total

IAV dynamic (mA) 9.7 10.3 20.0

I leakage (mA) 0.06 0.07 0.12

I peak (mA) 9.94 13.75 23.7

Dynamic Power (mW, Vdd=1.2V) 11.7 12.3 24.0

Leakage Power (mW, Vdd=1.2V) 0.07 0.08 0.15

Peak Power (mW) 11.93 16.50

Total Power with 20% Utilization (mW) 2.4 2.5 4.9

Total Link Power with 20% Utilization (mW)Including SerDes and two Current-Model Links

- - 35

Page 60: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

60 Async. Seminar, Technion, April 18, 2010

Voltage Mode vs. Current Mode

• 16-bit word 67Gbps serial link

• Maximal repeater-less range:• Voltage mode: 4mm• Current mode: 7mm

• Dynamic power (w/o SerDes circuits, 100% util.):• 4mm Voltage mode: 18mW• 7mm Current mode: 24mW

• 33% increase in power for 75% increase in length

Page 61: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

61 Async. Seminar, Technion, April 18, 2010

In-Die Variations

• Splitter architecture• High-speed operation localized to input and output stages

• High-speed components design and verification• Monte-Carlo simulations (>5)• 26 PVT Corners• Iterative design with legging and sizing for sensitive

transistors

• Asynchronous structure• Supports any slow down• Minimal time separation between successive bits must be

provided!

Page 62: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

62 Async. Seminar, Technion, April 18, 2010

Serial Link Performance Limitations

• Skew between subsequent transitions

• Channel deviation (e.g. short pulses degradation)

• Systematic On-Die Variations (TX vs. RX)

Page 63: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

Async. SeminarAsync. Seminar

FOXTest Chip

Page 64: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

64 Async. Seminar, Technion, April 18, 2010

• Proof of concept of the serial link and its components on a real process

• Performance mapping: • Operation mode (16-bit/64-bit words)

• Speed (ability to change the link speed in range 0.1-30 Gbps)

• Link length (0-7mm)

• Power

• Channel layout geometry

• Metal layer

• Noise rejection

Page 65: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

65 Async. Seminar, Technion, April 18, 2010

Test Ctrl

TestParameters (e.g. Speed, K, etc.)

START

FOX-TX)1(

FOX-TX)2(

FOX-TX(K)

W

W

W

W

WIRES

WIRES

WIRES

FOX-RX)1(

FOX-RX)2(

FOX-RX(K)

Link Architecture (only one is active per a time)

W

W

W

RDEN

LATCH_EN

DOUT

ENDF

Sen

d W

ord[

0:K

-1]

DINWREN

RESET

LATCH_EN

LATCH_EN

CLK

load

_en

[0:K

-1]

W

LATCH_EN

RX_DATA

TX

_DA

TA

Page 66: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

66 Async. Seminar, Technion, April 18, 2010

• IBM 65nm 10 LPe process

• 8 Metal layers

• Expecting 30 Gbps throughput over single link• 1 / (30ps gate delay)

Page 67: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

67 Async. Seminar, Technion, April 18, 2010

Digital Controler

Testing circuits

TX RX

FOX chip structure

Power grid

Page 68: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

68 Async. Seminar, Technion, April 18, 2010

Conclusions

• Novel high-speed serial links outperform parallel links for long range communication

• The serial link is more attractive for shorter ranges in future technologies

• Future large SoCs and NoCs should employ serial links to mitigate: • Area

• Routing Congestion

• Power

• Latency

Page 69: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

69 Async. Seminar, Technion, April 18, 2010

Publications

1. R. Dobkin, R. Ginosar, C. Sotiriou, "Data Synchronization Issues in GALS SoCs," Proc. of ASYNC, pp. 170-179, 2004.

2. R. Dobkin, V. Vishnyakov, E. Friedman and R. Ginosar, "An Asynchronous Router for Multiple Service Levels Networks on Chip," Proc. of ASYNC, pp. 44-53, 2005.

3. R. Dobkin, R. Ginosar, A. Kolodny, "High-Speed Serial Interconnect for NoC," NoC Workshop, DATE, 2006.

4. R. Dobkin, R. Ginosar, A. Kolodny, "Fast Asynchronous Shift Register for Bit-Serial Communication," Proc. of ASYNC, pp. 117-126, 2006.

5. R. Dobkin, R. Ginosar and C. P. Sotiriou, "High Rate Data Synchronization in GALS SoCs," IEEE Transactions on VLSI, 14(10), pp. 1063-1074, 2006.

6. R. Dobkin, Y. Perelman, T. Liran, R. Ginosar, A. Kolodny, "High Rate Wave-Pipelined Asynchronous On-Chip Bit-Serial Data Link," Proc. of ASYNC, pp. 3-14, 2007.

7. R. Dobkin, R. Ginosar, I. Cidon, "QNoC Asynchronous Router with Dynamic Virtual Channel Allocation," First ACM/IEEE Int. Symp. on Networks on Chip (NOCS), p. 218, 2007.

8. R. Dobkin, R. Ginosar, A. Kolodny, "QNoC Asynchronous Router", VLSI Integration Journal, 2008, doi:10.1016/j.vlsi.2008.03.001

9. R. Dobkin, A. Morgenshtein, A. Kolodny, R. Ginosar, "Parallel vs. Serial On-Chip Communication," SLIP, pp. 43-50, 2008.

10. R. Dobkin, R. Ginosar, "Fast Universal Synchronizers," PATMOS, 2008.

11 . R. Dobkin, T. Kapshitz, S. Flur, R. Ginosar, "Assertion Based Functional Verification of Multiple-Clock GALS Systems," VLSI-SOC, 2008.

12. R. Dobkin, R. Ginosar, "Two Phase Synchronization with Sub-cycle Latency," In press, VLSI Integration Journal, 2008.

Page 70: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

Async. SeminarAsync. Seminar

Future Research

Page 71: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

71 Async. Seminar, Technion, April 18, 2010

Neighbour Local GlobalSoC Module WrapperRouter

Future SoC: Three-Level Communication

Page 72: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

72 Async. Seminar, Technion, April 18, 2010

Future SoC (Zoom In)

Synchronous Module

Wrapper(Synchronization)

NoC Router

Dedicated PTP LinkNoC Link

SoC

Page 73: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

73 Async. Seminar, Technion, April 18, 2010

Future Research (1) – Networking

• Design a 2-level NOC with power, latency and bandwidth parameters for each link

• Create algorithms or analysis that help decide for each route whether to send the message on the highway (part of the way) or only using the slow parallel links

• Consider a ring architecture, where each high speed message can either pass-thru or be "dropped" to slow NoC, similar to optical ring switches

• Analyze N-level NoC where each communication level has its own speed tag• Combine / Compare to QNoC approach

• Is there any benefit in sending part of the packet through high-way and the other part through slow parallel links?

• Broadcast / Multicast / Backpressure (Tokens) through highway?

Page 74: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

74 Async. Seminar, Technion, April 18, 2010

Future Research (2) – Architecture

• Design a 5-port router for highway links: • Does each router need to parallelize each incoming word before it can be serialized

again and sent on the next link?

• Maybe create a switch using X-latches? Consider wave-pipelining insides the router.

• Design parallel NoC link/NoC architecture with wave pipelining• Rely on guaranteed latency credit mechanism

• Design upper level architecture for the serial link• Word level acknowledge technique

• What should be done for NACK?• Encoding for error correction?

• Retransmit?

• Out of order word (flit) handling?

• Design Low-Power N-bit high-speed shift-register• Explore the linear power reduction of Splitter architecture

Page 75: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

75 Async. Seminar, Technion, April 18, 2010

Future Research (3) – Circuit

• High-speed Current-Mode circuits • New approach for lower supply voltage / longer links

• Improve Output Stage stability, resiliency to variations, power

• Analyze trade-off power/speed for CM-based links

• Design an efficient power down/up technique for CM• Switch off the link when there is no data

• Turn on the link quickly when the data is ready

• Adopt the idea of CM for off-chip links• Design a new circuit matching off-chip requirements

• Compare to known approaches

• Further investigation of asynchronous (variable rate) signaling

• Explore interconnect architectures for CM signaling• Layout, shielding, return current path, noise

Page 76: Async. Seminar FOX Fast On-Chip Interconnect Reuven Dobkin Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab April

76 Async. Seminar, Technion, April 18, 2010

The End

• Thank you