High-Speed and Low-Power On-Chip Global Link Using Continuous-Time Linear Equalizer
Yulei Zhang1, James F. Buckwalter1, and Chung-Kuan Cheng2
1Dept. of ECE, 2Dept. of CSE, UC San Diego, La Jolla, CA
19th Conference on Electrical Performance of Electronic Packaging and Systems, Oct 25, 2010, Austin, USA
Outline
  Introduction
  Equalized On-Chip Global Link
    Overall structure
    Basic working principle
  Driver Design for On-Chip Transmission-Line
    Guideline for tapered CML driver
    Driver design example
  Continuous-Time Linear Equalizer (CTLE) Design
    CTLE modeling
    CTLE design example
  Driver-Receiver Co-Design for Low Energy per Bit
    Methodology
    Overall link design example
  Conclusion
Research Motivation
  Global interconnect planning becomes a challenge in ultra-deep sub-micron (UDSM) processes
    Performance gap between global wires and logic gates
    Conventional buffer insertion adds significant power overhead
  Uninterrupted wire configurations are used to tackle on-chip global communication issues
    On-chip T-lines to reduce interconnect power
    Equalization to improve the bandwidth
  State of the art [Kim2009]
    2Gb/s/um, < 1pJ/b signaling over a 10mm global wire in 90nm
Our Contributions
  Contributions
    A novel equalized on-chip T-line structure for global communication
      Tapered CML driver + CTLE receiver
    Accurate small-signal modeling of the CTLE receiver to improve optimization quality
    A design methodology for driver-wire-receiver co-optimization to reduce the total energy per bit
  Results of our design
    20Gbps signaling over a 10mm, 2.2um-pitch on-chip T-line
    11ps/mm latency and 0.2pJ/b energy per bit in 45nm
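The quoted results can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated on this slide; the function and key names are illustrative, and bandwidth density is assumed here to mean data rate divided by wire pitch.

```python
# Back-of-the-envelope check of the quoted link figures.
# Inputs are the numbers quoted on the slide; all names are illustrative.

def link_metrics(rate_gbps, length_mm, pitch_um, latency_ps_per_mm, energy_pj_per_bit):
    """Return derived figures of merit for an on-chip link."""
    return {
        # End-to-end wire latency in picoseconds.
        "latency_ps": latency_ps_per_mm * length_mm,
        # Data rate per unit of routing pitch, in Gb/s/um (assumed definition).
        "density_gbps_per_um": rate_gbps / pitch_um,
        # Power implied by the energy per bit: pJ/b * Gb/s = mW.
        "power_mw": energy_pj_per_bit * rate_gbps,
    }

metrics = link_metrics(rate_gbps=20, length_mm=10, pitch_um=2.2,
                       latency_ps_per_mm=11, energy_pj_per_bit=0.2)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the link has about 110 ps of wire latency, roughly 9 Gb/s/um of bandwidth density (against the 2Gb/s/um quoted for [Kim2009]), and an implied power of about 4 mW.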
Equalized On-Chip Global Link
Overall structure
  Tapered current-mode logic (CML) drivers
  Terminated differential on-chip T-line
  Continuous-time linear equalizer (CTLE) receiver
  Sense-amplifier based latch
Basic Working Principle
  Tapered CML driver
    Provides low-swing differential signals to drive the T-line
    Tapering factor u, number of stages N, fan-out X, final-stage current ISS, driver resistance RS
  T-line
    Differential wire with P/G shielding
    Geometries (width, pitch) and termination resistance RT
  CTLE receiver
    Recovers the signal and improves eye quality
    Load resistance RL, source degeneration resistance RD and capacitance CD, over-drive voltage Vod
  Sense-amplifier based latch
    Synchronizes and converts the signal back to digital levels
Tapered CML Driver Design
  Output swing constraint
  Design guideline [Tsuchiya2006, Heydari2004]
    Begin from the final stage
      For a given VSW, output resistance RS is optimized together with RT to increase eye-opening
      Transistor size
    Tapering factor u = 2.7 for delay reduction
    Number of stages
    Each previous stage is designed backward by scaling with the factor u
Need to design:
1) Output resistance RS
2) Tail current ISS
3) Size of transistors W
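The backward-scaling guideline can be sketched as follows. This is a minimal illustration, not the authors' sizing procedure: it assumes the stage count follows the usual tapered-buffer rule N = ceil(ln X / ln u), that each earlier stage has its tail current and transistor width divided by u, and that its load resistance is multiplied by u so the swing I x R stays constant. The function name and all numeric values are hypothetical.

```python
import math

def taper_cml_driver(i_ss_final_ma, w_final_um, r_s_final_ohm, fanout_x, u=2.7):
    """Size a tapered CML chain backward from the final stage.

    Assumed rules (standard tapered-buffer relations, not taken from the slides):
      - number of stages N = ceil(ln(X) / ln(u)), at least 1
      - tail current and width shrink by u per stage toward the input
      - load resistance grows by u per stage, keeping I*R (the swing) constant
    """
    n = max(1, math.ceil(math.log(fanout_x) / math.log(u)))
    stages = []
    for k in range(n):
        scale = u ** (n - 1 - k)  # 1.0 for the final stage
        stages.append({
            "stage": k + 1,
            "i_ss_ma": i_ss_final_ma / scale,
            "w_um": w_final_um / scale,
            "r_load_ohm": r_s_final_ohm * scale,
        })
    return stages

# Hypothetical final stage: 2 mA tail current, 10 um devices, 100 ohm output resistance.
stages = taper_cml_driver(i_ss_final_ma=2.0, w_final_um=10.0,
                          r_s_final_ohm=100.0, fanout_x=15)
for s in stages:
    print(s)
```

With a fan-out of 15 and u = 2.7 this yields three stages, and the last entry reproduces the final-stage values passed in.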
CML Driver Study w/ Loaded T-line
Assume 45nm 1P11M CMOS
T-line built on M9 with M1 as reference
T = 1.2um, H = 3.5um (fixed)
Optimize W and S for eye-opening
Change of the eye-opening with width for fixed 2um pitch
Change of the eye-opening with pitch for equal width/spacing
CML Driver Design Example
  Experimental observations
    The optimal eye occurs when width = spacing
    Eye-opening improves with larger pitch
  Design methodology
    Choose the minimum pitch that satisfies the wire-end eye-opening requirement
  Design example
Accurate CTLE Modeling
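[Figure: small-signal circuit of the CTLE. Input vin at gate G; source S degenerated by RD in parallel with CD; drain D loaded by RL in parallel with CL; gm*vgs and rds between D and S; output vout.]

Small-signal circuit used to derive H(s):

$$H(s) = \mathrm{Gain}_{DC}\,\frac{1 + s R_D C_D}{1 + a s + b s^2}$$

$$\mathrm{Gain}_{DC} = \frac{g_m r_{ds} R_L}{(g_m r_{ds} + 1) R_D + r_{ds} + R_L}$$

$$a = \frac{r_{ds}(R_L C_L + R_D C_D) + (g_m r_{ds} + 1) R_D R_L C_L + R_L R_D C_D}{(g_m r_{ds} + 1) R_D + r_{ds} + R_L}$$

$$b = \frac{r_{ds} R_D C_D R_L C_L}{(g_m r_{ds} + 1) R_D + r_{ds} + R_L}$$

$$\omega_z = \frac{1}{R_D C_D}, \quad \omega_{p1} \approx \frac{1}{a}, \quad \omega_{p2} \approx \frac{a}{b}$$

Parasitic capacitances:

$$C^{para}_S = 1.5\,\mathrm{fF/\mu m} \cdot W, \quad C^{para}_D = 1.5\,\mathrm{fF/\mu m} \cdot W$$

$$C_D = C^{ex}_D + C^{para}_S, \quad C_L = C^{ex}_L + C^{para}_D$$

[Bias relations giving g_m, r_ds, and W/L in terms of I_Bias, V_od, V_dd, and V_ic: not recoverable from the extracted text.]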
Design Variables: RL, RD, CD, Vod(Size)
[Hanumolu2005]
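To make the model concrete, the sketch below evaluates a two-pole, one-zero transfer function of the form H(s) = Gain_DC (1 + s RD CD) / (1 + a s + b s^2) for a source-degenerated stage that includes rds. The coefficient expressions follow from nodal analysis of that small-signal circuit. RL, RD, and CD are taken from the co-design column of the performance summary; gm, rds, and CL are hypothetical values chosen only for illustration.

```python
import math

def ctle_model(gm, rds, r_l, r_d, c_d, c_l):
    """Return (gain_dc, a, b) of H(s) = gain_dc*(1 + s*r_d*c_d)/(1 + a*s + b*s^2)."""
    delta = (gm * rds + 1.0) * r_d + rds + r_l
    gain_dc = gm * rds * r_l / delta
    a = (rds * (r_l * c_l + r_d * c_d)
         + (gm * rds + 1.0) * r_d * r_l * c_l
         + r_l * r_d * c_d) / delta
    b = rds * r_d * c_d * r_l * c_l / delta
    return gain_dc, a, b

def ctle_gain(freq_hz, gm, rds, r_l, r_d, c_d, c_l):
    """Magnitude of H(j*2*pi*f)."""
    gain_dc, a, b = ctle_model(gm, rds, r_l, r_d, c_d, c_l)
    s = 2j * math.pi * freq_hz
    return abs(gain_dc * (1 + s * r_d * c_d) / (1 + a * s + b * s * s))

# r_l, r_d, c_d: co-design column of the summary table.
# gm, rds, c_l: hypothetical illustration values, not from the slides.
params = dict(gm=2e-3, rds=10e3, r_l=890.0, r_d=1430.0, c_d=150e-15, c_l=10e-15)

gain_dc, a, b = ctle_model(**params)
f_zero = 1 / (2 * math.pi * params["r_d"] * params["c_d"])
f_p1 = 1 / (2 * math.pi * a)   # dominant-pole approximation
f_p2 = a / (2 * math.pi * b)   # valid when the poles are well separated
peaking = ctle_gain(10e9, **params) / ctle_gain(0.0, **params)

print(f"DC gain {gain_dc:.3f}, zero {f_zero/1e9:.2f} GHz, "
      f"poles {f_p1/1e9:.2f} / {f_p2/1e9:.2f} GHz, "
      f"gain at 10 GHz relative to DC {peaking:.2f}x")
```

With these values the zero sits below both poles, so the stage attenuates at DC and boosts the response near 10 GHz, the Nyquist frequency of a 20Gbps link.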
CTLE Modeling Validation
  Test case: 10mm, 16mV eye at wire-end
  Blue lines: simple model that ignores rds and parasitics
  Red line: model that considers rds only
  Black line: the proposed accurate model
  <10% correlation error
  >20% eye-opening increase
CTLE Design Example
  Observations of the CTLE study
    Eye-opening improves as the power constraint is relaxed, but tends to saturate
  Design example
    Based on the pre-optimized CML driver + T-line design
    Eye-opening improved by 4X after the CTLE
Driver-Receiver Co-Design Methodology
  Optimize driver, wire, and receiver together, with Veye/Power as the cost function
  Use the pre-designed CML driver, T-line, and CTLE as the initial solution
  Optimization flow
    Driver-to-receiver step-response generation based on SPICE simulation and CTLE modeling
    Eye-opening estimation based on the step response
    SQP-based non-linear optimization
    Variables: [ISS, RT, RL, RD, CD, Vod]
  Performance comparison
    Option A: driver/receiver independent design
    Option B: low-power driver/receiver co-design
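One common way to estimate eye-opening from a step response is peak-distortion analysis: form the single-bit pulse response as the difference of two step responses one unit interval (UI) apart, sample it at UI spacing, and subtract the sum of the inter-symbol interference magnitudes from the main cursor. The slides do not state which estimator was used, so the sketch below is an assumption. The first-order RC step response is a toy stand-in for the SPICE-generated response, and the 50 ps UI corresponds to 20Gbps.

```python
import math

def estimate_eye(step, ui, n_ui=20, phases=32):
    """Worst-case eye-opening (normalized to the full swing) from a step response.

    step: callable t -> step-response value, returning 0 for t <= 0.
    The sampling phase is swept over one UI and the best eye is returned.
    """
    best = float("-inf")
    for i in range(1, phases + 1):
        t0 = ui * i / phases
        # Pulse response sampled at t0 + k*UI: step(t) - step(t - UI).
        mags = [abs(step(t0 + k * ui) - step(t0 + (k - 1) * ui))
                for k in range(n_ui)]
        main = max(mags)           # main cursor
        isi = sum(mags) - main     # worst-case ISI from all other cursors
        best = max(best, main - isi)
    return best

def rc_step(t, tau=25e-12):
    """Toy first-order step response (hypothetical 25 ps time constant)."""
    return 1.0 - math.exp(-t / tau) if t > 0 else 0.0

eye = estimate_eye(rc_step, ui=50e-12)
print(f"normalized worst-case eye-opening: {eye:.4f}")
```

For a first-order response the result has the closed form 1 - 2*exp(-UI/tau), which gives a quick sanity check before substituting a simulated channel response.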
Low Energy-per-Bit Optimization Flow
[Flow chart]
Initial solution: pre-designed CML driver and pre-designed CTLE receiver
Driver-receiver co-design
  Co-design cost function estimation
    SPICE-generated T-line step response
    Receiver step response using CTLE modeling
    Step-response based eye estimation
    Cost function: Veye/Power
  Internal SQP (Sequential Quadratic Programming) routine to generate the best solution
    Change variables [ISS, RT, RL, RD, CD, Vod]
Result: best set of design variables in terms of overall energy per bit
Simulated Eye Diagrams
Methodology A: driver/receiver separate design
Methodology B: driver/receiver co-design for low-power
Summary of Performance Comparison

Parameter                    Methodology A            Methodology B
                             (separate design)        (co-design for low power)
RS (ohm)                     47                       148
RT (ohm)                     94                       1100
RL (ohm)                     440                      890
RD (ohm)                     110                      1430
CD (fF)                      680                      150
Vod (mV)                     60                       58
Eye-opening at CTLE (mV)     91                       113
Power consumption (mW)       8.1                      3.8
Note: the driver/receiver co-design methodology uses much larger driver and termination resistances to reduce power. This closes the eye at the driver output and at the wire-end; the final eye is recovered by fully utilizing the CTLE.
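The comparison can be restated in terms of energy per bit and the Veye/Power cost function. The sketch below uses the eye-opening and power values from the summary and assumes both methodologies run at the 20Gbps quoted for the final design; the names are illustrative.

```python
# Eye-opening (mV) and power (mW) from the performance summary.
designs = {
    "A (separate design)": {"eye_mv": 91.0, "power_mw": 8.1},
    "B (co-design)": {"eye_mv": 113.0, "power_mw": 3.8},
}
RATE_GBPS = 20.0  # assumed to apply to both methodologies

def figures(eye_mv, power_mw, rate_gbps=RATE_GBPS):
    """Derive energy per bit and the Veye/Power figure of merit."""
    return {
        "energy_pj_per_bit": power_mw / rate_gbps,   # mW / (Gb/s) = pJ/b
        "eye_per_power_mv_per_mw": eye_mv / power_mw,
    }

results = {name: figures(**d) for name, d in designs.items()}
for name, r in results.items():
    print(f"{name}: {r['energy_pj_per_bit']:.3f} pJ/b, "
          f"{r['eye_per_power_mv_per_mw']:.1f} mV/mW")
```

Under that assumption, co-design cuts the energy per bit from about 0.41 pJ/b to 0.19 pJ/b while raising Veye/Power from about 11 mV/mW to about 30 mV/mW.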
Conclusion
  We propose a novel equalized on-chip global link using a CML driver and a CTLE receiver
  An accurate CTLE model achieves <10% correlation error and improves eye-opening optimization quality
  Our design achieves
    20Gbps signaling over a 10mm, 2.2um-pitch on-chip T-line
    11ps/mm latency and 0.2pJ/b energy per bit
Thank You!
Q & A