ECE 679: Digital Systems Engineering
Patrick Chiang
Office Hours: 1-2PM Mon-Thurs
GLSN 100
Class Introductions
• Who am I
• Who are you
Class Basics
• Class basics
  – 4 Homeworks (20%) (groups of 2)
  – Midterm (40%)
  – Final Project (40%)
    • 4-page IEEE report
    • 10-minute presentation (groups of 2)
• Guest lecture (Dr. Frank O’Mahony)
  – Intel Research Labs (May 4th)
  – Intel Field Trip (June 7th) TBD
• Presentations of 1-2 best project reports
Class Homework
• Homework
  – Skim Dally/Poulton “Digital Systems Engineering”
    • Chapter 3
  – Skim Overview Paper: http://mos.stanford.edu/papers/mh_micro_98.pdf
  – Includes running Stat Eye
    • Oregon State Matlab (eecs.oregonstate.edu/it)
    • www.stateye.org
  – Problem Set #1
    • rlc files -- ~pchiang/hspice (rlc_spice_deck; rlc.rlc)
    • Spice models -- ~pchiang/hspice/process_files/
      – 130nm to 22nm
      – Simulator lang = spice
    • Spectre models
      – DEFINE gpdk090 /nfs/guille/analog/c/cdsmgr/process/gpdk090_v3.8/libs.cdb/gpdk090
What does this mean for analog designers?
• Ever build an ADC?
  – Ever wonder what to do with the digital bits?
[Figure: ADC block diagram. Analog input, Fs = 600MHz; 8-16 bit output @ 100MHz, 200MHz, 400MHz goes to a vector analyzer]
• Why does this clock rate not increase?
• What really is this output doing? Where is it going?
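One way to put numbers on the question is to count the raw bits leaving the converter. The sketch below assumes one N-bit word is emitted per output clock; the function name is illustrative, and the resolutions and clock rates are the ones listed on the slide.

```python
def adc_output_rate_gbps(bits_per_word: int, word_rate_hz: float) -> float:
    """Raw output data rate in Gb/s, assuming one word per output clock."""
    return bits_per_word * word_rate_hz / 1e9


# Sweep the resolutions and output clock rates quoted on the slide.
for bits in (8, 16):
    for rate_mhz in (100, 200, 400):
        gbps = adc_output_rate_gbps(bits, rate_mhz * 1e6)
        print(f"{bits:2d} bits @ {rate_mhz} MHz -> {gbps:.1f} Gb/s")
```

Even at these modest clock rates the aggregate output reaches several Gb/s, which is the traffic a chip-to-chip link has to carry.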
Brief Summary
• Introduction to the area
  – Why serial links are important
  – What are the current technology trends/limitations
4Gb/s Low Power, Area Efficient Serial Links
• Interconnection between different chips
• Transmitter Equalization
• Receiver Offset Cancellation
2000 0.25um Testchip 2001 0.25um Testchip
Ming-Ju E. Lee, William J. Dally, John W. Poulton, Patrick Chiang, Stephen F. Greenwood. An 84-mW 4Gb/s Clock and Data Recovery Circuit for Serial Link Applications. VLSI Circuits Symposium, Kyoto, Japan, June 2001, pp. 149-152.
Ming-Ju E. Lee, William Dally, Patrick Chiang. Low-Power Area-Efficient High-Speed I/O Circuit Techniques. IEEE Journal of Solid-State Circuits, November 2000, Vol. 35, No. 11, pp. 1591-1599.
[Figure: serial link applications. Router backplane (1m, FR4) between transmitter output and receiver input; CPUs and memory connected through high-speed I/Os, with links from/to other subsystems (e.g. backplane); IBM processor]
Scaling Serial Links: From 4Gb/s -> 20Gb/s
• Thesis: Develop 20Gb/s Serial Link
  – Area: 500um x 500um
  – Power: 200mW/link
• 1 bit time = 1 FO4
• Timing uncertainty becomes KEY issue
[Figure: 4Gb/s eye diagram (250ps bit time) and 20Gb/s eye diagram (50ps bit time); axes: voltage v vs. time t]
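The 250ps and 50ps labels on the eye diagrams are simply the bit times at 4Gb/s and 20Gb/s. A minimal sketch of that arithmetic follows; the 10ps timing-uncertainty figure is a hypothetical value chosen only to show how the same absolute error eats a larger share of a shorter bit.

```python
def bit_time_ps(data_rate_gbps: float) -> float:
    """Bit time (unit interval) in picoseconds for a given data rate in Gb/s."""
    return 1e3 / data_rate_gbps


HYPOTHETICAL_UNCERTAINTY_PS = 10.0  # assumption for illustration, not from the slides

for rate in (4.0, 20.0):
    ui = bit_time_ps(rate)
    share = 100.0 * HYPOTHETICAL_UNCERTAINTY_PS / ui
    print(f"{rate:4.0f} Gb/s: bit time {ui:5.1f} ps, "
          f"{HYPOTHETICAL_UNCERTAINTY_PS:.0f} ps uncertainty = {share:.0f}% of the bit")
```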
Transmitter Block Diagram
No post-PLL Clock Buffers
Test Chip
• UMC 1.2V, 0.13um CMOS (single Vt)
• Die size 700um x 1.15mm
• 50 Ohm Pad Termination using Wafer Probes
[Figure: die photo, 700um x 1.1mm. Labeled blocks: Clock Recovery, RX PRBS Check, PRBS Gen, TX DLL, Test Interface, 10GHz PLL, Transmitter Muxing, Phase Interpolators, Test Structures]
PLL Measurements
Open Loop VCO Phase Noise @ 1MHz: -97dBc/Hz
10GHz Jitter (RMS): 0.97ps
10GHz Jitter (pk-pk): 8.0ps
PLL Power: 38.6mW
VCO Power: 6mW
Tuning Range: 1.14-1.31
• Jitter limited by 1.25GHz input reference clock
  – HP 8133A input clock (1.2ps RMS, 8.9ps pk-pk)
[Figure: panels (a)-(c); power spectrum; jitter for Q=10 and Q=5]
Eye Diagram
• Data Rate = 19.2Gb/s
• Voltage ripple caused by lack of current source at differential pair tail node
• Jitter: 2.2ps RMS, 15.6ps pk-pk
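To judge how much of the eye the measured jitter consumes, it helps to express it as a fraction of the unit interval (UI). A small sketch using the 19.2Gb/s data rate and the jitter figures from this slide; the function name is illustrative.

```python
def jitter_in_ui(jitter_ps: float, data_rate_gbps: float) -> float:
    """Express a jitter value in picoseconds as a fraction of one bit time (UI)."""
    ui_ps = 1e3 / data_rate_gbps
    return jitter_ps / ui_ps


DATA_RATE_GBPS = 19.2
for label, jitter_ps in (("RMS", 2.2), ("pk-pk", 15.6)):
    ui_fraction = jitter_in_ui(jitter_ps, DATA_RATE_GBPS)
    print(f"{label:>5} jitter {jitter_ps:4.1f} ps = {ui_fraction:.3f} UI")
```

At this data rate one UI is about 52ps, so the peak-to-peak jitter takes up roughly 30% of the bit time.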
High Speed Transmitter Comparisons
                      P. Chiang     J. Kim          U. Singh        D. Shaeffer
                      VLSI 2004     ISSCC 2005      VLSI 2005       ISSCC 2003
Data Rate             20Gb/s        40Gb/s          34Gb/s          40Gb/s
Power                 165mW         2.7W            1.335W          4.9W
Area                  0.2275mm^2    9.18mm^2        4.16mm^2        8.25mm^2
Jitter (RMS, pk-pk)   2.37ps, 15ps  1.53ps, 8.11ps  1.44ps, 9.44ps  880fs, 5.1ps
Technology            0.13um CMOS   0.13um CMOS     0.18um CMOS     0.09um SiGe
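A convenient way to compare the four transmitters above is energy per bit: power divided by data rate, where 1mW per Gb/s equals 1pJ/bit. The sketch below uses only the power and data-rate figures from the comparison; the dictionary layout and function name are illustrative.

```python
# (data rate in Gb/s, power in mW), taken from the comparison above.
transmitters = {
    "P. Chiang, VLSI 2004": (20.0, 165.0),
    "J. Kim, ISSCC 2005": (40.0, 2700.0),
    "U. Singh, VLSI 2005": (34.0, 1335.0),
    "D. Shaeffer, ISSCC 2003": (40.0, 4900.0),
}


def energy_per_bit_pj(rate_gbps: float, power_mw: float) -> float:
    """Energy per bit in pJ; mW divided by Gb/s gives pJ/bit directly."""
    return power_mw / rate_gbps


for name, (rate, power) in transmitters.items():
    print(f"{name:<24} {energy_per_bit_pj(rate, power):7.2f} pJ/bit")
```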
A 250mW Full-Rate 10Gb/s Transceiver Core in 90nm CMOS using a Tri-State Binary PD with 100ps Gated Digital Output
T. Masuda, et al., ISSCC 2007.
A full-rate 10Gb/s transceiver core employing a tri-state binary PD with 100ps gated digital output is implemented in a 90nm CMOS process. Direct drive from the VCO is utilized to eliminate the 10GHz clock buffer current. The RX exhibits a recovered jitter of 906fs (rms) and an input sensitivity of 5.9mV. The TX generates a jitter of 5mUI (rms). The chip consumes 250mW.
Conventional Serial Link Receivers
• Conventional architectures also use multi-phase PLL
  – Static Phase Offset
  – Power Supply Sensitivity
[Figure: conventional receiver. 20Gb/s input data -> pre-amp -> four samplers D[0]-D[3], clocked by ck[0]-ck[3] from a multiphase PLL]
2nd Generation Transmitter
• 2-Tap Equalizer implemented to compensate for channel losses
  – Achieve 50ps analog delay with CML buffers
[Figure: 2nd generation transmitter block diagram. 10GHz oscillator (10GHz CLK / CLKB) with phase comparator, charge pump, and varactor control, referenced to an off-chip clock @ 1.25GHz; divider chain 10GHz->5GHz, 5GHz->2.5GHz, and 2:1 down to 1.25GHz; data retiming and PRBS/BER checker; 2:1 MUX stages from 5Gb/s to 10Gb/s to 20Gb/s using 4 and 8 phases @ 5GHz and low-high buffers; main path plus equalizing paths with 50ps delay]
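Since 50ps is one bit time at 20Gb/s, the 2-tap equalizer can be modeled as a two-tap FIR filter acting on the bit stream. The sketch below is a behavioral model only, not the CML circuit: the tap form y[n] = x[n] - alpha*x[n-1] and the +/-1 symbol mapping are assumptions, and alpha = 0.37 is borrowed from the results slides.

```python
def two_tap_fir(symbols, alpha):
    """Apply y[n] = x[n] - alpha * x[n-1], treating the sample before the first as 0."""
    out = []
    prev = 0.0
    for x in symbols:
        out.append(x - alpha * prev)
        prev = x
    return out


bits = [0, 0, 1, 1, 1, 0, 1, 0]
symbols = [1.0 if b else -1.0 for b in bits]  # assumed NRZ mapping: 0 -> -1, 1 -> +1
equalized = two_tap_fir(symbols, alpha=0.37)

for b, y in zip(bits, equalized):
    print(f"bit {b}: {y:+.2f}")
```

Bits that follow a transition are driven at full amplitude (about 1.37 in this normalization) while repeated bits are reduced (about 0.63), which pre-compensates a channel that attenuates high frequencies.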
Fabrication: Test Chip
• ST Microelectronics 0.13um test chip
  – 307mW / transceiver
  – 0.46mm^2
  – 20mV input sensitivity
[Figure: 2006 0.13um test chip die photo with transmitter and receiver outlined; dimension labels: 500um, 600um, 350um, 450um]
[Figure: clock path. 10GHz VCO (controlled from the charge pump) drives CML divider stages (10GHz->5GHz, 5GHz->2.5GHz, 2.5GHz->1.25GHz to the phase comparator); low-high converters [L-H] with WP / WN = 4 / 1 translate low-swing signals to digital swing, followed by normal-sized inverters]
Results
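[Figure: 20Gb/s eye diagrams for an ideal channel and for a channel with -6.5dB loss @ 10GHz; annotated values: 43ps, 80mV, 33mV, 37ps]
All Results Single-Ended
Results (cont’d)
[Figure: 20Gb/s eye diagrams with α=0.37 for an ideal channel and for a channel with -6.5dB loss @ 10GHz; annotated values: 62mV, 35ps, 36.4ps, 72mV]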
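To relate α=0.37 to the -6.5dB channel, one can look at the frequency response of a two-tap filter with a 50ps tap delay. The sketch below assumes the response H(f) = 1 - α·exp(-j2πfT); the slides do not state how α is normalized, so the resulting numbers are illustrative only.

```python
import cmath
import math


def two_tap_gain_db(freq_hz: float, alpha: float, delay_s: float) -> float:
    """Gain in dB of H(f) = 1 - alpha * exp(-j*2*pi*f*T) (assumed tap form)."""
    h = 1 - alpha * cmath.exp(-2j * math.pi * freq_hz * delay_s)
    return 20 * math.log10(abs(h))


ALPHA = 0.37       # from the results slides
DELAY_S = 50e-12   # one bit time at 20Gb/s

dc_gain_db = two_tap_gain_db(0.0, ALPHA, DELAY_S)
nyquist_gain_db = two_tap_gain_db(10e9, ALPHA, DELAY_S)
boost_db = nyquist_gain_db - dc_gain_db

print(f"DC gain:      {dc_gain_db:+.2f} dB")
print(f"10 GHz gain:  {nyquist_gain_db:+.2f} dB")
print(f"Boost:        {boost_db:.2f} dB")
```

Under this assumed normalization the boost at 10GHz relative to DC comes out close to the 6.5dB channel loss, which is consistent with the tap weight being chosen to flatten the overall response.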
Rationale for Multi-cores
• Next generation computing – Multi-core Processing
  – i.e. multiple, parallel DSPs (MACs)
• Why can't we achieve faster frequencies?
  – Wire delays don’t scale like transistors
  – Power increases exponentially (when pushing process technology)
  – Timing margins degraded by
    • Variability
    • Power supply noise
    • Digital crosstalk
• NOTE: More independent threads require more memory bandwidth
[Figure: Intel, 80 Cores, ISSCC 2007]
Research: Explore Parallel Serial Links
• Serial Links also exhibit the same characteristics
– Channel losses get worse
– Power consumption increases significantly with bandwidth
– Timing precision limited by:
• Static Phase Offset (process variation)
• Power-supply Induced Jitter
• Interchannel Crosstalk
• Serial Links also need to push for high amounts of parallelism
– How is this different than conventional link design?
• Channel equalization becomes more difficult
– Adjacent channel crosstalk
– Difficult channel estimation problem (power, flexibility, data-rate, equalizer design, channel, distance)
• Amortize Clock Power for Multiple Links
– Distributed resonant clocking of analog/mixed-signal front-ends
Problem of IO
• 2500 pins / 2 = 1250 differential pairs (call it ~1200)
• Assume 10Gb/s per link: ~12Tb/s aggregate bandwidth
• At 100mW per link (10mW per Gb/s): ~120W
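The I/O budget above can be checked with a few lines of arithmetic. This sketch assumes the 100mW figure applies per 10Gb/s link (that is, 10mW per Gb/s), which is the reading under which the 120W total works out; the function name and argument names are illustrative.

```python
def io_budget(diff_pairs: int, gbps_per_link: float, mw_per_link: float):
    """Return (aggregate bandwidth in Tb/s, total I/O power in W)."""
    bandwidth_tbps = diff_pairs * gbps_per_link / 1e3
    power_w = diff_pairs * mw_per_link / 1e3
    return bandwidth_tbps, power_w


exact_pairs = 2500 // 2  # 1250 differential pairs from 2500 pins

# Compare the exact pair count with the rounded 1200 used for the totals.
for pairs in (exact_pairs, 1200):
    bw, power = io_budget(pairs, gbps_per_link=10, mw_per_link=100)  # 100mW/link is an assumption
    print(f"{pairs} pairs: {bw:.1f} Tb/s, {power:.0f} W")
```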
Stateye Playing
• Fun with Stat-Eye
  – 5Gb/s -> 10Gb/s
  – Worse Channels
  – Worse timing jitter
• Homework examples
Next Time
• Telegrapher’s Equation
  – Reflection coefficients
• Channel Models
  – Skin Effect
  – Dielectric constant
  – vias
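As a preview of the reflection-coefficient topic, the standard expression for a line of characteristic impedance Z0 terminated in a load ZL is Γ = (ZL - Z0) / (ZL + Z0). The sketch below assumes purely resistive loads and Z0 = 50 Ohm (matching the 50 Ohm pad termination used on the test chip); the example load values are hypothetical.

```python
def reflection_coefficient(z_load_ohm: float, z0_ohm: float = 50.0) -> float:
    """Voltage reflection coefficient for a resistive load on a line of impedance z0."""
    return (z_load_ohm - z0_ohm) / (z_load_ohm + z0_ohm)


# Hypothetical terminations: short, matched, and two mismatched loads.
for z_load in (0.0, 50.0, 75.0, 100.0):
    gamma = reflection_coefficient(z_load)
    print(f"ZL = {z_load:5.1f} Ohm -> gamma = {gamma:+.3f}")
```

A matched 50 Ohm load reflects nothing, a short reflects the full wave inverted, and intermediate mismatches reflect a proportional fraction.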