Chapter 7
High-Speed Signal
Prof. Lei He
Electrical Engineering Department
University of California, Los Angeles
URL: eda.ee.ucla.edu
Email: [email protected]
High-speed Links are Everywhere
Backbone Router Rack
PC or Console [Sredojevic:ICCAD’08]
High-Speed Links: Applications
Chip-to-chip signaling
  Computers, games: SDRAM (DDR, DDR2) 100-700MHz, RDRAM 800-1600MHz, DDR3 800-1600MHz, DDR4 1.6-3.2GHz, XDR DRAM 3.2-6.4GHz
Board-to-board
  Computers: Peripherals - PCI (66-133-400MHz), PCIe (250M-500M-1GHz), Infiniband (2.5Gb/s)
Networks
  LAN: Fast Ethernet, Gigabit Ethernet, 10G Ethernet
  WAN: OC-12 (625MHz), OC-192 (12.5GHz)
  Routers: 625Mb/s to 2.5Gb/s
Outline
Link Design Basics
Signal Integrity
High Speed Signaling Architectures
Equalization
Post-Silicon Tuning of High-Speed Signaling
Noise
Signals may be corrupted from many sources
  Inter-symbol interference (ISI)
    – Frequency-dependent attenuation (dispersion)
    – Reflection
    – Oscillation
  Crosstalk
  Power supply noise
  Real noise
    – Thermal and shot noise
  Parameter variation
Noise measure
  Eye diagram
    – Timing jitter
    – Amplitude noise
Inter-Symbol Interference
A signal interfering with itself
Ideally a transmission system is time invariant
  No history of previous bits
In reality, the state of the system is affected by previous bits
  Signals that don’t reach the rails by the end of the cycle
    – Signal’s transition time is limited by channel bandwidth
  Reflections on the transmission lines
  Magnitude and phase of excited resonances
ISI - Dispersion
Frequency-dependent attenuation
  In general, the channel is low pass
Our nice short pulse gets spread out
  Example: a 101 pattern
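A rough way to see the spreading is to pass the 101 pattern through a single-pole low-pass filter. The sketch below is a toy model, not a description of any real channel: the first-order response, the 8 samples per bit, and the smoothing factor `alpha` are all assumptions chosen for illustration.

```python
def lowpass_channel(bits, samples_per_bit=8, alpha=0.2):
    """Pass an NRZ bit pattern through a one-pole low-pass filter.

    alpha (0 < alpha <= 1) is a hypothetical smoothing factor; smaller
    values stand in for a lower channel bandwidth.
    """
    y = 0.0
    out = []
    for b in bits:
        for _ in range(samples_per_bit):
            # y[n] = y[n-1] + alpha * (x[n] - y[n-1])
            y += alpha * (float(b) - y)
            out.append(y)
    return out


def end_of_bit_levels(bits, samples_per_bit=8, alpha=0.2):
    """Return the waveform value at the end of each bit period."""
    wave = lowpass_channel(bits, samples_per_bit, alpha)
    return [wave[(i + 1) * samples_per_bit - 1] for i in range(len(bits))]


levels = end_of_bit_levels([1, 0, 1])
print([round(v, 3) for v in levels])
```

In this model the 0 between the two 1s never returns to the low rail, and the second 1 ends higher than the first because it starts from that residual level. That history dependence is the ISI.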
ISI - Reflection
Reflections of previous bits travel up and down transmission lines
A mismatch of δ gives (to the first order) a reflection of ρ
$\rho = \frac{Z_t - Z_0}{Z_t + Z_0}$
[Figure: two plots of signal level (0 to 1) versus time t]
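As a worked check, the standard reflection coefficient for a line of impedance Z0 terminated in Zt is (Zt - Z0)/(Zt + Z0). With Zt = Z0(1 + δ) this is δ/(2 + δ), which is about δ/2 for a small mismatch. The sketch below compares the two; the 50 ohm line and the 10% mismatch are hypothetical values.

```python
def reflection_coefficient(z_term, z0):
    """rho = (Z_t - Z_0) / (Z_t + Z_0) for a line of impedance z0 terminated in z_term."""
    return (z_term - z0) / (z_term + z0)


def first_order_reflection(delta):
    """First-order approximation for Z_t = Z_0 * (1 + delta): rho ~ delta / 2."""
    return delta / 2.0


z0 = 50.0      # hypothetical line impedance in ohms
delta = 0.10   # hypothetical 10% termination mismatch
exact = reflection_coefficient(z0 * (1 + delta), z0)
approx = first_order_reflection(delta)
print(f"exact rho = {exact:.4f}, first-order rho = {approx:.4f}")
```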
ISI - Resonances
Oscillations are excited by signal transitions and may interfere with later transitions
Excitation of resonant circuits is reduced with longer transition times
Slower edge has less high frequency spectral content
Resistance damps oscillation
Crosstalk
• Crosstalk is the coupling of energy from one line to another via:
  • Mutual capacitance (electric field)
  • Mutual inductance (magnetic field)
• One signal interfering with another signal
[Figure: two coupled lines (Zs, Zo) showing mutual capacitance Cm and mutual inductance Lm, with near and far ends marked]
Crosstalk Induced Noise
$V_{Lm} = L_m \frac{dI}{dt}$
$I_{Cm} = C_m \frac{dV}{dt}$
• The mutual inductance will induce current on the victim line opposite of the driving current (Lenz’s Law)
• The mutual capacitance will pass current through the mutual capacitance that flows in both directions on the victim line
[Figure: driven line and victim line (Zs, Zo) showing the coupled currents ICm and ILm toward the near and far ends]
$I_{near} = I_{Cm} + I_{Lm}$
$I_{far} = I_{Cm} - I_{Lm}$
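The coupling relations can be put into numbers with a short sketch. The function names are made up for this example, and the component values in the usage note are hypothetical rather than taken from the slides.

```python
def induced_voltage(l_m, di_dt):
    """V_Lm = L_m * dI/dt in volts, for mutual inductance l_m in henries."""
    return l_m * di_dt


def induced_current(c_m, dv_dt):
    """I_Cm = C_m * dV/dt in amperes, for mutual capacitance c_m in farads."""
    return c_m * dv_dt


def near_far_currents(i_cm, i_lm):
    """Combine the two coupling currents: they add at the near end and subtract at the far end."""
    return i_cm + i_lm, i_cm - i_lm


# Hypothetical aggressor edge: 20 mA and 1 V swing in 100 ps.
v_lm = induced_voltage(1e-9, 20e-3 / 100e-12)    # 1 nH mutual inductance
i_cm = induced_current(0.5e-12, 1.0 / 100e-12)   # 0.5 pF mutual capacitance
i_near, i_far = near_far_currents(1e-3, 3e-3)    # capacitive 1 mA, inductive 3 mA
print(v_lm, i_cm, i_near, i_far)
```

With the inductive current larger than the capacitive one, the far-end sum comes out negative while the near-end sum stays positive.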
Voltage Profile of Coupled Noise
• Near end crosstalk is always positive
  • Currents from Lm and Cm always add and flow into the node
• For PCBs, the far end crosstalk is “usually” negative
  • Current due to Lm is larger than current due to Cm
• Note that far end crosstalk can be positive
[Figure: driver (Zs) on the driven line and an un-driven “victim” line, impedances Zo, with near end and far end marked]
Power Supply Noise
The power supply network has parasitic elements
  On-chip: resistive
  Off-chip: inductive
Current draw across these elements induces a noise voltage:
$v_{noise}(t) = R\,i(t) + L\,\frac{di(t)}{dt}$
Instantaneous current is what matters
  May be many times the DC current
    – 10W chip draws 4A at 2.5V
    – Peak current may be 10-20A
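A small numeric sketch of the noise voltage R·i + L·di/dt follows. The 10 W and 2.5 V figures are from the slide. The parasitic values and the current ramp are hypothetical numbers chosen only to show the relative size of the two terms.

```python
def supply_noise(r, l, i, di_dt):
    """Noise voltage across a supply path: v = R*i + L*di/dt."""
    return r * i + l * di_dt


p_chip, vdd = 10.0, 2.5
i_dc = p_chip / vdd                 # 4 A DC current, as on the slide

r_grid = 5e-3                       # hypothetical on-chip grid resistance, ohms
l_pkg = 10e-12                      # hypothetical effective package inductance, henries
di_dt = (15.0 - i_dc) / 1e-9        # hypothetical ramp from 4 A to a 15 A peak in 1 ns

ir_drop = supply_noise(r_grid, 0.0, i_dc, 0.0)
total = supply_noise(r_grid, l_pkg, i_dc, di_dt)
print(f"IR drop = {ir_drop * 1e3:.0f} mV, total = {total * 1e3:.0f} mV")
```

Under these assumptions the inductive term dominates, which is why the instantaneous current matters more than the DC current.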
Simultaneous Switching Outputs (SSO)
When several outputs switch simultaneously, significant current is drawn from the supply or sent into ground
Supply connections have inductance
  SSO currents produce a voltage drop across these inductances
On-chip, the VDD to VSS voltage difference decreases
  Effect grows with number of drivers switching
  Quadratic with the inverse of transition time
Between chips, the drops across VSS inductances can affect driver timing and shift the receiver threshold
Other Noise Sources
Alpha particles
  5MeV particle injects 730fC of charge into substrate
  One node typically collects less than 50fC
Thermal and shot noise
  Proportional to bandwidth, typically in the µV range
Parameter mismatch
  VT and β have deviation proportional to 1/sqrt(WL)
  Systematic variations depend on layout
Eye Diagram
This is a “1”
This is a “0”
Eye – space between 1 and 0
With timing noise
With voltage noise
With both!!
Eye Diagram (cont’d)
Standard measure for signaling
  Synchronized superposition of all possible realizations of the signal viewed within a particular interval
Timing jitter
  Deviation of the zero-crossing from its ideal occurrence time
Amplitude noise
  Set by signal-to-noise ratio (SNR)
  The amount of noise at the sampling time
[Figure: TX, channel, RX]
Outline
Link Design Basics
Signal Integrity
High Speed Signaling Architectures
Equalization
Post-Silicon Tuning of High-Speed Signaling
Signaling – Main Idea
A good signaling system isolates the signal from noise rather than trying to overpower the noise
  Crosstalk
    – Terminate both ends, use homogeneous media
  ISI
    – Matched terminations, no resonators, rise-time control
  Power supply noise
    – Avoid coupling into signal or reference
  Differential signaling
  Current mode stable reference
Architecture of Signaling
[Figure: link block diagram with data ...1001... entering a Multiplexer (Tx Clock) and Driver, the Channel, then an Amp and Demultiplexer (Rx Clock) producing ...1001...]
Signaling Architecture Tradeoffs
Signal modulation
  PAM (Pulse-amplitude modulation)
  Pulsed (Return-to-Zero, RZ) signaling
  Binary (e.g. NRZ) or Multiple-level signaling (MLS)
Uni-directional or Bidirectional
  Time-multiplexed bidirectional or simultaneous bidirectional
Single-ended or differential
Current mode or voltage mode
Bus or single-trace
Point-to-point or multi-drop
Example System - Trade-offs
Line considerations: length, impedance, attenuation, discontinuities
Transmitter choices: output impedance, bipolar/unipolar driver, amplitude, rise time, single-ended/differential
Reference considerations: VDD-div, Tx, Rx, diff
Receiver choices: offset, sensitivity, BW
Termination choices: source, destination, both, underterm
Voltage Mode vs Current Mode
Main differences are
  Ease of control and generation
    – Much easier to generate a small current than a small voltage
  Coupling of supply noise
    – 50% of supply noise shows up on the data line in the matched voltage mode; potentially much less in a high-Z current-mode driver
  Generation of high-Z switches is easier than controlled-Z switches
Single-ended vs Differential
• Single-ended signaling
  • Compare to a shared reference
  • Often used with a bus
  • Issues
    – Generates SSO noise
    – How to make the reference
    – How to quiet the reference
    – Crosstalk cannot be made common-mode
• Differential signaling
  • Compare between two lines
  • Noise immunity
    – Many noise sources become common mode
  • Issues
    – Differential must run > 2x as fast as single-ended to make sense
    – Otherwise, power x2, pins x2
Binary vs Multiple-level (4-PAM)
• Binary (NRZ) is 2-PAM
  • Uses 2 levels to send one bit per symbol
• 4-PAM uses 4 levels to send 2 bits per symbol
  • Each level has a 2-bit value
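A minimal sketch of the bit-to-level mapping for 4-PAM. The slide does not say which 2-bit value goes with which level; the Gray-coded table below is an assumption, chosen because adjacent levels then differ in a single bit.

```python
# Assumed Gray-coded mapping from bit pairs to level indices 0..3.
GRAY_PAM4 = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}


def pam4_encode(bits):
    """Group bits in pairs and map each pair to one of four levels."""
    if len(bits) % 2:
        raise ValueError("4-PAM needs an even number of bits")
    return [GRAY_PAM4[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def pam4_decode(symbols):
    """Map each level back to its 2-bit value."""
    inverse = {level: pair for pair, level in GRAY_PAM4.items()}
    bits = []
    for s in symbols:
        bits.extend(inverse[s])
    return bits


data = [0, 0, 0, 1, 1, 1, 1, 0]
symbols = pam4_encode(data)
print(symbols, pam4_decode(symbols) == data)
```

Eight bits become four symbols, so the symbol rate is half the bit rate, at the cost of one third of the level spacing for the same swing.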
When Does 4-PAM Make Sense?
Simultaneous Bidirectional Signaling
Wires can transmit waves in both directions
  It seems a shame to only use one direction at a time
Simultaneous Bidirectional Signaling
  Transmit waves in both directions at the same time
  Waveform on wire is superposition of forward and reverse traveling wave
  Subtract transmitted wave at each end to recover received wave
  There are 3 levels on the line but it’s still 2-level signaling
  Much more sensitive to reflections and crosstalk
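The superposition and subtraction steps can be sketched in a few lines. This is an idealized model that assumes a lossless line, zero delay, and unit-amplitude binary drivers at both ends; the function names are illustrative.

```python
def line_levels(tx_a, tx_b):
    """Waveform on the wire: superposition of the two transmitted waves."""
    return [a + b for a, b in zip(tx_a, tx_b)]


def recover(line, own_tx):
    """Each end subtracts its own transmitted wave to get the other side's data."""
    return [v - own for v, own in zip(line, own_tx)]


a = [0, 1, 1, 0]
b = [1, 1, 0, 0]
line = line_levels(a, b)
print(line, recover(line, a), recover(line, b))
```

The wire carries three distinct levels (0, 1, 2), yet each receiver still decides between two values after subtraction. Any error in the subtracted replica, such as a reflection of the local transmit wave, lands directly on the received signal.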
Outline
Link Design Basics
Signal Integrity
High Speed Signaling Architectures
Equalization
Post-Silicon Tuning of High-Speed Signaling
Equalization
Channels are band-limited; most of them are low-pass
Goal is to flatten the overall response
Equalization: boost higher frequencies relative to lower frequencies
Can be done at Tx or Rx or both
[Figure: channel followed by equalizer]
Receiver Linear Equalizer
Amplifies high-frequencies attenuated by the channel
Pre-decision
Digital or Analog FIR filter
Issues
  Also amplifies noise!
  Precision
  Tuning delays (if analog)
  Setting coefficients (adaptive filter)
    – Adaptive algorithms such as LMS
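A minimal sketch of setting FIR equalizer coefficients with LMS, assuming a known training sequence is available at the receiver. The two-tap channel (each symbol leaks half its amplitude into the next one), the tap count, and the step size `mu` are hypothetical.

```python
import random


def lms_equalizer(received, desired, n_taps=4, mu=0.01):
    """Adapt FIR equalizer weights with the LMS update w += mu * error * input."""
    w = [0.0] * n_taps
    delay_line = [0.0] * n_taps
    for x, d in zip(received, desired):
        delay_line = [x] + delay_line[:-1]                  # shift in the new sample
        y = sum(wi * xi for wi, xi in zip(w, delay_line))   # equalizer output
        e = d - y                                           # error against training symbol
        w = [wi + mu * e * xi for wi, xi in zip(w, delay_line)]
    return w


random.seed(0)
symbols = [random.choice((-1.0, 1.0)) for _ in range(5000)]
# Hypothetical channel: h = [1.0, 0.5], i.e. one post-cursor ISI term.
received = [s + 0.5 * prev for s, prev in zip(symbols, [0.0] + symbols[:-1])]
weights = lms_equalizer(received, symbols)
print([round(w, 2) for w in weights])
```

The weights settle near the truncated inverse of the channel (about 1, -0.5, 0.25, ...), which boosts high frequencies relative to low ones. A real receiver would also see noise, which this equalizer amplifies along with the signal.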
Transmitter Linear Equalizer
Tx Pre-emphasis Filter
Attenuates low frequencies
  Need to be careful about output amplitude: limited output power
    – If you could make bigger swings, you would
    – EQ really attenuates low frequencies to match high frequencies
Also FIR filter: D/A converter
  Can get better precision than Rx
Issues
  How to set EQ weights?
  Doesn’t help loss at high f
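The point that transmit equalization attenuates low frequencies instead of boosting high ones can be checked with a two-tap example. The tap values are hypothetical, and the peak-swing limit is modeled by normalizing the taps so their absolute values sum to 1.

```python
import cmath
import math


def normalize_taps(taps):
    """Scale taps so that sum(|tap|) = 1, modeling a peak-limited driver."""
    total = sum(abs(t) for t in taps)
    return [t / total for t in taps]


def fir_gain(taps, f_norm):
    """Magnitude response at f_norm cycles per symbol (0 = DC, 0.5 = Nyquist)."""
    h = sum(t * cmath.exp(-2j * math.pi * f_norm * k) for k, t in enumerate(taps))
    return abs(h)


taps = normalize_taps([1.0, -0.25])   # hypothetical 2-tap pre-emphasis
dc_gain = fir_gain(taps, 0.0)
nyquist_gain = fir_gain(taps, 0.5)
print(taps, dc_gain, nyquist_gain)
```

After normalization the Nyquist gain stays at the full swing while the DC gain drops, so the filter flattens the channel only by giving up low-frequency amplitude.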
Tx Linear EQ: Single Bit Response
Outline
Link Design Basics
Signal Integrity
High Speed Signaling Architectures
Equalization
Post-Silicon Tuning of High-Speed Signaling
Process Variation vs Analog Circuits
[ITRS]
Threshold voltage variation is increasingly dominant and is primarily random
  Due to increasing random doping fluctuation
Corner-based design is not effective for matching, which is widely used in analog circuits
  Often results in over-sized circuits and excessive area/power
Post-Silicon Tuning is Effective
Post-silicon tuning is effective to compensate random process variation
Digitally tunable circuit is commonly adopted
  Insensitivity to noise and variation
  Suitable for process migration
[Li:ICCAD’08]
Post-Silicon Tuning of High-Speed Signaling
Algorithm Framework
  Problem formulation
  Branch and bound based algorithm
Case Study I: Transmitter
Case Study II: PLL
Conclusions
Unit Cell Based Design Methodology
Pre-characterize different types of unit cell, e.g., a transistor with a given threshold voltage and unit W/L
A transistor of larger W/L can be synthesized by connecting unit cells of the same type in parallel
Design variables simply become
  – type of unit cell α (threshold)
  – number of unit cells in parallel (sizing)
Constraints such as output swing are satisfied for correct operation
Apply to other circuit elements such as unit capacitance and resistance
Makes design better and modeling more accurate
Digitally Tunable Circuits
[Figure: one tap in a pre-emphasis filter; the current source can be implemented by a current-division DAC]
Current-division DAC is commonly used to combat process variation
Two tuning parameters
  LSB size (γ): minimum step during digital-to-analog conversion
  Resolution (β): number of bits used
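The two tuning parameters can be illustrated with an ideal behavioral model of a current DAC. This sketch ignores the actual current-division circuit: it only assumes the output is an integer code times the LSB size, with the code limited by the resolution in bits. Names and values are hypothetical, with currents in mA.

```python
def dac_current(code, lsb, bits):
    """Ideal DAC output: code * LSB, with code limited to the given resolution."""
    if not 0 <= code < 2 ** bits:
        raise ValueError("code out of range for this resolution")
    return code * lsb


def nearest_code(target, lsb, bits):
    """Pick the code whose output is closest to the target, clamped to the code range."""
    code = round(target / lsb)
    return max(0, min(2 ** bits - 1, code))


code = nearest_code(0.37, lsb=0.05, bits=4)        # target tap current of 0.37 mA
print(code, dac_current(code, 0.05, 4))
print(nearest_code(2.0, lsb=0.05, bits=4))         # out-of-range target saturates at full scale
```

A smaller LSB gives a finer correction step, while more bits extend the range that can be corrected; both cost area, which is the trade-off the joint optimization addresses.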
Impact of Post-Silicon Tuning
(a) Without Tuning (b) With Tuning
Example: BER for a high-speed link
  4-tap pre-emphasis filter in a transmitter
  0% (3σ) variation in Vt
Design-time optimization and post-silicon tuning circuit both need area, so joint optimization is a must
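A toy Monte Carlo experiment shows why tuning raises parametric yield. It is not the BER model behind the plots: it assumes a single tap current with Gaussian variation around a nominal value of 1.0, a pass/fail tolerance, and a trim DAC with a fixed number of steps. All parameter values are hypothetical.

```python
import random


def yield_estimate(n_runs, sigma, tol, lsb=None, max_steps=0, seed=1):
    """Fraction of Monte Carlo samples whose current ends within tol of nominal 1.0."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(n_runs):
        current = 1.0 + rng.gauss(0.0, sigma)      # assumed variation model
        if lsb is not None:
            # Post-silicon tuning: choose the trim step count that best cancels the error,
            # limited to the available range of the trim DAC.
            steps = round((1.0 - current) / lsb)
            steps = max(-max_steps, min(max_steps, steps))
            current += steps * lsb
        if abs(current - 1.0) <= tol:
            passed += 1
    return passed / n_runs


untuned = yield_estimate(3000, sigma=0.10, tol=0.05)
tuned = yield_estimate(3000, sigma=0.10, tol=0.05, lsb=0.04, max_steps=4)
print(f"yield without tuning = {untuned:.2f}, with tuning = {tuned:.2f}")
```

Both runs use the same seed, so they see the same samples. Because half an LSB is below the tolerance, every sample that passes untuned also passes tuned, and many more are pulled back into spec.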
Joint Optimization
Maximize parametric yield, subject to:
  Power constraint. Process variation changes power
  Area constraint. Process variation does not change layout area
  Bound on design parameters
  Bound on the total number of unit cell types
  Bound on the LSB and resolution
Optimization Challenges
Discrete problem with non-convex objective and constraints
  Solution space surface is rough and many local maxima exist
  Significant improvement can be expected
3000 Monte Carlo runs over different unit cell design α, resolution β, and LSB size for one tap of FIR
Algorithm framework:
  Partition the solution space by LSB size (γ) and unit cell type (α)
  Develop a bound on the parametric yield
  Discard (fathom) if the bound is worse than the current best solution
Overall Algorithm
Use gradient ascent method to find the local maxima
  – Sequentially take steps in the direction proportional to the gradient
Bound estimation
  Remove the area and power constraints
  Use LMS algorithm to find optimal yield value
[Figure: search tree. The root "All" branches into αi, αj, αk; each branch splits into γi and γj. Some leaves are marked "pruned by upper bound check" or "infeasible"]
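The partition, bound, and fathom steps can be sketched generically. This is a skeleton under stated assumptions, not the authors' implementation: `upper_bound` and `local_search` are placeholder callables standing in for the bound estimation and the gradient ascent, and the yield numbers in the usage example are invented.

```python
def branch_and_bound(regions, upper_bound, local_search):
    """Search regions, skipping those whose bound cannot beat the incumbent.

    regions: iterable of region labels, e.g. (unit cell type, LSB size) pairs.
    upper_bound(region): optimistic yield estimate for the region.
    local_search(region): (yield, design) found inside the region.
    """
    best_yield, best_design = float("-inf"), None
    pruned = []
    for region in regions:
        if upper_bound(region) <= best_yield:
            pruned.append(region)          # fathomed: cannot beat the current best
            continue
        value, design = local_search(region)
        if value > best_yield:
            best_yield, best_design = value, design
    return best_yield, best_design, pruned


# Invented toy data: bound and achievable yield per region.
bounds = {"a": 0.90, "b": 0.60, "c": 0.95}
found = {"a": 0.80, "b": 0.55, "c": 0.85}
result = branch_and_bound(
    ["a", "b", "c"],
    upper_bound=lambda r: bounds[r],
    local_search=lambda r: (found[r], f"design-{r}"),
)
print(result)
```

Region "b" is never searched because its bound (0.60) is already below the yield found in region "a" (0.80). The tighter the bound, the more regions are discarded without running the local search.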
Gradient Ascent Method
In each un-pruned region, sequentially take steps in the direction proportional to the gradient, until a local maximum of the objective function is reached.
At each step, increase/decrease each variable by 1 in turn and check the change of the objective function.
Always take the change (direction) that causes the maximum increase.
Termination of the algorithm indicates that one of the local maxima has been reached or that we have reached the boundary.
The initial guess for the gradient ascent can be arbitrarily chosen. In our experiments, we found that it did not influence runtime or quality significantly.
We also observed that the algorithm always converges to a local optimum within two or three iterations.
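The stepping rule described above (try ±1 on each integer variable, keep the single move with the largest increase, stop when nothing improves) can be sketched as follows. The quadratic objective in the usage example is a stand-in; in the real flow the objective would be the estimated yield.

```python
def discrete_ascent(objective, start, lower, upper):
    """Integer coordinate-wise ascent: repeatedly take the best +/-1 move."""
    point = list(start)
    best = objective(point)
    while True:
        best_move, best_value = None, best
        for i in range(len(point)):
            for step in (-1, 1):
                candidate = point.copy()
                candidate[i] += step
                if not lower[i] <= candidate[i] <= upper[i]:
                    continue                      # stay inside the region boundary
                value = objective(candidate)
                if value > best_value:
                    best_move, best_value = candidate, value
        if best_move is None:
            return point, best                    # local maximum or boundary reached
        point, best = best_move, best_value


def toy_objective(p):
    """Stand-in objective with a single maximum at (3, -2)."""
    return -((p[0] - 3) ** 2) - (p[1] + 2) ** 2


print(discrete_ascent(toy_objective, [0, 0], [-5, -5], [5, 5]))
print(discrete_ascent(toy_objective, [0, 0], [-5, -5], [2, 5]))
```

The second call shows the boundary case: with the first variable capped at 2, the search stops on the edge of the region instead of at the unconstrained maximum.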
Post-Silicon Tuning of High-Speed Signaling
Algorithm Framework
Case study 1: transmitter
Knobs for design-time and post-silicon
Modeling and formulation
Experimental results
Case Study 2: PLL
Conclusions
Knobs for Optimization
Given transmission channel → filter coefficient → transistor size
Change channel behavior ← parasitic capacitance
[Figure: link block diagram. Transmitter: pre-driver and N-tap FIR pre-emphasis filter/driver with tap currents IC0, IC1, ..., ICn, filter coefficients a0, a1, ..., an, and resistors RD. Channel. Receiver: pre-amplifier, slicer, and CDR producing data out]
Problem Formulation
For transmitter
[Equations not recovered; one term is annotated as a random variable]
BER Distribution Comparison
20% (3σ) variation in Vth with 10000 Monte Carlo runs
Design 1 - without tuning circuit
  – All resources are used for filter
  – Unavoidable large variation
Design 2 - one tap filter
  – All resources are used for DAC
  – Has extremely small variance but suffers severe ISI
Design 3 - heuristic design
  – Assume 4-tap filter
  – Assume LSB size is equal for each tap
  – Limit the solution space
  – Good improvement compared to two extreme cases
Design 4 - our algorithm
  – Provides better solution (mean, variance)
Yield Rate
Experiment setting
  Channel: 30cm differential microstrip line with FR-4 substrate
  5GHz data rate
  Yield is set by BER=1e-15 (estimated by EVM)
Yield comparison for different area constraints
[Figure: yield versus area]
Our algorithm always provides better yield than the design heuristic
With an aggressive area constraint, our algorithm has much less yield degradation
Saturation effect
Up to 47% improvement
Yield with Power Constraint
[Figure axes: Vt variation, power]
Post-Silicon Tuning of High-Speed Signaling
Algorithm Framework
Case study 1: Transmitter
Case study 2: PLL Design
Conclusions
PLL output clock jitter
HnIN and HnVCO are the noise transfer functions of the reference clock noise and the VCO noise, respectively
E.g. [equation not recovered]
Jitter Modeling
[Mansuri:JSSC’02]
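[Figure: PLL block diagram with PFD (KPFD), charge pumps CP1 and CP2 (currents ICP and η×ICP, UP/DN inputs), capacitor CCP, op amp (1/gmOP), VCO (KVCO/S, ω0), signals CLKref, CLKVCO, VINT, VCtrl, and phase terms ФnIN, ФnVCO, ФOUT]
Tunable PLL
  Jitter can be changed by tuning the charge pump current ratio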
Joint Optimization
Design-time optimization
  Two charge pumps Icp1, Icp2
  Ratio (Icp1/Icp2) determines output RMS jitter
  Optimal ratio can be found using design-time optimization
Again, process variation would cause performance degradation
Digitally tuned current mirror
  Small reference current
    – Consumes less power
    – η needs to be far less than unity
    – Limited tuning resolution
  Large reference current
    – Good tunability
    – Power and area penalty
[Horowitz:JSSC’00]
Same Formulation Applies
For the PLL, the objective function becomes [equation not recovered],
and area can be computed in a way similar to the transmitter case.
Experimental Results
PLL with digitally controlled charge pump current
Yield is defined by output clock RMS jitter
Design heuristic using minimized biasing current
Consider 30% Vth variation
Improve the yield by up to 56%
Conclusions
Formulate a joint optimization problem for digitally tuned analog circuits
Consider both design-time optimization and post-silicon tuning
Maximize performance yield s.t. power and area constraints
Propose a general optimization framework
  Combine branch-and-bound and gradient-ascent algorithm
  Effectively find the global optimum
Two joint optimization design examples for high-speed serial link
  Transmitter design
  PLL design
Experiments show yield improvements of up to 47% (transmitter) and 56% (PLL) over a common circuit design heuristic