clock and data recovery pll design considerations in 0.1 um … · 2015-07-29 · clock and data...

Clock and Data Recovery PLL Design Considerations in 0.1 um CMOS

Steve Scroggs ([email protected]), Zachary Pfeffer ([email protected])

Department of Electrical and Computer Engineering, UCB 425, University of Colorado, Boulder CO 80309.

Telephone/fax: 303-635-1962

Abstract

The purpose of this paper is to detail our investigation of issues involved in clock and data recovery associated with a high speed serializer-deserializer (SERDES) receiver block. Because of the overall complexity of such a block, our approach was to start at a high level of abstraction, gain a general understanding at that level, then choose one of the more interesting sub-blocks and investigate this block at the next level of detail.

1. Introduction

Future high bandwidth interfaces are adopting serial physical connections, as opposed to parallel busses. Technology that was once used in communication over long distances, where the interconnect media was the dominant cost, is now finding its way into local interfaces. Examples of this trend are interfaces such as Serial Attached SCSI, Serial ATA, HyperTransport, and future generations of PCI.

The motivations for changing from a parallel to a serial approach stem from the following design problems that are inherent in parallel busses and mitigated in serial configurations.

The bandwidth requirements have reached a point at which reliable parallel bus architectures are becoming impossible to implement. Specifically, matching independent data and clock signals at high clock rates and over wide busses is not feasible. The timing margin available at these clock rates is very small. Trace lengths on the circuit board as well as wire lengths in cables and connectors must be controlled much too tightly.

By contrast, a serial connection does not have to match the delay of multiple signal lines. There is only one signal. The data and clock are encoded and transmitted together over this connection.

Because parallel busses are generally wide (many data signals), they have had to allow configurations other than point-to-point connections. To do otherwise would require very large pin counts on integrated circuits and enormous amounts of routing between chips and boards to connect more than two devices. This has resulted in multi-drop connections (i.e. more than two connections can be made to each bus signal, which by definition means that connections must be made at points other than the ends of signal wires). Acceptable signal integrity is very difficult to achieve under these circumstances. The high speed signals, with transmission-line characteristics, must be properly terminated to avoid reflections and reduce the settling time to acceptable durations. Termination of a high speed, multi-drop bus can be a very difficult or impossible problem to solve.

Another result of multiple devices connecting to a single multi-drop parallel bus is that, to communicate in both directions, the bus signals must be bi-directional. That is, the bus can be driven from more than one connection and all other connections must be able to receive. This exacerbates the signal integrity problem, because it is more difficult to properly terminate a signal when multiple drivers exist.

A serial bus can eliminate the multi-drop configuration and the requirement for the bus to be bi-directional. Because the number of board traces and chip pins that are required to implement a serial bus is so small, point-to-point busses can be used to connect all devices. Another result of greatly lowering the number of signals is that it becomes feasible to implement multiple data paths per bus. This allows a bus interface to be composed of two unidirectional channels. Termination of a point-to-point unidirectional connection is a simple problem to solve and the solution can be integrated onto the chips. Signal integrity is improved dramatically, and as a result, clock speeds can be greatly increased.

A third advantage results from lowering the number of data signals in a bus. Two signal wires can now economically be used to carry the serial data/clock signal, which allows a serial bus to use differential signaling. Differential signaling is much more noise tolerant than single-ended signaling. Small signal amplitudes, which are compatible with modern integrated circuits, provide a satisfactory signal level in this mode. Differential signaling is used in some parallel interfaces, but this doubles the already large number of signals in these interfaces, making several of the problems noted above even worse.

The flow of our investigation, and subsequently of this paper, is as follows. Section 2 is devoted to clock and data recovery issues. Section 3 investigates PLL based clock recovery. Sub-Section 3.1 presents the PLL transfer function and Sub-Section 3.2 the LC tank VCO: Sub-Section 3.2.1 covers the inductor on silicon, Sub-Section 3.2.2 the LC tank, Sub-Section 3.2.3 the gate-channel depletion region based varactor, and Sub-Section 3.2.4 the entire LC tank VCO based PLL with simulations. Sub-Section 3.3 presents the non-periodic reference phase-frequency detector and Sub-Section 3.4 explores data recovery issues. Jitter is discussed in Section 4 and conclusions in Section 5.

2. Clock and Data Recovery Issues

Clock recovery is the process of deriving a local, highly accurate clock from a data stream. The recovered clock must have both a matched frequency and a constant, known phase relationship to the data. This can be achieved through use of a phase-locked loop (PLL) circuit. The characteristics of the data stream that the PLL must lock to are also of great importance.

2.1 Data Stream

An 8b/10b encoded serial data stream has been chosen as the data sequence for this SERDES. This coding scheme is used in standards such as Gigabit Ethernet. 8b/10b encoding has the following advantageous characteristics: DC balance and run length limits.

DC balance means that, on average, a binary data stream will spend the same amount of time in the logic 1 state and in the logic 0 state. The 8b/10b designation means an 8 bit character is encoded with 10 bits. Since only a quarter of the 10 bit encoding space is needed, only those codes that satisfy the DC balance characteristic are used and the rest are discarded. Essentially, an 8b/10b encoded stream has a disparity of either +1 or -1 at the end of any whole character. This can be shown effectively with the use of a Trellis graph. The two Trellis graphs depicted in Figure 1 show the first two characters that are seen in a Gigabit Ethernet stream. In any 8b/10b encoder/decoder an initial disparity must be assumed. In Figure 1, the initial disparity is +1. The superimposed black line shows the disparity progression as the character is received.

Figure 1: 8b/10b Trellis Graphs. Two panels, each spanning one character, for the sequence (+) /K28.5/D16.2/: /K28.5/ = 1 1 0 0 0 0 0 1 0 1 and /D16.2/ = 0 1 1 0 1 1 0 1 0 1. Horizontal axis: transmit bit position (a b c d e i f g h j). Vertical axis: disparity from -3 to 3.

The second characteristic of 8b/10b mentioned above is critical to clock recovery. For correct clock recovery to occur, transitions in the data stream must happen often enough. This code has a maximum run length of 5 bits between transitions.
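Both properties can be checked on the two characters of Figure 1. The following is a minimal sketch, not an 8b/10b encoder: it simply takes the two 10 bit patterns listed in the figure, assumes the initial disparity of +1 stated above, and counts +1 for each logic 1 and -1 for each logic 0. The function names are illustrative only.

```python
# Bit patterns as listed in Figure 1, in transmit order.
K28_5 = "1100000101"
D16_2 = "0110110101"


def running_disparity(bits, start):
    """Return the disparity after each bit: +1 per logic 1, -1 per logic 0."""
    trace = []
    disparity = start
    for bit in bits:
        disparity += 1 if bit == "1" else -1
        trace.append(disparity)
    return trace


def max_run_length(bits):
    """Return the longest run of identical consecutive bits."""
    longest = run = 1
    for prev, cur in zip(bits, bits[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


# The second character starts from the disparity left by the first one.
trace_k = running_disparity(K28_5, start=+1)
trace_d = running_disparity(D16_2, start=trace_k[-1])

print("disparity after /K28.5/:", trace_k[-1])
print("disparity after /D16.2/:", trace_d[-1])
print("longest run:", max_run_length(K28_5 + D16_2))
```

For this pair the disparity ends each character at -1 and then +1, and the longest run is the five zeros inside /K28.5/, which is consistent with the run length limit quoted above.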

3. PLL Based Clock Recovery

PLLs can be built using several architectures. We have chosen to investigate an LC tank voltage controlled oscillator (VCO) based PLL. The basic functions that make up this style of PLL are shown in Figure 2.

Figure 2: PLL Block Diagram

The PLL works in the following way. The phase detector compares an incoming clock (in our case a transition in the 8b/10b data stream) to a locally generated clock from the VCO. If the two clocks are transitioning together, they are considered 'locked' to each other and the phase detector does nothing. If, however, their transitions are not simultaneous, the phase detector outputs an error signal. This error signal is used to control the charge pump. Depending on whether the input clock leads or lags the locally generated clock, the error signal, via the charge pump, will source current to or sink current from a capacitor. The voltage level of the capacitor is then the input to the VCO. With the VCO connected to the phase detector as the locally generated clock, a feedback system is established. This feedback continually adjusts the local clock to keep it locked with the input clock.

3.1 PLL Transfer Function

The overall transfer function of the PLL is of great importance. PLLs can be categorized into three types based on their characteristics [7]. A type 2 PLL is what is needed for this application. This type matches the frequency of the reference clock and also maintains a zero phase difference between the generated clock and the reference clock (type 1 matches frequency, but the phase relationship is not known).

The appropriate transfer function in the complex frequency domain is:

T(s) = Kp * Kv * Kn * [s + a] / s^2

where:

Kp: phase detector gain in V/rad
Kv: VCO gain in rad/s/V
Kn: the amount the feedback clock is divided by
a: a loop filter zero that adjusts the settling time and overshoot

This function is basically a low pass filter followed by an integrator.
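The influence of the zero a on settling and overshoot can be made concrete. The sketch below assumes that T(s) is the open-loop gain of a unity-feedback loop, so that the closed-loop denominator is s^2 + K*s + K*a with K = Kp * Kv * Kn taken as the lumped gain exactly as written in T(s). Under that assumption the natural frequency is sqrt(K*a) and the damping factor is K / (2*sqrt(K*a)). The numeric gains in the example are hypothetical, not values from our design.

```python
import math


def loop_dynamics(kp, kv, kn, a):
    """Natural frequency (rad/s) and damping factor of the closed loop.

    Assumes T(s) = K (s + a) / s^2 is the open-loop gain with unity feedback,
    which gives the characteristic polynomial s^2 + K s + K a.
    """
    k = kp * kv * kn                  # lumped loop gain, as written in T(s)
    wn = math.sqrt(k * a)             # natural frequency
    zeta = k / (2.0 * wn)             # damping factor
    return wn, zeta


# Hypothetical normalized gains, chosen only to illustrate the trend.
for zero in (0.25, 1.0, 4.0):
    wn, zeta = loop_dynamics(kp=2.0, kv=2.0, kn=1.0, a=zero)
    print(f"a = {zero}: wn = {wn:.3f} rad/s, zeta = {zeta:.3f}")
```

Moving the zero to a higher frequency raises the natural frequency but lowers the damping, so a faster loop comes at the cost of more overshoot.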

A model of a PLL was constructed in the Cadence analog tool suite. Each of the blocks was modeled using VerilogA system components. The intention was to demonstrate a PLL coming up and achieving lock with near zero phase error. Unfortunately, due to time constraints we were not able to complete this phase of our investigation. In our simulations, the PLL would come up and approach lock, but oscillate near its final frequency, which indicates that the loop transfer function still requires tuning.
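The behaviour we were aiming for can be illustrated with a far simpler model than the VerilogA one. The sketch below is a phase-domain loop integrated with forward Euler, in normalized units; it is not our Cadence model, and the gain, zero and frequency offset are hypothetical values. It compares a type 1 loop (proportional control only) with a type 2 loop (proportional plus integral control) when the reference frequency is offset from the free-running VCO frequency.

```python
def steady_state_phase_error(loop_type, k=10.0, a=2.5, dw=1.0,
                             dt=1e-3, steps=10_000):
    """Phase error (rad) remaining after the loop has settled.

    loop_type : 1 for proportional control, 2 for proportional plus integral
    k         : lumped loop gain (hypothetical value)
    a         : loop filter zero, used only by the type 2 loop
    dw        : reference minus free-running VCO frequency, rad/s
    """
    err = 0.0      # reference phase minus VCO phase
    integ = 0.0    # running integral of the phase error
    for _ in range(steps):
        control = err + a * integ if loop_type == 2 else err
        new_err = err + (dw - k * control) * dt   # phase error dynamics
        integ += err * dt
        err = new_err
    return err


print("type 1 residual phase error:", steady_state_phase_error(1))
print("type 2 residual phase error:", steady_state_phase_error(2))
```

In this model the type 1 loop settles with a residual phase error of dw / k, while the type 2 loop drives the phase error toward zero, which is the property required for clock recovery.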

3.2 VCO

The VCO block in the PLL outputs a waveform with a frequency proportional to its controlling voltage. Our LC tank VCO is distinguished by its use of an inductor on silicon to enable oscillation. To develop an LC tank VCO it is instructive to begin with the inductor and work up to the VCO. This is in contrast to the flow of the paper up to this point, but it gives the reader a path to very important design considerations when using an LC style VCO. The inductor will be introduced first, followed by an analysis of the LC tank. The varactor will then be presented and the section will end with a discussion of the results of the simulated VCO. After this section the paper will resume its top down presentation.

3.2.1 Inductor on Silicon

Shrinking geometries and faster clocks have made on-chip inductors realizable. Typically, designers have tried to combat inductance. For instance, mutually inductive noise at an IO pad may couple with an intended signal and introduce ringing in a phenomenon known as ground bounce. On-chip inductors pose a challenge due to their size and proximity to other active and linear devices. In addition, on-chip inductor parasitic effects occur at the macroscopic chip scale, easily dominating the wanted effects of other elements.

A two-tiered approach was used to build the inductor for the LC tank VCO. First, an analytical model was obtained that gave reasonable inductance and lumped parameter estimates for a 4 sided, single layer inductor [1]. As an example, a 10 um wide (outer dimension), 4 sided inductor on metal 1, utilizing 2.75 turns with an internal diameter of 4 um, was calculated to exhibit 85.88 pH of inductance.

L = (11.25 * uo * N^2 * AL^2) / (11 * OL - 7 * AL)

where:

L = Inductance in H
uo = Permeability in free-space = 4*pi E-7 H/m
N = Number of Turns
AL = (OL + IL) / 2
OL = Outer Diameter
IL = Inner Diameter
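As a check, the formula can be evaluated for the example inductor. The sketch below assumes that the quoted 10 um width is the outer diameter OL and that all lengths are entered in meters; the function name is illustrative only.

```python
import math

MU_0 = 4 * math.pi * 1e-7  # permeability of free space, H/m


def square_spiral_inductance(turns, outer_diameter, inner_diameter):
    """Evaluate L = 11.25 * uo * N^2 * AL^2 / (11 * OL - 7 * AL), in henries.

    Diameters are in meters; AL is the average of outer and inner diameter.
    """
    avg = (outer_diameter + inner_diameter) / 2.0
    return 11.25 * MU_0 * turns**2 * avg**2 / (11.0 * outer_diameter - 7.0 * avg)


# Example from the text: 2.75 turns, OL = 10 um (assumed), IL = 4 um.
l_example = square_spiral_inductance(2.75, 10e-6, 4e-6)
print(f"L = {l_example * 1e12:.2f} pH")
```

With these inputs the expression evaluates to 85.88 pH, matching the value quoted above.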

After a few iterations, the ASITIC tool was adopted and found to be a superior way of designing IC inductors [2]. Its only requirement is a technology file specifying such things as the layer thickness and sheet resistance. The diagram in Figure 3 demonstrates the parameters used in the technology file. This file was built for this design to complement the other technology files, most specifically the BSIM3v3 0.1um Predictive NMOS and PMOS model cards [3].

Figure 3: Process Parameters

The technology file is named CMOS_nexsis10.tek.

For the previous inductor constraints, the inductor shown in Figure 4 is produced in ASITIC with this one-line command.

SQ NAME=I1:LEN=10:W=1:S=1:N=2.75:METAL=M2:EXIT=M1:XORG=64:YORG=64

Figure 4: Spiral Inductor

ASITIC has a myriad of analysis techniques centered around a Maxwell's equations solver that produces a pi model (Figure 5) suitable for SPICE simulation.

Units [2]:
Inductance in nano-Henries (nH)
Capacitance in pico-Farads (pF)
Resistance in ohms
Dimensions in microns (micro-meters)
Frequency in giga-Hertz (GHz)

Figure 5: Inductor PI Model

For the inductor of Figure 4, the Pi model, at f=10.00 GHz results in:

Q = 0.409, 0.409, 0.409
L = 0.0392 nH, R = 6.01
Cs1 = 1.47 fF, Rs1 = 483
Cs2 = 1.32 fF, Rs2 = -595
f_res = 662.95 GHz

The 3 Q values represent the Q when the outside port is grounded, the Q when the internal port is grounded and the differential Q at both ports. In this particular configuration, they are all the same.
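For this small inductor the reported Q is close to what the series branch of the pi model alone would give. The sketch below uses the textbook series estimate Q = 2*pi*f*L / R and ignores the shunt Cs/Rs branches entirely; that is a simplifying assumption for illustration and not a description of how ASITIC computes Q.

```python
import math


def series_q(freq_hz, inductance_h, resistance_ohm):
    """Series-branch quality factor, Q = omega * L / R (shunt parasitics ignored)."""
    return 2 * math.pi * freq_hz * inductance_h / resistance_ohm


# Pi model values for the inductor of Figure 4 at 10 GHz.
q_estimate = series_q(10e9, 0.0392e-9, 6.01)
print(f"series Q estimate: {q_estimate:.3f}")
```

The estimate comes out at roughly 0.41, in line with the three reported values, which suggests that at 10 GHz the series resistance dominates the loss of this particular geometry.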

A very useful feature of ASITIC is its ability to guide the user to a target inductor. To get a handle on the approximate inductor geometries needed, ASITIC's sweep feature was employed in designing the inductor for our VCO, and the results were charted. Four such charts are provided in Figures 6-9.

Figure 6: "Q Cloud diagram" for a single layer inductor (Q1 vs. inductance L)

Figure 7: Inductance per turns for a m1 m2 multi-layer inductor (inductance L vs. number of turns)

Figure 8: Inductance vs. R for a m1 m2 multi-layer inductor

Figure 9: Inductance vs. Capacitance for a m1 m2 multi-layer inductor (series C1 and C2)

From the graphs, the first of many conflicting design realities becomes clear. The VCO requires an inductor with a very high Q and a high L. High Q is shown to occur for smaller geometries, while high L requires more turns, which introduces large parasitic effects. These parasitic effects limit the tunability of the VCO. In the small signal model the parasitic capacitances of the inductor and the transistor add to the varactor capacitance, which bounds the range over which the varactor can tune the tank. For this reason it is more effective to work with an inductor with a high Q that resonates at a higher frequency than the target.

For the LC tank VCO a two layer inductor (Figure 10) was used. Its parameters are presented here.

Outer dimension(s) of spiral (edge to edge) = 33
Metal width = 1
Spacing (metal edge to metal edge) = 2
Turns = 2
Top Metal layer = m2
Bottom Metal layer = m1

Figure 10: Two Layer Inductor

Pi Model at f=40.00 GHz:

Q = 2.714, 0.600, 3.290
L = 0.976 nH, R = 53.3
Cs1 = 6.2 fF, Rs1 = 6.25
Cs2 = 13.3 fF, Rs2 = 7.1
f_res = 64.14 GHz

3.2.2 The LC tank

The equivalent small signal model (SSM) of the LC tank is shown in Figure 11. Notice that only one leg of the ASITIC pi model is present in the SSM and only one stage of the inductor. Due to the concept of virtual ground the circuit may be broken up and each stage may be analyzed individually.

Figure 11: LC Tank Circuit

A theoretical MATLAB simulation yields the Bode graph as Cv is adjusted (Figure 12).

Figure 12: LC Circuit Predicted AC Response

The graph of Figure 12 was generated for the chosen inductor. As Cv is adjusted, the resonant frequency and the gain both change, as expected. Cv was swept from 0 fF to 32 fF. As the capacitor is increased the gain goes down because it is in parallel with the other elements. The x axis must be converted from 20log().

Analog circuit simulation of the chosen inductor gave very similar results (Figure 13). The capacitor was swept from 0 fF to 25 fF. The marker is positioned at the target oscillation frequency of 40 GHz. The simulation confirms the main design challenge of the LC tank VCO. The designer must minimize parasitic capacitive effects of the inductors and the gain stages to maximize the range of the varactor. As seen in the MATLAB simulation and the circuit simulation, this range must be such that the circuit still exhibits a reasonable gain.
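The tuning constraint can be illustrated with the ideal resonance relation f = 1 / (2*pi*sqrt(L*C)). The sketch below ignores the series and substrate resistances of the pi model, so it is only a first-order estimate and not a substitute for the MATLAB or circuit simulations. The fixed parasitic capacitance C_FIXED is a hypothetical value chosen for illustration; the inductance is the 0.976 nH of the chosen inductor.

```python
import math


def resonant_frequency(inductance_h, capacitance_f):
    """Ideal LC resonance, f = 1 / (2 * pi * sqrt(L * C)), in Hz."""
    return 1.0 / (2 * math.pi * math.sqrt(inductance_h * capacitance_f))


def capacitance_for(freq_hz, inductance_h):
    """Total tank capacitance needed to resonate at freq_hz, in farads."""
    return 1.0 / ((2 * math.pi * freq_hz) ** 2 * inductance_h)


L_TANK = 0.976e-9   # chosen two layer inductor
C_FIXED = 10e-15    # hypothetical fixed parasitic capacitance

c_target = capacitance_for(40e9, L_TANK)
print(f"total C for 40 GHz: {c_target * 1e15:.1f} fF")

# Sweep the varactor on top of the fixed parasitics.
for cv_ff in (0, 8, 16, 24, 32):
    f = resonant_frequency(L_TANK, C_FIXED + cv_ff * 1e-15)
    print(f"Cv = {cv_ff:2d} fF -> f = {f / 1e9:.1f} GHz")
```

With this inductance the tank needs roughly 16 fF in total to resonate at 40 GHz, so every femtofarad of fixed parasitic capacitance is taken directly out of the range left for the varactor.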

Figure 13: LC Circuit Simulated AC Response

Figure 14: Simulated LC Tank

3.2.3 Controlling the LC Tank VCO

A varactor (variable capacitor) is used as part of the capacitance seen by the LC tank circuit. The voltage input to the VCO that controls the frequency produced does so by adjusting this capacitance.

A gate-channel depletion region based varactor was used in the LC tank VCO. The two plates of the varactor are created by the gate and bulk of a P-channel MOSFET. As the DC voltage across this capacitor is increased, the depletion region under the gate grows. This further separates the charge in the capacitor, decreasing the capacitance seen by the small signal oscillation applied to the varactor.

For the varactor to be useful, we must be able to connect one terminal to the LC tank and the other to the low frequency voltage control. For this to be possible in a standard process, a P-channel MOSFET must be used, because it is constructed with a floating bulk (n-well). If an N-channel MOSFET were used, the plate created by the bulk would be fixed at the substrate potential.

3.2.4 The LC Tank VCO Based PLL

With a suitable inductor and a fully characterized LC tank circuit, an LC tank VCO may be constructed (Figure 15) in the following way. Two inverting transconductance amplifiers are connected in a loop. A resistive load on these amplifiers would result in a constant voltage gain at the output of the transconductance amplifier, minus effects from parasitic poles and zeros. However, the loads of these amplifiers are composed of the LC tank circuit, instead of resistive loads.

The impedance created by the LC tank is high at a tuned frequency and low elsewhere. Therefore, at the tuned frequency a large enough gain is produced to cause sustained positive feedback around the inverting amplifier loop.

The quality factor or Q of the second order transfer function created by the LC tank circuit determines how large the impedance at the load of the transconductance amplifiers is, and hence, the gain at the frequency of oscillation. With a phase shift of 360 degrees around this loop and sufficient gain, the requirements for oscillation are met [5].

Figure 15: Realized LC Tank VCO

R = 53.3, L = 976pH
R1 = 6.25, C1 = 6.2f
R2 = 7.1, C2 = 13.3f
W1 = 1.5u, L1 = 100n
W2 = 100n, L2 = 100n
I1 = 400u, Vdd = 3.3V

The entire pi model has been included in the VCO, although the careful observer will note that the two legs in parallel with the current source and voltage source are effectively shorted in the small signal model. An architecture based on NMOS transistors was chosen because the ratio of gain to area is approximately twice that of PMOS devices. The smaller area is key in reducing the parasitic capacitance contributed by the FETs, and this capacitance is a dominant parameter.

An interesting second order effect was seen when the entire VCO was built that was not predicted by the LC tank circuit. The simulation of the tank circuit indicated that the target of 40 GHz was attainable. The minimum attainable frequency for the same inductor in the VCO was around 45 GHz. Better modeling and inductor iteration should enable closer matching.

3.2.4.1 VCO Results

The VCO simulations indicate that there is room for improvement. An ideal voltage-controlled oscillator is a circuit whose output frequency is a linear function of its control voltage [5].

wout = wo + KVCO*Vcont.

Figure 16: Frequency vs. Control Voltage (wout in GHz, 43.5 to 47.5, plotted against Vcont, 0 to 3.5)

The graph of Figure 16 indicates that the VCO may only operate over mV ranges of the control voltage. To show this, the control voltage was limited to a 0.5 V swing; the resulting trend line of Figure 17 indicates that the gain is positive over this range. These graphs are indicative of design problems in the VCO implementation. Through closer analysis and more precise experimentation, these design issues can be overcome.

Figure 17: Gain Trend (wo plotted against Vcont from 1 to 1.5, with linear trend line y = 0.047x + 46.644)

The plot shown in Figure 17 was constructed from the output waveforms pictured in Figure 18. Although the graph appears to show sine waves that have merely been phase shifted, the waves are actually at different frequencies. The waves are the result of sweeping the control voltage on the varactor from 1V to 1.5V. The output is taken as the difference of the two outputs of the VCO. It is assumed that a differential stage follows the VCO.
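The trend line of Figure 17 can be read as the linear VCO model wout = wo + KVCO*Vcont restricted to the 1 V to 1.5 V range. The sketch below assumes that the trend line's y value is a frequency in GHz (the unit shown on the Figure 16 axis) and that x is the control voltage in volts; under that assumption the slope is the VCO gain.

```python
import math

# Linear fit reported in Figure 17: y = 0.047x + 46.644.
SLOPE_GHZ_PER_V = 0.047   # assumed unit: GHz per volt
INTERCEPT_GHZ = 46.644    # assumed unit: GHz


def fitted_frequency_ghz(vcont):
    """Output frequency predicted by the linear fit, valid for 1 V to 1.5 V only."""
    return INTERCEPT_GHZ + SLOPE_GHZ_PER_V * vcont


# VCO gain converted to the rad/s/V unit used for Kv in the PLL transfer function.
k_vco_rad = 2 * math.pi * SLOPE_GHZ_PER_V * 1e9

# Frequency change across the 0.5 V control swing, in MHz.
tuning_range_mhz = (fitted_frequency_ghz(1.5) - fitted_frequency_ghz(1.0)) * 1e3

print(f"KVCO = {k_vco_rad:.3e} rad/s/V")
print(f"tuning range over 1 V to 1.5 V = {tuning_range_mhz:.1f} MHz")
```

Under these assumptions the gain is about 47 MHz/V, so the 0.5 V swing moves the oscillator by only about 23.5 MHz around 46.7 GHz, which quantifies how narrow the usable tuning range is.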

Figure 18: The Differential Waveform at the Output of the VCO.

3.3 Non-periodic Reference Phase-Frequency Detector

The 8b/10b data stream that the PLL is to lock to is not a periodic signal. As mentioned above, it has a run length limit of 5 bits between transitions. The phase-frequency detector that we have used to this point would therefore not be adequate. An error signal would be produced even when the local clock is in precisely the correct position, but no transition occurs in the data. A different approach is required.

One such approach is called a delay line phase detector. This detector takes the data stream through a tapped delay line (Figure 19).

Figure 19: Tapped Delay-Line Phase Detector Input Logic [6]

The output of each tap is captured in a flip-flop that is clocked by the locally generated, phase locked clock. The length of the delay line is designed to be greater than one data bit period, but less than two periods.

The general idea is that a transition in the data stream should be centered in the delay line. That is, if the locally generated clock is locked in the correct phase and the data does transition in this bit period, the transition will be captured between the center two delay taps. If the transition is elsewhere in the delay line, an appropriate error signal is generated to correct the VCO. If a transition does not occur anywhere in the delay line for this bit period, an error signal is not generated.
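The decision logic described above can be sketched behaviourally. The code below is a hypothetical illustration and not the logic of [6]: it assumes an even number of taps, reports the position of the transition relative to the gap between the two center taps, and returns None when no transition is present. The sign convention and the tie-breaking rule for two transitions are assumptions.

```python
def delay_line_phase_error(taps):
    """Locate the data transition in the sampled delay-line taps.

    taps: tap values (0 or 1) captured by the flip-flops on one clock edge,
          ordered from the start of the delay line to its end.
    Returns the signed offset, in tap positions, of the transition from the
    gap between the two center taps, or None when no transition is present.
    """
    if len(taps) < 2 or len(taps) % 2 != 0:
        raise ValueError("expected an even number of taps, at least 2")
    center = len(taps) // 2 - 1   # index of the gap between the center taps
    gaps = [i for i in range(len(taps) - 1) if taps[i] != taps[i + 1]]
    if not gaps:
        return None               # no transition: no error signal
    # If two transitions fall in the line, use the one nearest the center
    # (the earlier one on a tie).
    nearest = min(gaps, key=lambda i: abs(i - center))
    return nearest - center


for sample in ([0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1], [1, 1, 1, 1]):
    print(sample, "->", delay_line_phase_error(sample))
```

A result of 0 corresponds to the locked condition, a nonzero result gives the direction in which the VCO should be corrected, and None corresponds to a bit period with no transition, for which the loop is left undisturbed.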

3.4 Data Recovery

With the delay line phase detection scheme outlined above, data recovery is relatively simple. The phase detector lines the locally generated clock's low-to-high transitions up with the data transitions. Therefore, the data can be recovered at the center tap of the delay line by a flip-flop clocked by the opposite (high-to-low) transition of the locally generated clock (Figure 20).

Figure 20: Locally Generated Clock Relationship to Data Stream (data is locked to the positive edge of the clock and sampled on the negative edge of the clock)

One of the difficulties with this approach is creating a local clock with high symmetry (a 50% duty cycle). This is needed because the opposite edge is used to capture the data, and for best results this clock edge should be in the center of the bit period. Several problems exist in realizing a symmetrical clock.

The PLL itself is not guaranteed to generate a symmetrical clock to begin with. An approach to remove the asymmetry of the clock generated by the PLL is to run the PLL at twice the desired frequency and then divide its output by two. The dividing flip-flop is clocked on the upward edge of the PLL clock only; the downward edge is not used. The output of the dividing flip-flop then has nearly a 50% duty cycle. Unfortunately, at the very high speeds of today's SERDES (in the tens of gigahertz), running the PLL at twice this rate presents many problems as well.
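The effect of the divide-by-two stage can be shown with an idealized sampled model. The sketch below ignores flip-flop delay and jitter, and the 30% duty cycle of the fast clock is a hypothetical value chosen for illustration.

```python
def divide_by_two(clock):
    """Toggle an output on every rising edge of the sampled input clock."""
    out = []
    state = 0
    prev = 0
    for sample in clock:
        if prev == 0 and sample == 1:   # rising edge only; falling edge unused
            state ^= 1
        out.append(state)
        prev = sample
    return out


def duty_cycle(wave):
    """Fraction of samples at logic 1."""
    return sum(wave) / len(wave)


# Asymmetric fast clock: 3 samples high, 7 samples low, repeated for 8 periods.
fast_clock = ([1] * 3 + [0] * 7) * 8
slow_clock = divide_by_two(fast_clock)

print("fast clock duty cycle:", duty_cycle(fast_clock))
print("divided clock duty cycle:", duty_cycle(slow_clock))
```

Because the output toggles once per input period regardless of where the falling edge lies, the divided clock is high for exactly one input period and low for the next, which gives a 50% duty cycle in this ideal model.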

4. Jitter

It would be remiss not to mention jitter in this paper, simply because jitter is a major concern in high speed SERDES circuits. Jitter is defined as the variation of signal events from their ideal positions over time. The transmitter must generate a low jitter data stream and the receiver needs to have relatively high jitter tolerance when recovering the clock and data.

Jitter is generated by many different noise events and cannot be eliminated completely. It also takes different forms such as cycle-to-cycle jitter and N-cycle jitter, which occupy different regions of the spectrum. In a 10 GHz signal, with a period of 100 picoseconds, only very small amounts of jitter can be tolerated.

Had we gotten to the point in our experimental designs where we were simulating a more real-world environment, controlling jitter would have been an educational experience in itself.

5. Conclusion

Many of the design issues relating to a SERDES clock and data recovery block have been investigated in this paper. We have examined several aspects of this interesting device, while focusing on a few specific components. Many intriguing issues remain to be studied to gain true expertise in the area.

References

[1] Shwetabh Verma and Jose M. Cruz, On-chip Inductors and Transformers, Sun Microsystems Laboratories, 901 San Antonio Road, Palo Alto, CA 94303

[2] Ali M. Niknejad, ASITIC: Analysis of Si Inductors and Transformers for ICs, 572 Cory Hall #1770, University of California at Berkeley, Berkeley, CA 94720-1770, [email protected]://formosa.eecs.berkeley.edu/~niknejad/asitic.html

[3] 0.1um Predictive CMOS Technology BSIM3v3 Model (Beta Version), http://www-device.eecs.berkeley.edu/~bsim3/

[4] The Affirma Analog Circuit Design Environment, provided by Cadence

[5] Behzad Razavi, Design of Analog CMOS Integrated Circuits, p. 510

[6] Leonard Dieguez, Clock and Data Recovery With Coded Data Streams, Application Note: Virtex-II Family, XAPP250 (v1.1), April 25, 2002, http://www.xilinx.com/xapp/xapp250.pdf

[7] Garth Nash, Application Note, Applications Engineering, http://mail.ece.ucsb.edu/~long/ece145b/an535rev0a.pdf
