Circuits for the Design of a Serial Communication System
Utilizing SiGe HBT Technology
by
Thomas W. Krawczyk Jr.
A THESIS SUBMITTED TO THE EXAMINING
COMMITTEE OF RENSSELAER POLYTECHNIC INSTITUTE
IN PARTIAL FULFILLMENT OF THE
REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
MAJOR SUBJECT: ELECTRICAL ENGINEERING
John F. McDonald, Chair
Gary Saulnier, Prof. ECSE
Kenneth A. Connor, Prof. ECSE
Lester Rubenfeld, Prof. Math
Donald Millard, Prof. ECSE
Rensselaer Polytechnic Institute
Troy, New York
November 2000
© Copyright 2000
by
Thomas W. Krawczyk Jr.
All Rights Reserved
Table of Contents
List of Figures
List of Tables
Acknowledgements
Abstract
1. Introduction & Historical Review
1.1. Motivation and Goals
1.2. The three chips
1.3. Project time line
1.4. State of the Art
1.5. Contribution to the Field
1.5.1. Feed Forward Interpolated VCO
1.5.2. Transmitter Interleaving Architecture
1.5.3. Symmetric Multiplexer
1.5.4. Receiver PLL
1.6. SiGe 5 HP Overview
1.7. Testing Equipment
1.8. Document Logistics
2. Serial Communication
2.1. Serial Communication Block Diagram
2.2. Transmitter / Multiplexer / Clock Multiplier
2.3. Transport Channel
2.4. Receiver / Demultiplexer / Clock & Data Recovery
2.5. Internal Testing
2.6. Support Circuits
3. Current Starving VCO
3.1. Project History
3.2. The need for a VCO
3.3. Simple Current Starving VCO
3.4. Basic Operation
3.4.1. Adjustable Voltage Reference
3.4.2. Final Implementation
3.4.3. Testing Results
3.4.4. Optimization of Simple CS VCO (post-fabrication)
3.5. Current Starving with Feed Forwarding
3.5.1. Final Implementation
3.5.2. Testing results
3.6. Conclusions and Future Work
4. Feed Forward Interpolated VCO
4.1. Project History
4.2. The Evolution
4.3. Basic Operation
4.4. Stage Decoupling
4.5. Circuit Implementation and Analysis
4.5.1. Cascode amplifiers
4.5.2. Emitter Resistor for linearity and gain adjustment
4.5.3. Center capacitor to control frequency range center
4.5.4. Bypass resistor to prevent stage decoupling
4.6. System Analysis
4.6.1. Branch current to frequency
4.6.2. Center frequency and intrinsic stage delay
4.6.3. Frequency gain at the center frequency
4.6.4. Frequency Range
4.7. Phase Noise
4.7.1. The Impulse Sensitivity Function
4.7.2. Solving for phase noise
4.7.3. Phase noise comparison between the FFI and CS VCOs
4.8. Jitter
4.9. Interconnect Parasitic Simulations
4.10. HDL Model
4.11. Final Implementation
4.11.1. Circuit Parameters
4.11.2. Layout Considerations
4.12. Experimental Results
4.12.1. Frequency Response
4.12.2. Common Mode Gain (5 GHz VCO)
4.12.3. Response versus supply voltage (5 GHz VCO)
4.12.4. Phase noise measurements
4.12.5. Jitter measurements
5. Design of the Transmitter
5.1. Project History
5.2. Top Level Architecture Overview
5.3. 16-1 Multiplexer
5.3.1. The Case for the Symmetric Multiplexer
5.3.2. Final Implementation and Simulation
5.4. Phase Locked Loop (Frequency Synthesizer)
5.4.1. Input Filter
5.4.2. Phase Detector
5.4.2.1. Phase detector (Serdes I)
5.4.2.2. Phase detector (Serdes II)
5.4.2.3. Phase detector (Serdes III)
5.4.3. The VCO
5.4.4. Loop Filter
5.4.4.1. Serdes I Loop Filter
5.4.4.2. Serdes II Loop Filter
5.4.4.3. Serdes III Loop Filter
5.4.5. PLL Loop Response
5.4.6. Lock Acquisition
5.4.6.1. Serdes I Simulated Acquisition
5.4.6.2. Serdes II Simulated Acquisition
5.4.6.3. Serdes III Simulated Acquisition
5.4.7. 20 / 40 Gb/s Implementation
5.5. Clock Distribution
5.6. Data Encoding
5.7. Line Driver
5.8. Internal Testing Circuitry
5.8.1. Serdes I
5.8.2. Serdes II
5.9. Implementation and Fabrication
5.9.1. Serdes I
5.9.2. Serdes II
5.10. Testing Results
5.10.1. Serdes I (transmitter test results)
5.10.2. Serdes II (transmitter test results)
5.11. Future Design
5.11.1. 8B/10B Encoding
5.11.2. Transmitter data retiming
5.11.3. LC Oscillator
6. Design of the Receiver
6.1. Project History
6.2. Receiver Architecture
6.3. Receiver PLL
6.3.1. Phase Detector
6.3.1.1. Transition Detector (PD)
6.3.1.2. NRZ Phase / Frequency Detector (PD/FD)
6.3.2. The Loop Filter
6.3.2.1. FET Charge Pump / Proportional Control (Serdes I)
6.3.2.2. Negative Impedance Charge Pump (Serdes II)
6.3.2.3. Mixed Loop (Serdes III)
6.3.3. PLL Loop Response
6.3.3.1. Serdes I (FET charge pump)
6.3.3.2. Serdes II (negative impedance charge pump)
6.3.3.3. Serdes III (dual-loop / referenced loop)
6.4. 4-16 Demultiplexing
6.5. Registers and Decoding
6.6. Line Receiver
6.7. Test Circuitry
6.7.1. On-chip test pattern generation
6.7.2. True error rate detector (TERD)
6.8. Implementation and Fabrication
6.8.1. Serdes I
6.8.2. Serdes II
6.9. Testing Results
6.9.1. Serdes I (receiver test results)
6.9.2. Serdes II (receiver test results)
6.10. Future Work
6.10.1. Sampling offset correction
6.10.2. 40 Gb/s?
6.10.3. Demultiplexer improvements
Discussion & Conclusion
References
A. IBM SiGe 5 HP
A.1. NPN Vbe characteristics
A.2. NPN Ic versus Vce characteristics
A.3. NPN fT Curves
B. CML Logic Gates
B.1. CML Voltage Swing (non-linearized, digital)
B.2. CML Signals
B.3. Voltage Reference
B.4. Buffer with emitter follower outputs
C. CML Circuit Details
C.1. Linearizing the differential amplifier
C.2. Current bypassing
C.3. CML delay increasing
D. Transistor Sizing to Minimize VCO Delay
E. SpectreHDL models
E.1. FFI VCO
E.2. 3-State PD
E.3. Transition Detector PD
E.4. Histogram generator
E.5. Jittered data source
F. Toplevel Chip Schematics
F.1. Serdes I Transmitter
F.2. Serdes I Receiver
F.3. Serdes II Transceiver
List of Figures
Figure 1-1. Past and proposed future research
Figure 2-1. Toplevel System Block Diagram
Figure 3-1. Four stage VCO diagram
Figure 3-2. Current Starving VCO frequency and gain response
Figure 3-3. Adjustable Voltage Reference
Figure 3-4. Layout of Simple CS VCO
Figure 3-5. Test data from Simple CS VCO
Figure 3-6. Frequency Response versus emitter length in delay elements
Figure 3-7. Feed-forward CS VCO block diagram
Figure 3-8. Feed forward CS VCO frequency response and gain
Figure 3-9. Feed-forward CS Delay Element
Figure 3-10. Testing Data from feed-forward CS VCO
Figure 4-1. Schematic for Delay Interpolated VCO element
Figure 4-2. Feed Forward VCO block diagram
Figure 4-3. FFI VCO under boundary conditions
Figure 4-4. Feed-forward interpolated simulated response
Figure 4-5. Delay versus weighting factor with single stage imbalance
Figure 4-6. Decoupling versus delay injection
Figure 4-7. Schematic for FFI VCO element
Figure 4-8. FFI VCO frequency versus emitter resistance
Figure 4-9. FFI VCO frequency versus centering capacitor
Figure 4-10. FFI VCO frequency versus bypass resistance
Figure 4-11. FFI VCO Frequency Range
Figure 4-12. FFI VCO System from control voltage to frequency
Figure 4-13. Simulated versus analytical response of the FFI Architecture
Figure 4-14. Center frequency simulation and model
Figure 4-15. Current pulse effect on phase
Figure 4-16. Simulated ISF for FFI VCO and output waveform
Figure 4-17. ISF rms values for various ring oscillators
Figure 4-18. FFI with capacitive interconnect parasitics
Figure 4-19. FFI Layout
Figure 4-20. Reducing substrate coupling
Figure 4-21. FFI waveform at 5 GHz
Figure 4-22. FFI VCO measured results
Figure 4-23. FFI common mode response
Figure 4-24. FFI response versus supply voltage
Figure 4-25. Open loop phase noise of FFI VCO
Figure 4-26. FFI VCO analytical and measured jitter
Figure 5-1. Transmitter and multiplexer architecture
Figure 5-2. Data timing for the 4-1 multiplexer
Figure 5-3. CML Two Level Multiplexer
Figure 5-4. Simulation Testing of CML 2:1 Multiplexer
Figure 5-5. Simulation Results for CML 2:1 Multiplexer
Figure 5-6. CML Single Level Symmetric Multiplexer
Figure 5-7. Symmetric multiplexer transistor states
Figure 5-8. Multiplexer Eye Diagrams
Figure 5-9. Multiplexer Layout for Serdes I and II
Figure 5-10. Linear model of PLL
Figure 5-11. Frequency synthesizer evolution
Figure 5-12. Schematic for input filter
Figure 5-13. Input filter frequency response
Figure 5-14. Phase detector schematics
Figure 5-15. Simulated phase detector responses
Figure 5-16. PLL frequency detector
Figure 5-17. Passive Loop Filter
Figure 5-18. Tx PLL passive loop filter
Figure 5-19. Tx PLL active loop filter
Figure 5-20. Active loop filter transfer function
Figure 5-21. Receiver III integrator
Figure 5-22. Voltage spectral density for optimal loop bandwidth
Figure 5-23. PLL simulated step responses
Figure 5-24. PLL I simulated acquisition plots
Figure 5-25. PLL II simulated acquisition plots
Figure 5-26. 5/10 GHz PLL implementation
Figure 5-27. Clocking scheme for transmitter
Figure 5-28. Transmitter clock timing
Figure 5-29. Load counter
Figure 5-30. Serdes I LFSR
Figure 5-31. True error rate detector
Figure 5-32. Serdes II bit pattern generator
Figure 5-33. Serdes I transmitter layout and photograph
Figure 5-34. Serdes II chip layout and microphotograph
Figure 5-35. Transmitter waveform (Serdes I)
Figure 5-36. Serdes 2 transmitter eye diagram
Figure 5-37. Tx PLL measured phase noise spectra
Figure 5-38. Data and clock timing
Figure 6-1. Top level receiver architecture
Figure 6-2. Receiver PLL evolution
Figure 6-3. Receiver topology
Figure 6-4. Transition detector in prototype I
Figure 6-5. Transition detector in prototype II
Figure 6-6. Gain of transition detector with data jitter
Figure 6-7. Phase detector for NRZ data
Figure 6-8. Receiver loop filter
Figure 6-9. MOSFET charge pump integrator
Figure 6-10. Proportional control and summing junction
Figure 6-11. Serdes I loop locking in
Figure 6-12. Frequency and phase lock-in of serdes III Rx PLL
Figure 6-13. 4-16 demultiplexer architecture
Figure 6-14. Serdes I receiver layout artwork and photograph
Figure 6-15. Serdes I receiver locked to data
Figure 6-16. Serdes I recovered clock showing jitter
Figure 6-17. Serdes II Rx locked to data
Figure 6-18. Serdes II receiver clock phase noise
Figure 6-19. Revised 4-to-16 demultiplexer
Figure A-1. Ic-Vbe characteristics for npn transistor
Figure A-2. npn transconductance
Figure A-3. Ic-Vce characteristics for npn transistor
Figure A-4. fT vs Ic characteristics for npn transistor
Figure B-1. Current switching versus differential input voltage
Figure B-2. Simple CML Gate
Figure B-3. Reference Voltage Generator
Figure B-4. CML Buffer with emitter followers
Figure C-1. Linearizing differential amplifier with emitter resistors
Figure C-2. Branch current response for various emitter resistors
Figure C-3. Simulated / Analytical Gain
Figure C-4. Limiting full current switching with bypass resistors
Figure C-5. Current limiting effects of bypass resistor
Figure C-6. Current gain effects of bypass resistor
Figure C-7. Designing for gain with emitter and bypass resistors
Figure C-8. Collector Capacitor
Figure C-9. Delay Model with Collector Capacitor
Figure D-1. Delay from emitter follower to differential amplifier
Figure D-2. Delay from differential amp to emitter follower
Figure D-3. Emitter follower size between driver and receiver
Figure D-4. Delay when using optimized emitter follower
Figure D-5. Delay difference between circuit with follower and one without
List of Tables
Table 1-1. Equipment used for testing
Table 4-1. Circuit parameters for calculating jitter
Table 5-1. Pin-out of Serdes I transmitter
Table 5-2. Bondpad pin-out of Serdes II chip
Table 6-1. Pin-out of Serdes I transmitter
Acknowledgements
First and foremost, I want to thank my family. Although they have little knowledge of
the research I have done, they have helped more than they know. Without them this would
have been a much more difficult undertaking.
I want to thank my advisor, Jack McDonald, for his assistance and guidance during
the past few years, and for providing me with the opportunity to work with cutting edge
SiGe technology. The members of my committee, Kenneth Connor, Gary Saulnier, Les
Rubenfeld, and Don Millard, also deserve thanks for providing insight and guidance in my
research. I would like to extend a special thank you to Dr. Millard for being a wonderful
mentor and friend since I began graduate school. He has always been there for me.
Also, without my fellow Frisc members and friends, Pete Curran, Samuel Steidl,
Matthew Ernest, Steven Carlough, and Bryan Goda, this certainly would have been a
boring voyage. Thanks for the help.
I am indebted to Hank Dardy and Basil Decina at NRL (contract #N00173-99-1-G013)
for their support in this work. I also wish to thank Sierra Monolithics Incorporated
and IBM for the fabrication of my chip designs and for providing additional insight in this
research; and Intel, for providing a fellowship to support this work.
When I left high school I said, “I’ve just conquered a small hill in my life only to look
out and see a huge range of mountains before me.” Am I perhaps standing on the top of the
first mountain I saw?
Abstract
The current high-growth nature of digital communications demands higher speed
serial communication circuits. Present day technologies barely manage to keep up with this
demand, and new techniques are required to ensure that serial communication can continue
to expand and grow.
The goal of this work was to research, design, implement, test and evaluate high
speed serial communication circuits. Research involved an in-depth study of the state of the
art in high speed digital and analog circuits; SiGe technology; and serial communication
circuits. Two prototype 20 Gb/s transceiver chips were designed using current mode logic
(CML) bipolar logic families and using IBM’s SiGe 0.5 µm heterojunction bipolar
transistor (HBT) technology. Following fabrication of two designs, the completed chips
were extensively tested, and test results were compared to expected results from
simulation. After optimization and many improvements, a prototype communication
system was designed and prepared for fabrication.
The optimized second prototype operated at speeds in excess of 20 Gb/s. It utilized a
novel four stage feed-forward interpolated ring voltage controlled oscillator (VCO)
architecture, for which RPI is pursuing a patent. By feed-forwarding every stage’s output
by one stage, the architecture improved the core frequency by greater than 33% with a phase
noise of -90.2 dBc/Hz at 1 MHz. The transmitter took advantage of the phase quadrature
nature of the VCO in a unique multiplexing technique that required the development of a
new 2-to-1 multiplexer. This multiplexer had full input to output symmetry on all three
inputs and was capable of performing output data retiming. The PLL had a wide bandwidth
of 30 MHz, to suppress VCO noise, and produced in-band jitter of 2.0 ps from 100 kHz to
100 MHz.
The receiver, similar in both prototypes, utilized the full eight phases of the VCO to
twice oversample every data bit in the phase detector (PD). It was capable of extracting
timing information from every rising and falling transition. The loop filter incorporated a
negative impedance charge pump integrator which exhibited excellent performance. Four
bits of data were sampled through the PD and a 4-to-16 demultiplexer produced the 16 bits
of parallel data.
A third prototype was developed, but not fabricated, using the data acquired from the
first two designs. The transmit PLL bandwidth was optimized to account for the phase
noise measurements of the VCO. As a result, a frequency detector was required and added
to the PLL to increase the pull-in range. The loop filter was also modified to use the
negative impedance charge pump from the receiver PLL. The receiver demultiplexer
scheme was improved to decrease the timing constraints. In addition, the receiver PLL was
optimized to improve the bit error rate.
1. Introduction & Historical Review
1.1. Motivation and Goals
The research presented in this thesis deals with understanding and designing the
critical components that make up a serializing and deserializing, or Serdes, circuit. The
extremely complicated nature of such a system required a focused study that did not address
many of the issues that are present in a similar commercially designed product.
Funding for the project was acquired through Dr. Jack McDonald from the Naval
Research Lab, NRL. The requirements were to design a SiGe short-haul Serdes system
capable of 20 Gb/s that would assist in research that may eventually lead to 40 Gb/s.
Serdes circuits, discussed more thoroughly in the following chapter, consist of three
parts: a transmitter, a receiver, and a channel. The transmitter accepts streams of data in
parallel and multiplexes them together into a single serial stream. Distinguishing the bits at
the receiver input, after they travel through the channel, is a primary concern. The receiver
accepts the serial stream and demultiplexes it back to the parallel data. It must be sensitive
to changes in the data, in order to limit the error rate. The channel connects the transmitter
and receiver, and typically consists of amplifiers, repeaters, and optical wiring.
IBM’s SiGe HBT process technology was chosen because of the Frisc group’s
strength in high-speed bipolar design, and because of the state-of-the-art nature of the
process in the industry. The process provides integration with current CMOS technology
enabling a very wide variety of circuit topologies. This research used the 5 HP process
technology, with 5 levels of metal. It offered 50 GHz fT (transition frequency) HBT and
0.25 µm CMOS transistors.
One way of grouping Serdes circuits is by the distance over which the serialized data
is expected to travel. Systems, such as Synchronous Optical Network (SONET), are
implemented over distances greater than 100 km, and are considered long-haul. Short-haul
Serdes, on the other hand, is limited to short distances, such as a LAN, or between CPUs in
a multi-processor system. This distinction between short and long haul systems has
important implications for the critical specifications of the circuit. For long-haul systems,
phase noise is critical, as it dictates the total bit error rate (BER) through the long and noisy
channel. Short-haul is less sensitive to phase noise and is instead focused on bit throughput
and higher bandwidth.
Current industry level Serdes designs, as of the year 2000, run at 10 Gb/s and utilize
the same or similar 5 HP technology. Pushing the goal to 20 Gb/s and even 40 Gb/s was
intended to place this research on the cutting edge and evaluate the maximum potential of
the technology.
In addition to the goals of the NRL contract, various other factors motivated the
development of this project. First was the available test equipment. The lack of facilities to
test a packaged part necessitated a chip with wafer probing capabilities. This limited the
total testable signals to 12 RF and 12 DC at one time. Without packaging, a fully integrated
solution was necessary, rather than one that needed off chip components, such as capacitors
and op-amps.
1.2. The three chips
The total design process consisted of three separate designs. The first design, Serdes
I, was a prototype that tested some of the key components of a complete design. It was
fabricated in February 1999. This chip was an excellent starting point for the development
of a fully functional chip.
Serdes II was investigated and studied after the results from Serdes I were analyzed.
It possessed improvements in important areas such as the PLL, the multiplexer, the receiver
topology, and the VCO. Unfortunately the tape-out date was earlier than expected and
allowed only one month for final design and layout. This proved to be a difficult time line
and some design issues were left unresolved.
Following the collection of data from Serdes II, a third iteration, Serdes III, was
investigated. The design goal was to solve most of the issues uncovered from Serdes I and
Serdes II. Although no new layout was done for Serdes III, a complete set of new simulated
schematics was created. With the addition of some minor support circuits, a fully
functional and optimized Serdes chip could be implemented.
1.3. Project time line
Figure 1-1 Past and proposed future research
This is a time line of the goals and accomplishments of this Serdes research.
[Figure 1-1: time line running from August 1998 to September 2000. Milestones shown include: start of research and paper search; transmitter designed; leap VCO and simple VCO; receiver designed and final checks; Serdes I design submitted; candidacy preparation; chips received and VCOs tested; additional simulations; transmitter and receiver tested; submission to ISSCC; candidacy; start of work on Serdes II; SymMux patent; SMI offer to fabricate; intense effort to design Serdes II; Serdes II submitted; FFI VCO patent; Serdes II received; FFI VCO, transmitter, and receiver tested; both patents pending; thesis completed; JSSC paper submitted; thesis defended.]
A time line indicating completed goals is shown in Fig. 1-1. Research into high speed
communication circuits was initiated in August 1998. A paper that appeared in ISSCC
1998, titled “A 10 Gb/s Si-Bipolar TX/RX Chipset for Computer Data Transmission” [1],
was the basis for the majority of the research. The paper presented a novel idea for a voltage
controlled oscillator, VCO, and a description of a transmitter and receiver circuit.
The VCO is the most important circuit in the design of communication circuits, and as
such, was the starting point for this research. A simple four phase buffer oscillator was
designed and simulated. The method for frequency control for this oscillator originated
from a modified version of Samuel Steidl’s VCO implementation [2]. An advanced version
of this VCO, with a 66% speed improvement, was subsequently implemented. The desire
to further increase the frequency led to a study of phase multiplication techniques [3], [4].
Three separate VCO test chips were laid out to test various aspects of the above techniques.
Each chip contained several versions of a unique VCO design: with and without phase
multiplication, and under several different loading conditions.
In November 1998, the transmitter circuit started to take shape. One component of a
serializing circuit is the final multiplexer. To design this, a unique register “shuffling”
method was evaluated. As it provided better performance than other techniques and worked
with a slower rate multi-phase clock, it was chosen for the final design. In order to test the
transmitter, a linear feedback shift register, LFSR, was used to provide pseudo-random
data. An additional requirement of the transmitter was operation at a speed relative to a
fixed low frequency clock. This required the development of a phase locked loop, PLL,
capable of synching a low frequency external reference clock to the high rate internal clock.
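The pseudo-random test source mentioned above can be illustrated with a small software model. The thesis does not state the LFSR length or feedback polynomial at this point, so the sketch below assumes a 7-bit register with the widely used PRBS7 polynomial x^7 + x^6 + 1; the function name `lfsr_prbs7` and the seed value are hypothetical and chosen only for illustration.

```python
def lfsr_prbs7(seed=0x7F, nbits=127):
    """Return nbits of pseudo-random data from a 7-bit Fibonacci LFSR.

    Assumed polynomial: x^7 + x^6 + 1 (not taken from the thesis).
    """
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero, or the register locks up")
    bits = []
    for _ in range(nbits):
        bits.append((state >> 6) & 1)                 # output the oldest bit (stage 7)
        feedback = ((state >> 6) ^ (state >> 5)) & 1  # XOR of stages 7 and 6
        state = ((state << 1) | feedback) & 0x7F      # shift and insert the feedback bit
    return bits
```

A maximal-length 7-bit register repeats every 127 bits, which gives a quick sanity check on any such generator before it is used to exercise a multiplexer model.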
Starting in December and during transmitter development, a receiver design was
examined. Many improvements were added to the fundamental architecture found in [1].
Instead of gathering timing data from every fourth transition, it was determined that better
performance could be achieved if every transition were used. Since no detailed mechanism
for feedback control was described, some ideas were gathered from a clock and data
recovery paper [5]. Starting with these ideas, a unique PLL was created for clock recovery.
Because of the difficulty of using external function generators, an internal testing source
was developed to provide different bit patterns to exercise the circuit completely.
All six chips, including an integrated transmitter/receiver chip, were designed and
laid out using Cadence software. Simulation was done using HSpice, Matlab, and a digital
simulator developed by Peter F. Curran. Final designs were shipped to IBM during the first
week of February 1999. After six months in fabrication, a finished wafer was returned to
RPI in the beginning of August of the same year.
Chip testing began with a detailed study of the three VCO chips and the test source
VCO in the receiver. It became apparent that most of the circuits underperformed
when compared to simulation results. It appeared that under heavily loaded conditions the
circuits slowed down more than expected. The transmitter test chip was tested and found to
work with a 25% reduction in frequency. This testing was followed by a detailed inspection
of the receiver chip, which was found to work nearly at the design speed.
During this time, data was being collected for a conference paper to be submitted to
the International Solid State Circuits Conference, ISSCC. Although the chips performed
slightly slower than anticipated, the paper still showed significant advances in state of the
art research. Unfortunately the paper was not accepted, most likely because there was a
frequency mismatch between the transmitter and receiver.
During the remainder of September, a thorough simulation of the VCO, including
layout parasitics, was performed. The initial results showed a close match to the results
measured from the fabricated wafer. Some discrepancy remains regarding how loading
affects the speed of the devices. A continuation of this work will attempt to match
simulations accurately to measured results to ensure that future designs will respond as
expected.
It was necessary to produce a second Serdes chip, drawing on the success of the
first test chip, that would meet the goal of 20 Gb/s. Additional circuitry was needed
to round out the design: a 4-to-16 demultiplexer, an internal testing scheme, transmitter and
receiver integration onto one chip, packageability, and improved performance.
A comprehensive study was performed to determine exactly why and how the chips
underperformed. The design was modified to ensure that the parts would meet the required
specifications. This included complete redesign of the VCO into the Feed Forward
Interpolated VCO (FFI VCO). The new design was based upon the results of the previous
design and the development of a new multiplexer.
In February 2000, an invention disclosure record entitled “The Symmetric
Multiplexer,” was submitted to RPI [6]. The invention improved the standard CML
multiplexer and reduced phase noise and jitter at the transmitter output.
Serdes II was finished and submitted to Sierra Monolithics Incorporated, SMI, for
fabrication¹ at the end of March 2000. It contained many improvements on the previous
design and was capable of being C4 packaged and wafer tested. After its completion, an
additional invention disclosure record that focused on the FFI VCO was submitted [7]. The
VCO is a novel approach to designing ring oscillators. It improves upon many key
parameters of the standard ring VCO.
1. SMI volunteered silicon on an experimental run.
The Serdes II chip was received three months after tapeout, in the middle of July
2000. Testing began immediately with a complete characterization of the FFI VCO
including its frequency response, CMRR, phase noise, supply response, and jitter. A high
quality spectrum analyzer was rented to aid in testing and data acquisition. Testing of the
transmitter was followed by a look at clock jitter and data eye diagrams. The transmitter
was a complete success, and operated at 20 Gb/s with rms jitter of 2.0 ps in the frequency
band of 100 kHz to 100 MHz. The symmetric multiplexer appeared to work exactly as
expected. Testing the receiver confirmed an anticipated problem with low lock-in range.
This was also seen in Serdes I and was not completely addressed in the second prototype.
Following the tape-out of Serdes II, intense work was done on Serdes III. Several last
minute problems were discovered in Serdes II that were corrected in the next iteration. Data
collected from Serdes II allowed the optimization of important PLL parameters in order to
reduce jitter and improve the pull-in time. A problem with a small pull-in range in both
receiver PLLs required a complete redesign of the loop and the addition of a reference
signal.
Using the data collected in Serdes II, a journal article was submitted to the Journal of
Solid-State Circuits, JSSC, in October. It was titled “A Transmitter Architecture for High
Speed Short-Haul Serial Communication,” and it detailed the FFI VCO, the symmetric
multiplexer and the transmitter architecture.
At the end of September, the RPI patent office reported that they were going to pursue
U.S. patents for both inventions. This would start with an immediate application for
provisional patents that would protect the work after disclosure.
1.4. State of the Art
In the quick-paced research area of high speed communications, industry is currently
cresting the 10 Gb/s barrier while research is beginning in the 40 Gb/s regime. New
microelectronic technologies such as AlInAs/InGaAs heterojunction bipolar transistors
(HBT), and SiGe HBTs [8], [9] are playing leading roles. In particular, SiGe HBT and
CMOS technology is proving itself to be a high-speed (60-90 GHz fT), high-yield,
high-integration, and low-cost solution [10], [11]. It possesses the strengths of silicon because of
similar fabrication techniques, but benefits from higher frequencies with the introduction
of germanium [12].
The current state of the art in high-speed serial communications can be broken down
in three basic design areas: VCOs; clock multiplier units (CMU), or transmitters; and clock
and data recovery (CDR) circuits, or receivers.
As the speed of serial communication circuits increases, so too must the speed of the
core building block of the circuit, the VCO. Multi-phase ring oscillators with top speeds
approximately equal to 1/10th of their technology’s fT are being improved [1], [13], [14].
It is common to see speeds around 5 GHz, with maximum quoted speeds up to
approximately 15 GHz through clock phase multiplication [3], [4]. Their Q of unity and
high noise characteristics are more suitable for short-haul systems or for systems that can
tolerate phase noise. In-depth analysis of the sources of phase noise is allowing tight
optimization of circuits [15]-[19]. CMOS differential ring oscillators running at speeds up
to 5 GHz exhibit -95 dBc/Hz of phase noise at 1 MHz [18], while bipolar rings are quoted
as having phase noise values of -86 dBc/Hz at 1 MHz [20]. Jitter, generally expressed by
the κ constant, has been documented for a silicon bipolar ring running at 625 MHz with a
0.6 mA tail current at 22 n [17].
Ring oscillator architecture is straightforward and simple to understand. Through
interesting and creative interstage feedback techniques, the VCO frequency and phase
noise can be improved. A four stage ring VCO that increases its speed by 33% by
leap-frogging the output of one stage to the input of the stage ahead is documented in [1]. This
improves the speed by reducing the effective delay of every stage. A similar, more general
approach is presented in [13], which utilizes sub-feedback inverters that create fast and
slow loops which can be mixed together. An earlier approach, [23], has a five stage core
that potentiometrically mixes the output from the third and fifth stages. By doing this, the
ring is able to operate variably between a 3 stage and a 5 stage oscillator. Finally, by using
a negative skewed delay scheme, the core frequency of a CMOS ring oscillator is improved
by 50% [24]. This is accomplished by compensating for the slower PMOS transistors by
tying the PMOS input to the output of a stage two gates back. This turns the transistor on
sooner than the NMOS, thus improving its speed at the expense of additional power
requirements.
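The speed gains quoted in this paragraph can be related to stage delay through the usual first-order ring oscillator relation. This is a textbook approximation, not a result taken from the cited papers or from this thesis, and the symbols N (number of stages), t_d (delay per stage), t'_d (reduced delay), and x (fractional speed increase) are introduced here only for illustration. The bound for [23] further assumes the stage delay is the same in the 3 stage and 5 stage modes.

```latex
% First-order oscillation frequency of an N-stage ring
f_{osc} = \frac{1}{2 N t_d}

% A speed increase by a factor (1 + x) at fixed N implies
\frac{f'_{osc}}{f_{osc}} = \frac{t_d}{t'_d} = 1 + x
\quad\Rightarrow\quad
t'_d = \frac{t_d}{1 + x}

% x = 0.33 (leap-frog ring [1]):          t'_d \approx 0.75\, t_d
% x = 0.50 (negative skewed delay [24]):  t'_d \approx 0.67\, t_d

% Mixing between 3-stage and 5-stage operation [23] bounds the frequency by
\frac{1}{10\, t_d} \le f_{osc} \le \frac{1}{6\, t_d}
```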
LC oscillators, on the other hand, which possess a high Q and extremely low noise and
jitter, are being rigorously researched as VCOs for long-haul serial communication. Unlike
multi-phase oscillators that can generate frequencies higher than their core frequencies, LC
oscillators are typically run at the baud rate of the communication channel. Thus, for a 10
Gb/s Serdes implementation, a 10 GHz LC VCO is required. A 5 GHz VCO developed by
IBM [21] was quoted as having a phase noise of -98 dBc/Hz at 100 kHz, with a power of
15 mW. A second 11 GHz VCO with an integrated inductor is documented as having a -78
to -87 dBc/Hz phase noise at a 100 kHz offset from the carrier [22].
The state-of-the-art in transmitter, or CMU, research is measured primarily by the
maximum bit rate compared to the transistor technology, the clock jitter produced at that
rate, and the phase noise of the oscillator.
A 1.062 Gb/s transmitter implementation, [26], utilizes a half-rate ring oscillator. The
ring oscillator incorporates two mixing elements between every pair of delay elements to
control the rate of oscillation. Its quadrature outputs are further broken up into four
quarter-rate signals that drive the 10-to-1 multiplexer. The PLL achieves an rms jitter performance
of 9.8 ps.
A low noise, 12.5 Gb/s CMU is described in [27]. It possesses a differential single
phase LC oscillator with a phase noise of -101 dBc/Hz at 1 MHz. The PLL has a very low
bandwidth of 300 kHz in order to reduce in-band noise. Its reference is at approximately
195.3 MHz and it utilizes a standard 3-state phase detector (PD). The loop filter consists of
a negative impedance amplifier and a single pole, single zero RC filter. The output jitter is
quoted as 0.4 ps.
An interesting non-optical transceiver described in [28] utilizes a 4-PAM (pulse
amplitude modulation) serial link for 8 Gb/s communications. It essentially transmits and
receives four level logic, which allows twice the bit rate for the same bandwidth. It
exhibits a transmitter output jitter of 2 ps and a receiver jitter of 4 ps.
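A four level link carries two bits in every symbol, which is why an 8 Gb/s 4-PAM link only needs 4 Gsymbol/s on the wire. The sketch below shows one possible bit-to-level mapping; the Gray-coded assignment, the level values, and the function names are assumptions made for illustration and are not taken from [28].

```python
# Hypothetical Gray-coded mapping of bit pairs to four amplitude levels.
GRAY_LEVELS = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}
INVERSE_LEVELS = {level: pair for pair, level in GRAY_LEVELS.items()}


def pam4_encode(bits):
    """Map a flat list of bits onto 4-PAM symbols, two bits per symbol."""
    if len(bits) % 2:
        raise ValueError("4-PAM needs an even number of bits")
    return [GRAY_LEVELS[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def pam4_decode(symbols):
    """Recover the bit stream from ideal (noise-free) 4-PAM symbols."""
    bits = []
    for level in symbols:
        bits.extend(INVERSE_LEVELS[level])
    return bits
```

With a Gray mapping, adjacent levels differ in only one bit, so a decision error between neighbouring levels costs a single bit error.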
As bit rates are pushed higher relative to the transistor technology speed, certain
problems arise. In the transmitter PLL, a clock frequency divider is needed to drive the PD
along with the reference signal, and to drive multiplexer inputs. A feedback MS-latch often
does the trick, but for extremely high VCO speeds a new approach is required. A dynamic
frequency divider capable of speeds up to 79 GHz using transistors with an fT of 80 GHz
is described in [29]. It uses an XOR multiplier, a low pass filter inherent in the multiplier,
and it feeds the output back into the multiplier. The only stable condition is when the output
is at half the frequency of the input.
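The reason the divide-by-two condition is the only stable one can be seen from a short idealized calculation. The sketch below treats the XOR gate as an ideal analog multiplier acting on sinusoids, which is a simplification assumed here and not a description of the circuit in [29]; the symbols ω_in and ω_out are illustrative.

```latex
v_{in}(t) = \cos(\omega_{in} t), \qquad v_{out}(t) = \cos(\omega_{out} t)

% Multiplier output contains the difference and sum frequencies
v_{in}(t)\, v_{out}(t) =
  \tfrac{1}{2}\cos\big((\omega_{in} - \omega_{out})\,t\big)
  + \tfrac{1}{2}\cos\big((\omega_{in} + \omega_{out})\,t\big)

% The low pass filter removes the sum term. The loop is self-consistent only if
% the surviving difference term is the output itself:
\omega_{out} = \omega_{in} - \omega_{out}
\quad\Rightarrow\quad
\omega_{out} = \frac{\omega_{in}}{2}
```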
The state-of-the-art in receiver, or CDR, design is measured by the ability to extract
data in the presence of both data and clock jitter, and the ability to tolerate pseudo-random
data.
The design described in [30] uses a full rate ring oscillator with a 12.5 GHz clock to
extract the 8B/10B encoded data at 10 Gb/s. The VCO exhibits a phase noise of
approximately -80 dBc/Hz at 1 MHz. The PLL has a bang-bang PD and is frequency locked
by a 195.3 MHz reference signal. The data PD has a pull-in range of 0.6% and a hold-in
range of 1.2%. This receiver is quoted as exceeding the SONET-192 specifications by 50%.
A 50 GHz fT SiGe 10 Gb/s CDR for SONET is described in [31]. It utilizes an LC
tank VCO running at 10 GHz with a phase noise of -80 dBc/Hz at 100 kHz. The PD is a
Hogge type, and the charge pump uses an active MOSFET positive-feedback pull-up
amplifier. The recovered clock rms jitter was measured at less than 1 ps, with a bit error
rate of 10⁻⁹. SONET specifications for jitter tolerance, jitter transfer, and jitter generation
were all met.
A very high speed CDR discussed in [32] uses a silicon bipolar process with an fT of
12 GHz for 8 Gb/s operation. The loop filter and VCO are off-chip but the frequency and
phase detectors are both on-chip. The clock jitter was measured at 1.5 ps rms.
1.5. Contribution to the Field
An important aspect of Ph.D. research is advancement of the state of the art, and
proving that such work builds upon the shoulders of others and is not merely a reinvention
of the wheel. Four key components of this research can be quickly singled out as original
and novel, and RPI is pursuing U.S. patents for two of them.
1.5.1. Feed Forward Interpolated VCO
The Feed Forward Interpolated VCO is an improvement over the standard ring
oscillator [1]. The ring VCO in [23] utilizes a similar feed-forward method to extend the
frequency range but the feed-forwarding remains fixed and is not used as the delay control
mechanism. The design presented in this thesis, however, uses feed-forwarding to increase
the frequency range and also as the primary method to control the stage delay. It is versatile
and allows adjustments to be made to the center frequency, tuning range, and gain through
simple parameter changes. The VCO is 33% faster than a simple four stage ring oscillator
utilizing the same power, when it is configured for maximum operating speed. This
increase in speed can be traded for additional phase noise and jitter suppression, making the
FFI VCO a viable alternative to LC tanks when used in a short-haul communication
channel.
An invention disclosure record for this circuit was submitted in May 2000 to the RPI
patent office. In September 2000, the patent office declared that they were going to pursue
a U.S. patent for this invention.
1.5.2. Transmitter Interleaving Architecture
As the bit rate is pushed higher, with respect to the technology speed, it becomes
increasingly difficult to design VCOs that can keep up. Fractional rate oscillators can solve
this difficulty, but require tight timing constraints on the output multiplexer. The
transmitter design discussed in this thesis utilizes a relatively slow, well understood,
quarter frequency multi-phase VCO. The novel transmitter architecture allows in-
quadrature phases of the VCO to control a 4-to-1 multiplexer.
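The idea of letting quadrature clock phases steer a 4-to-1 multiplexer can be modelled in a few lines. This is only a behavioural sketch under assumed conventions: the in-phase clock I is taken to be high for the first half of the VCO period, Q lags it by a quarter period, and each of the four (I, Q) states selects one data lane. The lane ordering, the `SELECT` table, and the function names are hypothetical and do not describe the gate-level architecture developed in this thesis.

```python
def quadrature_phases(slot):
    """Return the (I, Q) logic levels of a quarter-rate clock during a bit slot."""
    k = slot % 4
    i = 1 if k in (0, 1) else 0  # I is high for the first half of the VCO period
    q = 1 if k in (1, 2) else 0  # Q lags I by a quarter of the VCO period
    return i, q


# Assumed mapping from the four (I, Q) states to the selected data lane.
SELECT = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (0, 0): 3}


def interleave(lanes):
    """Serialize four equal-length quarter-rate lanes into one full-rate stream."""
    if len(lanes) != 4 or len({len(lane) for lane in lanes}) != 1:
        raise ValueError("expected four lanes of equal length")
    out = []
    for slot in range(4 * len(lanes[0])):
        lane = SELECT[quadrature_phases(slot)]
        out.append(lanes[lane][slot // 4])  # each lane holds its bit for one VCO period
    return out
```

Because the (I, Q) pair changes only one bit per slot, the select sequence never passes through an unintended lane, which is one reason quadrature phases are attractive as multiplexer controls.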
Although this approach is similar to the design given in [1], it possesses a few
differences. First, the 4-to-1 multiplexer in [1] is implemented as a single gate whereas the
transmitter interleaving architecture breaks the problem into multiple gates. Second, the
multiplexer in [1] requires multiple level clock inputs, which requires the clock phases to be
skewed. Third, the multiplexer in the paper requires three levels of logic while this new
architecture requires only two. This is important for power saving applications that require
only two levels.
1.5.3. Symmetric Multiplexer
During the development of the transmitter a problem developed that required the
basic 2-to-1 multiplexer to be rethought. The problem was that the 2-to-1 multiplexer had
become a critical timing path in the transmitter. In other words, any delay mismatches in
this circuit were propagated to the output. After analyzing the problem, a new multiplexer
was developed that had perfect timing symmetry and possessed none of the problems of the
original multiplexer. This discovery enabled the new architecture to operate smoothly. A
U.S. patent for the symmetric multiplexer, like the FFI VCO, is being pursued by the RPI
patent office.
1.5.4. Receiver PLL
The critical circuit in the design of the receiver PLL was the phase detector (PD).
Typically, a Hogge-type [31], [52] or a bang-bang type PD [30] is used in high speed serial
receivers. The 20 Gb/s goal of this work required a PD to operate twice as fast using the
same technology speed. A bang-bang or Hogge style PD with this speed capability would
be difficult to design and would require a clock at the same frequency as the data. As a
result, a new PD had to be developed.
The new design, called a transition detector (TD), incorporates eight MS-latches,
each clocked by a different phase of the VCO. This allowed the data to be twice
oversampled and timing and data information to be collected.
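The clock frequency implied by eight sampling phases and twice oversampling follows from simple arithmetic. The calculation below assumes that each phase triggers exactly one sample per VCO period; the symbols R_b (bit rate), S (samples per bit), N_φ (number of phases), T_b (bit period), and Δt (spacing between adjacent sampling instants) are introduced here for illustration.

```latex
% Required VCO frequency
f_{VCO} = \frac{R_b\, S}{N_\phi}
        = \frac{(20\ \text{Gb/s})(2)}{8}
        = 5\ \text{GHz}

% Spacing between adjacent sampling instants
\Delta t = \frac{1}{N_\phi\, f_{VCO}}
         = \frac{1}{8 \times 5\ \text{GHz}}
         = 25\ \text{ps}
         = \frac{T_b}{2},
\qquad T_b = \frac{1}{R_b} = 50\ \text{ps}
```

Under this assumption one VCO period spans four bit periods, which agrees with the quarter frequency clocking described for the transmitter.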
1.6. SiGe 5 HP Overview
IBM’s 5 HP SiGe BiCMOS process incorporates 0.5 µm HBT transistors and 0.35
µm CMOS transistors. The epitaxially graded Ge base in the HBT allows fT speeds of up
to 60 GHz. Also included in the technology are: high breakdown NPN transistors, gated
lateral PNP transistors, polysilicon resistors, Metal-Insulator-Metal (MIM) capacitors,
substrate contacts, precision oxide/nitride decoupling capacitors, Schottky barrier diodes,
varactor diodes, PIN diodes, electro-static discharge (ESD) devices, last metal (LM) spiral
inductors, resistors (NS, RN, and RI), and LM bondpads.
Between three and five layers of metal are provided at the back end of the line for
interconnect¹. The first level of metal is for local interconnect and has a minimum width of
0.8 µm and a fixed thickness of 0.63 µm. The last, or highest level, called LM has a
minimum width of 2.4 µm, and a thickness of 2.07 µm. LM is typically used for bond and
C4 pads, power and ground wiring, inductors, and MIM capacitors. An extension to the 5
HP process allows LM to be substituted with analog metal (AM) which is 4 µm thick and
separated by 3 µm from the next layer of metal. AM is primarily used for inductors which
require low resistance and low capacitance to the substrate. Except for AM, all layers of
metal are separated by 1.2 µm of silicon dioxide.
The Cadence design kit from IBM provides full Spectre and HSpice models for the
devices listed above. The kit allows the extraction of interconnect capacitance and
resistance to enable full parasitic simulation.
Appendix A, “IBM SiGe 5 HP,” describes important NPN HBT parameters in
more detail. Appendix A.1. describes the turn on characteristics of the transistor,
specifically the collector current versus base-emitter voltage. The relationship between the
collector current and the collector to emitter voltage is discussed in Appendix A.2. fT is a
figure of merit for the transistor family and its relation to the collector current is useful
when biasing the transistor for maximum performance. A plot of the transistor fT versus
collector current can be found in Appendix A.3.
1. Serdes I was submitted in a DARPA multi-user wafer which only allowed three levels of metal. Serdes II was submitted through Sierra Monolithics and had the full five levels of metal.
1.7. Testing Equipment
Table 1-1 Equipment used for testing
Type | Model | Specs | Usage
time-domain oscilloscope | Tektronix 11801C | 50 GHz | transmitter eye diagrams; time-domain jitter measurements
spectrum analyzer | Rohde & Schwarz FSEM 30 | 30 Hz - 26.5 GHz | VCO frequency response; VCO common mode response; VCO frequency versus power supply; VCO phase noise
spectrum analyzer | HP8563E | 30 Hz - 26.5 GHz | transmitter PLL phase noise; receiver PLL phase noise
signal source | HP4430B | < 1 GHz | low phase noise jitter measurements
signal source | HP8350B | < 10 GHz | high frequency receiver measurements
power supply | Agilent E3631A | 3 ch. DC | Labview controlled VCO frequency and supply response
10 channel RF probes | GGB | > 1 GHz | All high speed RF measurements were made using these probes.
12 channel DC probes | GGB | < 1 GHz | These probes were used in Serdes II for simple control lines.
LabView & GPIB | | | Labview and GPIB hardware simplified the collecting of most data, including VCO phase noise and responses.
1.8. Document Logistics
This thesis is sectioned into an abstract, six chapters, a conclusion, and appendices.
This introduction is the first chapter; it describes the goals and motivations behind this
project and discusses the state-of-the-art, the novelty of this work, and the test equipment.
The second chapter goes through the basic block diagram of a serial communication system
and the function of each block. Chapters three and four detail the development and results
of the two VCOs researched in this work. Chapter five details the transmitter, including the
PLL, architecture, and test structures. The last chapter discusses the receiver, its operation,
and test results. Appendices include information on the SiGe process used in this work, and
circuit details of this technology. In addition the last appendix has the top level schematics
for the Serdes I and II chips.
Three different Serdes designs were researched in this work. The first two were
fabricated and the third represents research for the future. Each design is designated by the
names Serdes I, Serdes II, or Serdes III.
Certain conventions were followed throughout this document. First, node names in
schematics and within equations are in bold font, such as z20 and a11. Second, equation
variables are italicized, as in fo, and ω2. Third, in plots that contain both simulated and
measured data, the simulated data is usually expressed as a dotted line and the measured
data line is solid. Fourth, for equations solved for the general case the units are usually
expressed as a function of the transistor size. This shows how the constants and variables
change depending on the transistor size. In contrast, absolute units were used for specific
circuits and fabricated circuits.
2. Serial Communication
The exchange of high speed serial data involves three primary components:
transmitter, receiver, and transport channel. A transmitter (Tx) gathers low rate parallel
data and transforms it into high speed serial data. The signal is then transported through the
channel, potentially air, or wire, to a receiver. The receiver (Rx) must then demodulate the
signal and extract the clock and demultiplex the data. The received information is fed out
of the receiver as parallel data.
2.1. Serial Communication Block Diagram
Figure 2-1 Toplevel System Block Diagram. The transmitter accepts parallel data and serializes it to a NRZ signal. The receiver accepts the bit stream, extracts the clock and demultiplexes the data.
[Figure 2-1 blocks. Transmitter: DATA IN, registers, encoding, multiplexer, retimer, line driver, clock tree, TxPLL, TxVCO, reference clock, internal testing, support circuits. Transport channel. Receiver: line receiver, demux, decode, registers, DATA OUT, clock tree, RxPLL, RxVCO, reference clock, internal testing, support circuits.]
Shown above in Fig. 2-1 is a basic block diagram of a serial communication system.
Although most systems do not look exactly like this, there is enough in common between
this system and others to say that this diagram represents such systems fairly
accurately.
2.2. Transmitter / Multiplexer / Clock Multiplier
The transmitter’s role is to accept a data word of a specified width, serialize it, and
drive the data onto a channel. The width of the word depends on the application and is a
function of the input and output bandwidths. For example, an 8 Gb/s serializer would
require 16 bits at 500 Mbit/s or 64 bits at 125 Mbit/s. Serializing involves multiplexing the
data into an ordered bit stream, typically in a non-return-to-zero (NRZ) format. The
circuit that drives the channel may be a simple 50 Ω amplifier, or it may be a
more sophisticated circuit that is capable of driving an optical driver.
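The word-width example above is simple rate arithmetic. The following is a minimal sketch, with a hypothetical helper name that does not come from the thesis, that reproduces the two cases quoted for an 8 Gb/s serializer.

```python
def parallel_rate_mbps(serial_rate_gbps: float, word_width: int) -> float:
    """Per-input parallel rate (Mbit/s) needed to sustain a serial rate (Gb/s).

    Hypothetical helper for illustration only: the serial rate is simply
    divided evenly across the parallel inputs.
    """
    if word_width <= 0:
        raise ValueError("word_width must be positive")
    return serial_rate_gbps * 1000.0 / word_width


if __name__ == "__main__":
    for width in (16, 64):
        rate = parallel_rate_mbps(8.0, width)
        print(f"{width} bits -> {rate} Mbit/s per input")
```

The same relation can be used in reverse to pick a word width once the parallel interface speed is fixed.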
It is possible, depending on the specifications, that the accepted data may be encoded.
The encoding process may include encryption, compression, bit stuffing, error checking,
and framing [33]. Depending on the design of the receiver, it may be necessary to introduce
additional transitions into the data to meet critical phase locked loop (PLL) specifications
in the receiver. 8B/10B encoding is popular and guarantees at least one transition every 5
bits [34]. If channel alignment is required, meaning that bit 0 in the Tx comes out on bit 0
in the Rx, then encoding will be needed.
After possible encoding, the bits are stored in a register of appropriate size for the
incoming word and the multiplexer width. When the multiplexer is narrower than the width
of a word, the bits may be fed into a shift-register before being multiplexed [35]. This
register and the subsequent multiplexer must be timed very carefully to ensure that bits are
sampled correctly and that no race conditions or runt pulses exist. Sometimes a first-in first-out (FIFO)
system is added to lessen the timing constraints between the data load clock and the
reference clock.
The PLL clocks the multiplexer and the multiplexer performs the serialization
function. This operation may require multiple gates, such as a 32-4 multiplexer followed
by a 4-1 multiplexer, or simply a 16-1 multiplexer. Timing at this stage becomes more
critical as the output rate of the multiplexer is at the serial data rate. Often multiple clock
phases or clock frequencies are needed.
The retiming circuit before the line driver re-establishes the transition locations in
order to remove any jitter or noise introduced by the registers and multiplexers [42]. This
circuit is clocked directly by the PLL to be as noiseless as possible. When low output jitter
is the limiting factor in the design, a retiming circuit is absolutely required.
The retiming circuit, or multiplexer, is often unable to drive the pad and external load
directly, so a line driver is needed [36], [37]. It matches the internal circuitry impedance to
the output impedance and amplifies the signal to a desirable voltage swing if necessary.
Perhaps the most important circuit in the transmitter is the PLL, otherwise known as
the frequency synthesizer or clock multiplier unit (CMU). It generates the internal clock
signals, which may be multi-phase or multi-frequency. It is required to have low phase
noise, low jitter, and low frequency drift to generate a similarly low phase noise data
stream. The transmitter PLL, as opposed to the receiver PLL, usually has a very low
bandwidth in conjunction with a low phase noise VCO to generate the cleanest clock signal.
The PLL locks the phase of an internal high speed clock to an externally supplied low
speed reference. In this way the reference is able to dictate the exact frequency at which data is
transmitted. For instance, a 10 Gb/s system may have a 625 MHz reference clock and a 10
GHz internal clock. The PLL must then match the two frequencies after dividing the
internal clock by 16.
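As a quick check on the reference clock example, the divider ratio is the internal clock frequency over the reference frequency (10 GHz over 625 MHz gives 16). The sketch below uses a hypothetical function name and assumes an integer divider, which the text implies but does not state as a requirement.

```python
def divide_ratio(internal_clock_hz: float, reference_hz: float) -> int:
    """Integer divider that brings the internal clock down to the reference.

    Illustrative assumption: the feedback divider is an integer, so the
    reference must be an exact sub-multiple of the internal clock.
    """
    ratio = internal_clock_hz / reference_hz
    if abs(ratio - round(ratio)) > 1e-9:
        raise ValueError("reference is not an integer sub-multiple of the internal clock")
    return round(ratio)


if __name__ == "__main__":
    print(divide_ratio(10e9, 625e6))
```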
The PLL consists of three basic components: a phase detector (PD), a loop filter (LF),
and a voltage controlled oscillator (VCO). The PD generates a signal which is a function of
the phase difference between the divided down internal clock and the external reference. In
low speed applications such as this (625 MHz clock versus 10 GHz data rate) the PD can
generate an accurate, linear measure of phase difference. The LF typically consists of an
active filter with high DC gain which has a specific bandwidth and a high frequency pole.
With most of the other gains and parameters in the PLL fixed, the LF is the only circuit that
is adjustable to meet the specifications. The VCO accepts a voltage input and generates an
output signal which has a frequency that is a function of the input. Ideally this relationship
is linear, which leads to closed-form linear solutions for the PLL.
One of the most important figures of merit for the transmitter is the output data jitter.
Jitter is created inside the VCO and partially filtered out by the PLL. The retiming circuit
and all circuits thereafter add slight jitter to the signal. The transmitter data eye closes
horizontally as more jitter is introduced into the circuit.
2.3. Transport Channel
The channel carries the data from the transmitter to the receiver, and may be
electrical, optical, wireless, or any combination of the three. For long-haul communication
the channel is a significant and sometimes dominant source of phase noise and jitter. For
short-haul communications, however, we assume that the channel's contribution is negligible.
2.4. Receiver / Demultiplexer / Clock & Data Recovery
The receiver must extract a clock from a very high frequency serial signal, plagued
with jitter and noise, and use that clock to sample the data. This process is called clock and
data recovery and is made more difficult because transition locations are not guaranteed.
A line amplifier with a specific input impedance amplifies the signal to internal levels
while minimizing the distortion. The amplifier must have a large bandwidth, typically
about 50% higher than the baud rate. Noise injection from this circuit must be minimized
because the data signal is already saturated with jitter. When an optical channel is used, a
photodiode drives the receiver input and a transimpedance amplifier is required.
The receiver has a PLL that is very different from the PLL in the transmitter. First,
the PD must operate at or near the data rate, which requires a simpler circuit and one that
may only provide a non-linear output. The PD must also be able to handle random data that
has random transition locations, if the data is of the NRZ variety. In addition, the key PLL
parameters must be tuned to a signal with high noise content as compared to the PLL in the
transmitter which has a low noise reference as its input. Additional circuitry will be needed
to sample the data using the recovered clock unless the PD does so naturally.
As in the case of the transmitter, a reference clock may be used to bring the receiver VCO
close to the data frequency before clock extraction occurs. This greatly enhances the
operating range of the receiver PLL. The drawback is that two separate PDs and a circuit
that can switch between them are needed. This introduces two loops consisting of common
components which must be able to operate independently.
A common component in dual loop PLLs is a lock detect circuit, which determines whether
phase lock has been lost and, if so, switches the loop back to the external reference loop. This
circuit is useful in a high noise environment where data jitter can cause the PLL to become
unstable. It also allows the software layer to be notified to resend the lost data.
Once a clock has been extracted from the serial signal, and the data captured, the data
can then be demultiplexed through a series of samplers at decreasing clock rates. For
instance, in a 10 Gb/s system the first resampled data would pass through a 1-to-2
demultiplexer driven by a 5 GHz clock. The second stage would consist of two 1-to-2
demultiplexers driven by a 2.5 GHz clock and so on. If a multiphase clock is used, then
multiple samples can be taken with separate samplers. This allows the use of a clock at a
fraction of the data bit rate.
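The demultiplexer tree described above halves the clock rate and doubles the number of 1-to-2 demultiplexers at each stage. The following sketch enumerates the stages for a given serial rate and output width; the function name and the power-of-two restriction are assumptions made for illustration.

```python
def demux_tree(serial_rate_gbps: float, output_width: int):
    """List (stage, number of 1-to-2 demuxes, clock in GHz) for a binary tree.

    Illustrative sketch: each stage doubles the number of data lines and is
    clocked at half the rate of the lines feeding it.
    """
    if output_width < 2 or output_width & (output_width - 1):
        raise ValueError("output_width must be a power of two >= 2")
    stages = []
    demuxes, clock_ghz = 1, serial_rate_gbps / 2.0
    while demuxes < output_width:
        stages.append((len(stages) + 1, demuxes, clock_ghz))
        demuxes *= 2
        clock_ghz /= 2.0
    return stages


if __name__ == "__main__":
    for stage, count, clock in demux_tree(10.0, 16):
        print(f"stage {stage}: {count} demux(es) at {clock} GHz")
```

For the 10 Gb/s case in the text, the first two stages come out as one demultiplexer at 5 GHz followed by two at 2.5 GHz.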
One of the most important parameters in the design of the receiver PLL is its jitter
transfer function. This determines how sensitive the system is to data jitter. The PLL should
be able to track low frequency jitter very well. In this case the jitter transfer function should
be close to 0 dB. At high frequencies the transfer function should drop off in conjunction
with the bandwidth of the loop. Another important parameter is called jitter peaking. This
parameter describes high frequency jitter components such as those from spurious
modulation. This is especially important in SONET repeaters that feed the receiver clock
back into a separate transmitter. A sequence of many repeaters is very sensitive to this
form of jitter.
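To make the shape of a jitter transfer function concrete, the sketch below evaluates a textbook second-order loop response, H(s) = (2ζωn s + ωn²) / (s² + 2ζωn s + ωn²). This particular model, the natural frequency fn, and the damping factor ζ are assumptions chosen for illustration; the receiver loop in this work is not claimed to follow it. The response is near 0 dB at low frequency, falls off above the loop bandwidth, and shows a peak above 0 dB that shrinks as the damping increases.

```python
import math


def jitter_transfer_db(f_hz: float, fn_hz: float, zeta: float) -> float:
    """Gain in dB of H(s) = (2*zeta*wn*s + wn^2) / (s^2 + 2*zeta*wn*s + wn^2)."""
    s = 1j * 2.0 * math.pi * f_hz
    wn = 2.0 * math.pi * fn_hz
    h = (2.0 * zeta * wn * s + wn ** 2) / (s ** 2 + 2.0 * zeta * wn * s + wn ** 2)
    return 20.0 * math.log10(abs(h))


def jitter_peaking_db(fn_hz: float, zeta: float, points: int = 2000) -> float:
    """Largest gain (dB) found on a log-spaced grid from fn/100 to 100*fn."""
    freqs = (fn_hz * 10.0 ** (-2.0 + 4.0 * k / (points - 1)) for k in range(points))
    return max(jitter_transfer_db(f, fn_hz, zeta) for f in freqs)


if __name__ == "__main__":
    fn = 1e6  # hypothetical loop natural frequency in Hz
    for zeta in (0.5, 1.0, 5.0):
        print(f"zeta={zeta}: peaking = {jitter_peaking_db(fn, zeta):.3f} dB")
```

In a chain of repeaters the peaking of each stage multiplies, which is why a heavily damped loop is usually preferred there.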
After the data is fully demultiplexed down to the desired parallel data width it can be
decoded based upon the encoding scheme used in the transmitter. In some cases this also
involves channel framing which lines up transmitter input channel n with receiver output
channel n. Once the data is decoded it may, as in the transmitter, be placed in a FIFO to
reduce the timing constraint on the data received clock.
2.5. Internal Testing
Internal testing involves performance verification of the transmitter and receiver
before and after being connected in a complete system. For a chip with both transmitter and
receiver components, this may involve a feedback path across the chip from the output of
the Tx to the input of the Rx. The parallel data from the Tx and Rx can then be compared
to determine the bit error rate (BER).
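The loopback comparison can be sketched as a bit-by-bit comparison of the transmitted and received parallel words. The function below is a hypothetical illustration of how a BER figure could be computed from such a comparison; it is not the test circuitry used on the chips.

```python
def bit_error_rate(sent_words, received_words, width: int) -> float:
    """Fraction of differing bits between two equal-length lists of words.

    Each word is an integer holding `width` parallel data bits.
    """
    if len(sent_words) != len(received_words):
        raise ValueError("word lists must have the same length")
    if not sent_words:
        raise ValueError("need at least one word")
    mask = (1 << width) - 1
    errors = sum(bin((s ^ r) & mask).count("1")
                 for s, r in zip(sent_words, received_words))
    return errors / (len(sent_words) * width)


if __name__ == "__main__":
    sent = [0b1010, 0b1111]
    received = [0b1010, 0b1101]  # one flipped bit in the second word
    print(bit_error_rate(sent, received, 4))
```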
Additional testing modes may involve additional outputs that show the health of the
system [38]. Outputs may also be duplicated and fed to testing equipment while actual data
is being transmitted.
2.6. Support Circuits
Other circuitry may be needed in the system depending on the application. For
example, if a transmitter and receiver are required to operate at different fixed frequencies,
selectors and special input pins are required. Also, circuits within the chip may not be
needed all the time and in some cases a power managing system can cut off power. This
option reduces overall power consumption but requires additional power-switching
circuits.
3. Current Starving VCO
3.1. Project History
The Current Starving VCO (CS VCO) was used exclusively in the first serdes design,
which was fabricated in February 1999, in the transmitter, the receiver, and in various
oscillator test structures. Its performance was sufficient but the design required some
revision to meet frequency specifications. Deficiencies and unpredictable behavior,
however, resulted in its elimination from all subsequent designs.
The feed forward version of the CS VCO was not intended for use in the transmitter
and receiver design. It was instead designed to push the upper frequency limit in the ring
oscillator design. However, it had the potential for use in future transmitter and receiver
designs in order to double the speed to 40 Gb/s.
3.2. The need for a VCO
PLLs, frequency locked loops (FLL), clock extractors, and frequency synthesizers all
require a voltage controlled oscillator. These circuits create one or many signals with a
frequency that is a function of an external control voltage. In a PLL, or clock extractor, a
DC voltage is generated based upon the difference between the VCO signal and an external
signal. This voltage is then fed back into the VCO to create a stable phase feedback loop.
Frequency synthesizers incorporate frequency dividers to create signals of varying
frequencies based upon the VCO’s fixed frequency.
VCOs for Serdes circuits are usually either an LC (inductor, capacitor) oscillator or
ring oscillator, each having benefits and drawbacks. All VCOs discussed in this section are
four stage ring oscillators which produce eight unique phases when used with differential
logic. The architecture of the receiver and transmitter requires this crucial multiple-phase
characteristic.
3.3. Simple Current Starving VCO
The Simple CS ring oscillator has four stages [39], shown in Fig. 3-1, and is able to
create eight unique phases. The frequency of oscillation is defined by
$$f = \frac{1}{2 \cdot 4T} \tag{3-1}$$
where T is the delay through the gate. A factor of two is necessary because after a signal
passes through four buffers it has only changed sign and requires another trip through all
four to complete one period. The frequency and gain response for this oscillator is shown in Fig. 3-2.
Figure 3-1 Four stage VCO diagram. Frequency control is accomplished through variable delay elements arranged in a ring with an odd number of inversions. The operating frequency range is a function of the delay element range and the number of stages in the ring.
[Figure 3-1 diagram: stages A, B, C, and D in a ring, each with delay T, producing phases ΦA, ΦB, ΦC, and ΦD.]
The schematic for the Simple CS stage is a buffer, described in Appendix B.4. on
page 162, with level two emitter followers. The differential circuit current source is
connected to the aVref circuit in order to control its current.
3.4. Basic Operation
Current starving VCOs control their frequency by varying the delay through each
stage of the ring. Each stage has a differential amplifier with one or many adjustable current
sources at the bottom of the tree. In this way, the stage is able to increase its delay with a
decrease in current. This effect is primarily a result of less current causing a decrease in
the fT of the transistor, as shown in Appendix A.2. on page 158. Even though the smaller
current has less capacitor charging ability, the associated smaller voltage swing produces
no net effect in delay.
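Equation 3-1 can be evaluated directly. The sketch below, using hypothetical function names, computes the oscillation frequency for a given stage delay and the stage delay needed for a target frequency; for a four stage ring at 5 GHz the required delay works out to 25 ps, which is also the spacing between the eight phases.

```python
def ring_frequency_hz(stage_delay_s: float, stages: int = 4) -> float:
    """Eq. 3-1 generalized to N stages: f = 1 / (2 * N * T)."""
    return 1.0 / (2.0 * stages * stage_delay_s)


def stage_delay_s(frequency_hz: float, stages: int = 4) -> float:
    """Stage delay T needed for a target frequency (Eq. 3-1 solved for T)."""
    return 1.0 / (2.0 * stages * frequency_hz)


if __name__ == "__main__":
    delay = stage_delay_s(5e9)
    print(f"stage delay at 5 GHz: {delay * 1e12:.1f} ps")
    print(f"frequency for that delay: {ring_frequency_hz(delay) / 1e9:.2f} GHz")
```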
Figure 3-2 Current Starving VCO frequency and gain response. The CS VCO’s usable frequency range is between a control voltage of -1.5 V and -1.0 V or higher. The lower range is limited by the small voltage swing on the output. These simulation results were obtained with one minimally sized buffer on each stage’s output. Interconnect parasitics were not included.
Even though current starving is a simple technique for controlling delay, it has
numerous disadvantages. The first obvious problem is that at the limits of operation and
control voltage, undesirable conditions occur. At the minimum extreme, the current can be
decreased to the point that sustained oscillations can no longer occur, because the voltage
swing decreases and the gain drops below one. At the maximum, the transistor fT begins to
drop off the opposite side of the fT curve and the transistors begin to slow. This is
potentially disastrous when used in a phase lock loop because the VCO gain has gone
negative and the loop will become unstable.
[Figure 3-2 plot: frequency (GHz) and gain (GHz/V) versus control voltage (V), from -1.8 V to 0.0 V; curves: frequency response, gain.]
Another problem is that the delay as a function of current is non-linear in nature.
Fig. 3-2 shows the basic frequency response for the Simple CS VCO excluding
interconnect parasitic effects. The gain varies from 3.0 GHz/V to 0.5 GHz/V along the
curve and is never constant. A non-linear gain makes phase locked loops difficult to design.
The output voltage swing is also a concern because as the current increases, the voltage
swing across the pull-up resistors also increases. This alters the load driving ability, and
creates a situation which is difficult to model analytically.
Another problem is that the single-ended nature of the control voltage does not
possess the common-mode noise immunity that is inherent in differential wiring. When
phase noise is a dominant design factor this architecture can be quite limiting.
There are benefits to this style of ring VCO, including its simplicity and a large tuning
range. The layout footprint is also quite small, which minimizes interconnect delays.
3.4.1. Adjustable Voltage Reference
Figure 3-3 Adjustable Voltage Reference. The input voltage controls the total current through this circuit. In turn this current is mirrored to all connected sources.
[Figure 3-3 schematic labels: Vctrl, aVref, Ir, R1, Re, Vee.]
The active current sources in the CS stages are “mirrored” to a circuit that can vary
its current as a function of a single-ended input voltage, as depicted in Fig. 3-3. The current
through the reference circuit, and its derivative with respect to the control input, are defined
by the following equations:
$$I_r = \frac{V_{ee} + V_{ctrl} - 3V_{be}}{R_1 + R_e} \tag{3-2}$$
$$\frac{dI_r}{dV_{ctrl}} = \frac{1}{R_1 + R_e} \tag{3-3}$$
The emitter resistor, Re, is matched to the current sources' emitter resistors so that the same
voltage exists across both. R1 determines the current gain of the circuit and the value is
selected based upon the input voltage swing and the required output current swing. An
additional diode is added to decrease the voltage drop across R1, allowing a smaller resistor
size.
A common approach to designing a current mirror is to include base-current
compensation through a transistor located on the output (see Appendix B.3. on page 161).
This allows the current reference to drive more loads and lessens the current degradation
when more loads are added. The problem with this approach is that it limits the frequency
response of the circuit. For this reason it was not included in the design. The current driving
capability of the circuit without base-current compensation should be sufficient to drive a
single VCO with an equivalent of 8 µm of loading.
3.4.2. Final Implementation
The development of the transmitter and receiver played a defining role in the design
of this VCO. To meet a goal of 20 Gb/s with a quarter-rate architecture, a VCO centered at
5 GHz was needed. A control voltage range from -0.8 V to -1.6 V was chosen because of
the solid transfer characteristics, and because those limits correspond to one and two Vbe
drops. At the center of the control range a frequency of 5.75 GHz was achieved,
corresponding to a 15% safety margin (footnote 1).
Footnote 1: This safety margin was built in because parasitic simulations were not done prior to fabrication. It was felt that a greater than 10% margin would adequately account for interconnect effects.
Symmetry was the leading motivation behind the layout of the Simple CS VCO
shown in Fig. 3-4. The four stages were laid out in a square with the inputs and outputs
facing the center. In this way the interconnect between stages could be limited to a small
region in the center of the design. Power and ground rails, as well as the two reference rails
(aVref, Vref), were placed in closed concentric LM rings around the top.
Figure 3-4 Layout of Simple CS VCO. Shown above is the layout for the Simple CS VCO. All inputs and outputs face inward to minimize the effects of interconnect parasitics. Symmetry was the most important design requirement.
In addition to CS VCOs in the transmitter and receiver a separate test chip containing
CS VCOs was also made. This allowed a more straightforward measurement of the VCO’s
frequency and gain characteristics. This test chip also included an XOR phase multiplier
[3],[4],[20] tree in order to achieve frequencies double and quadruple the nominal 5 GHz.
The goal of the multipliers was only to see how high the technology could be pushed.
3.4.3. Testing Results
The plot in Fig. 3-5 shows the results from an ideal interconnect simulation, a
simulation with capacitive interconnect (footnote 1), and measured results from the fabricated circuits.
The 20% decrease in speed between the ideal simulation and the measured results is
immediately obvious. Unfortunately this was larger than the 15% safety margin and
resulted in a frequency range that did not meet the 5 GHz center frequency specification.
Footnote 1: The IBM 1999B SiGe design kit does not include interconnect resistances correctly and typically simulates with a faster response than with capacitance only. Resistance values are also very small and can be ignored for these localized wires. For these reasons, only capacitance was included.
Between a control voltage of -1.6 V and -1.4 V the measured VCO tracked very closely
to expectations, but above -1.4 V the VCO response became lethargic. This is likely due to
too much current in the tree, which causes a reduction in fT faster than the model
predicts.
Figure 3-5 Test data from Simple CS VCO. Simulation with and without interconnect parasitics, and measured results are shown in this plot. Measured results track closely with the parasitic simulation at low control voltages.
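The margin argument can be checked with a few lines of arithmetic. The sketch below uses hypothetical helper names and applies the quoted 20% shortfall uniformly to the 5.75 GHz simulated center frequency, which is a simplifying assumption since the measured shortfall varies with control voltage. Under that assumption the center lands near 4.6 GHz, below the 5 GHz target.

```python
def margin_percent(achieved_ghz: float, target_ghz: float) -> float:
    """Safety margin of a simulated frequency over the target, in percent."""
    return (achieved_ghz / target_ghz - 1.0) * 100.0


def derated_ghz(frequency_ghz: float, loss_percent: float) -> float:
    """Frequency after a uniform percentage slowdown (illustrative assumption)."""
    return frequency_ghz * (1.0 - loss_percent / 100.0)


if __name__ == "__main__":
    print(f"design margin: {margin_percent(5.75, 5.0):.1f} %")
    print(f"center after 20 % slowdown: {derated_ghz(5.75, 20.0):.2f} GHz")
```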
3.4.4. Optimization of Simple CS VCO (post-fabrication)
From Fig. 3-5 it is clear that the oscillator underperformed and missed the 5 GHz
target. This can be directly attributed to initial simulations that did not include resistive and
capacitive interconnect parasitics. Although the layout footprint of the VCO is very small
and designed to minimize wire lengths, parasitics still presented a significant influence on
speed.
The receiver VCO has a frequency range of 4.25 GHz to almost 4.9 GHz. Because
20 Gb/s is the target data rate, we would like 5 GHz to fall in the middle of the operating
range of both transmit and receive VCOs. Given that the initial design was slow, how can it
be ensured that the next version will meet specifications? Can the measured and simulated
results be used to maximize the likelihood of a successful design?
[Figure 3-5 plot: frequency (GHz) versus control voltage (V), from -1.8 V to 0.0 V; curves: Simulated, Parasitics, Measured.]
Each of the four VCO stages must be loaded by an identical buffer which then drives
subsequent circuitry. By using the smallest transistors, 1 µm, in the buffers, the loading on
the VCO will be minimized and its operating frequency will be maximized. Under such conditions the
easiest method for increasing frequency response is to increase the power of the delay
elements by using larger transistors. This has the immediate effect of reducing the effective
loading on each gate and increasing the frequency at a given control voltage. The devices
in the first design iteration had 2 µm emitter lengths and were slightly slow, so an increase
in emitter length should bring the VCO to within specifications. Fig. 3-6 shows the
relationship between frequency response and transistor size used in the delay stages of the
VCO. Because interconnect parasitic simulations require a complete layout this simulation
uses ideal interconnects. As suspected there is an increase in performance when larger
devices are used.
Figure 3-6 Frequency Response versus emitter length in delay elements. By increasing the emitter lengths and keeping the loading the same, the effective loading is decreased and the performance improves. This simulation does not include interconnect parasitics.
It can be seen that a relatively small increase in transistor size from 2 µm to 2.5 µm
achieves a 12% increase in speed at a control voltage of -1.5 V. The 2 µm and 2.5 µm delay
elements have an effective loading of 0.5 µm/µm and 0.4 µm/µm respectively, representing
a 20% decrease. Assuming that the interconnect parasitic effects stay the same or
decrease, the 2.5 µm delay elements should bring about a 12% increase in the VCO
response. From a range of 4.25 GHz to 4.9 GHz a 12% improvement yields a range of 4.76
GHz to 5.48 GHz, which is well within the specifications.
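The loading and range figures in this paragraph follow from simple ratios. The sketch below, with hypothetical helper names, computes the effective loading of a 1 µm buffer on 2 µm and 2.5 µm delay elements and scales the measured 4.25 GHz to 4.9 GHz range by the simulated 12% improvement. It reproduces the quoted values to within rounding and confirms that 5 GHz falls inside the projected range.

```python
def effective_loading(load_um: float, driver_um: float) -> float:
    """Load emitter length per unit of driver emitter length (um/um)."""
    return load_um / driver_um


def scaled_range(low_ghz: float, high_ghz: float, improvement_percent: float):
    """Scale both ends of a frequency range by a percentage improvement."""
    factor = 1.0 + improvement_percent / 100.0
    return low_ghz * factor, high_ghz * factor


if __name__ == "__main__":
    before = effective_loading(1.0, 2.0)
    after = effective_loading(1.0, 2.5)
    print(f"effective loading: {before} -> {after} um/um")
    print(f"decrease: {(before - after) / before * 100.0:.0f} %")
    low, high = scaled_range(4.25, 4.9, 12.0)
    print(f"projected range: {low:.2f} GHz to {high:.2f} GHz")
```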
3.5. Current Starving with Feed Forwarding
Some advantages of the four-stage simple VCO circuit include: symmetric phases
minimizing phase differences, generation of rising edges every 25 ps at 5 GHz, and a large
frequency range. The motivation for a new VCO design is to enhance the frequency beyond
the limits of this simple design.
[Figure 3-6 plot: frequency (GHz) versus control signal (V), from -1.8 V to -1.0 V; curves for emitter lengths of 2 µm, 2.5 µm, 3 µm, 4 µm, 6 µm, and 10 µm.]
One method to do this is to use a delay cell that averages the signals from the last two
stages as shown in Fig. 3-7 [1],[13],[23], [24]. Stage C accepts inputs from stage B and
stage A, stage D accepts from C and B, and so on. The idea is that the average of the
previous two signals occurs earlier than just the previous signal.
Figure 3-7 Feed-forward CS VCO block diagram. Each stage in the VCO receives signals from the previous stage and the stage preceding that one. Stage A can realize an effective decrease in delay by utilizing the signal from stage C. The inversions to induce oscillations are left out for clarity.
Mathematically, the nth element presents its output after the average of the n-1st and
n-2nd element outputs plus the delay of the nth element:
$$t_n = \frac{t_{n-1} + t_{n-2}}{2} + T_i \tag{3-4}$$
Solving for the difference between two consecutive stages yields
$$t_n - t_{n-1} = \frac{2}{3} T_i \tag{3-5}$$
which shows that the effective gate delay is reduced to two thirds of the intrinsic stage
delay, Ti. The intrinsic delay is defined as the delay of the stage if its inputs were tied
together and treated as a normal buffer.
[Figure 3-7 diagram: stages A, B, C, and D with phases ΦA, ΦB, ΦC, and ΦD; the timing sketch shows the averaged input (ΦA+ΦB)/2, the stage delay, and the resulting delay savings.]
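The two-thirds result of Eq. 3-5 can be checked numerically by iterating the recurrence of Eq. 3-4. The sketch below uses hypothetical names and arbitrary starting times; regardless of the starting values, the spacing between consecutive outputs settles toward two thirds of the intrinsic delay, because the deviation from that value halves and changes sign at each step.

```python
def feed_forward_times(intrinsic_delay: float, count: int,
                       t0: float = 0.0, t1: float = None):
    """Iterate t_n = (t_{n-1} + t_{n-2}) / 2 + Ti (Eq. 3-4) for `count` outputs."""
    if count < 2:
        raise ValueError("count must be at least 2")
    if t1 is None:
        t1 = t0 + intrinsic_delay  # arbitrary starting spacing
    times = [t0, t1]
    for _ in range(count - 2):
        times.append((times[-1] + times[-2]) / 2.0 + intrinsic_delay)
    return times


if __name__ == "__main__":
    times = feed_forward_times(1.0, 40)
    spacing = times[-1] - times[-2]
    print(f"settled spacing / Ti = {spacing:.6f}")  # approaches 2/3
```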
Figure 3-8 Feed forward CS VCO frequency response and gain. The Feed Forward CS VCO was designed to achieve the highest frequency possible. After optimization it operates at twice the speed of the Simple CS VCO.
3.5.1. Final Implementation
An important consideration in the design of the feed-forward delay element is its
higher complexity, having two inputs instead of one, which increases the delay. Also,
because there are twice as many wires between stages in the feed-forward design, the layout
will be larger and more limited by interconnect parasitics. With this in mind, the
simplest averaging circuit was created that utilized a minimum number of additional
transistors and resistors. The final schematic is shown in Fig. 3-9.
A description of its operation is as follows: If Q2 and Q4 are on, Q1 and Q3 are off, and
signal b arrives first, then signal b will begin to turn Q3 on and Q4 off. This will start to draw
current through Rc1. If b were to completely switch then both Rc1 and Rc2 would carry the
same current: an undesirable condition in which the output is the average of a one and a
zero, which is undefined. The normal operating condition involves b partially switched
followed by the beginning of a switch in the a signal. When this occurs more current flows
through Rc1 and less current through Rc2. The effective switching input can be said to occur
between the two signals, a and b.
[Figure 3-8 plot: frequency (GHz) and gain (GHz/V) versus control voltage (V), from -1.8 V to 0 V; curves: Frequency, Gain.]
Figure 3-9 Feed-forward CS Delay Element. This circuit operates by averaging the a and b inputs through common pull-up resistors. The aVref node is varied in order to control the total current through the tree. Lower current corresponds to longer delay.
One important characteristic in the two current starving VCO circuits is the choice of
collector resistors which affects the output amplitude and the gate delay. An increase in
resistance causes an increase in amplitude and an increase in delay because the same
amount of current produces a larger voltage swing and a larger RC time delay. The simple
CS VCO was designed around an operating frequency of 5 GHz, so a resistance was chosen
so that there was a 200 mV - 400 mV swing around 5 GHz. The feed-forward CS VCO, on
the other hand, was designed to achieve the highest possible frequency response, so a
resistor small enough to maximize the frequency while leaving a 150 mV - 200 mV swing
was used. Fig. 3-8 shows the frequency response of the feed-forward CS VCO.
3.5.2. Testing results
The feed-forward CS VCO was not used in the first transmitter and receiver design
but was implemented in a test chip. It was configured with one load to achieve the smallest
loading effect and thus the highest frequency. The simulation and measured results are
plotted in Fig. 3-10.
[Figure 3-9 schematic labels: transistors Q1, Q2, Q3, Q4; nodes a10, a11, b10, b11, z20, z21, aVref, Vref; resistors Rc1, Rc2.]
Figure 3-10 Testing Data from feed-forward CS VCO. The implementation of the Feed Forward Current Starving VCO only had a single load in order to achieve the highest frequency possible. The measured results are only about 4% lower than simulations with interconnect included.
Simulations with one load and no parasitics show a peak frequency of 12 GHz. With
parasitics the frequency drops by 6% to 11 GHz, which tracks very closely with the
measured results. The steep drop off of the measured results at the high end is likely due to
a high collector current causing a drop off in the transistor fT that is not accurately
accounted for in the models (footnote 1).
Footnote 1: This is supported by information gathered at a meeting at IBM in 1999 concerning measured results from the DARPA 2 run. An IBM device modeler was quoted as saying that the fT curves drop off faster than the models predict.
[Figure 3-10 plot: frequency (GHz) versus control voltage (V), from -1.8 V to 0.0 V; curves: 1 Load Simulated, 4 Loads Simulated, 1 Load w/ Parasitics, 1 Load Measured.]
3.6. Conclusions and Future Work
The Current Starving VCOs presented in this section are compact and easy to
implement but they have some crucial deficiencies. Their performance was about 5% worse
than expected from simulations with interconnect parasitics. Feed forwarding allowed a
near doubling of speed at the expense of a slightly more complicated circuit. If
implemented correctly this additional speed could be traded off for a reduction of noise.
With an increase in the power supplied to the implemented VCO, the desired
specifications should be achieved. However, future research into this VCO topology should
be limited because its response is difficult to model and it utilizes a delay strategy which is
poorly understood.
4. Feed Forward Interpolated VCO
4.1. Project History
The Feed Forward Interpolated VCO evolved from the Current Starving Feed
Forward VCO and replaced all instances of that VCO in the second serdes chip submitted
in March 2000. Additional test structures were added to further exercise this VCO, and an
invention disclosure record was submitted to RPI in May, 2000. An RPI provisional patent
was awarded in September 2000.
4.2. The Evolution
The evolution of the Feed Forward Interpolated VCO (FFI VCO) began with the
Feed Forward Current Starving VCO (FFCS VCO) discussed in Chapter 3. Each stage of
the FFCS VCO averaged the output from the previous stage and the stage before that to
generate a signal with a smaller effective delay. The averaging was fixed and reduced the
delay to about 66% of the intrinsic stage delay.
A common approach in the design of a standard ring oscillator stage without feed
forwarding is to use delay interpolation as shown in Fig. 4-1. The idea is to split the input
signal into a slow and fast path and create a weighted sum of the two to form the output.
Common pull-up resistors, level 3 control inputs, and emitter resistors for linearity make
this possible. The slow path need only delay the signal longer than the fast path and a simple
capacitor can do the trick. The benefits of this VCO stage include a uniform output voltage
swing, a fairly linear response, no limits of operation, and easy minimum frequency control
through the capacitor.
Figure 4-1 Schematic for Delay Interpolated VCO element
This VCO element linearly interpolates the input signal after traveling through a fast and slow path. The slow path is created with the addition of a capacitor.
The vision of the FFI VCO occurred when looking at the Delay Interpolated VCO
and realizing that the fast path could be implemented as the signal from the stage before
the previous stage and the slow path could be the signal from the previous stage. This insight
immediately eliminated the need for the slow path capacitor, and nearly doubled the speed
of the VCO.
The FFI VCO is a delay interpolated VCO with the normal and delayed signals
created from different stages rather than from within each stage. This forces each stage to
have two inputs rather than one and eliminates the need for the slow path capacitor. The
schematic for the FFI stage can be found in Fig. 4-7 on page 44.
4.3. Basic Operation
On a block diagram level, the FFI VCO looks identical to the Feed Forward Current
Starving VCO shown in Fig. 4-2. The difference is in the method used to control the delay
through each stage. The FFCS VCO controls delay by varying the current through its buffer,
which is directly related to the delay through its gate. The feed forward technique simply
reduces the effective gate delay by about 33%. The FFI VCO, on the other hand, linearly
interpolates the signals received from the previous two stages. The current, which remains
the same through the tree, is gradually shifted between the two inputs, p and l, as shown in
Fig. 4-7. The p (previous) input arrives from one stage back, and the l (leap) input arrives
from the stage prior to that. The two signals are weighted by the control signal and summed
by the common pull-up resistors. The final result is the frequency response shown in
Fig. 4-4.
Figure 4-2 Feed Forward VCO block diagram
Each stage in the VCO receives signals from the previous stage and the stage preceding that one. Stage A can realize an effective decrease in delay by utilizing the signal from stage C. (The inversions, to induce oscillations, are left out for clarity.)
Figure 4-3 FFI VCO under boundary conditions
Diagram (a) shows the VCO running in the four stage configuration with the control voltage set to a minimum value. Diagram (b) shows the VCO in the two stage configuration, at the maximum control voltage.
The minimum operating frequency is defined by the oscillation of the system when
the leap signal is ignored, and only the previous signal is used. In this case, the system is
running as a four stage oscillator and has a frequency of about 3.9 GHz. When the control
voltage is switched in the other direction, the leap signal is used, and the previous stage's
output is ignored. In this configuration the system is running as two separate two stage ring
oscillators with a frequency of approximately 7.9 GHz. These two cases are depicted in Fig.
4-3. It is useful to look at the system in terms of an effective delay for all control voltages
between the minimum and maximum values.
Figure 4-4 Feed-forward interpolated simulated response
The frequency response of the FFI VCO is linear across a large range from 4.75 GHz to 7.00 GHz. System gain is flat across the operating range. (Plot axes: frequency in GHz versus control voltage in V.)
The effective delay of a stage is defined to be the delay of a stage in a four stage
oscillator that has the same frequency as the feed forward oscillator. This parameter can be
found by setting the intrinsic delay of a stage to T, setting s equal to the weighting factor
between 0 and 1, and looking at the output transition times of stages n, n-1, and n-2. The
weighting factor is a constant that indicates how much of the leap signal is being used. Set
to 0 the ring acts as a normal 4 stage oscillator, and set to 1 the ring acts as a 2 stage
oscillator.
The edge time of stage n is given by
$$t_n = T + s\,t_{n-2} + (1-s)\,t_{n-1} \tag{4-1}$$
which is the intrinsic delay through the stage, plus the weighted sum of the edge times of the
previous two stages. In steady state the edges are evenly spaced, so t_{n-1} = t_n - T_eff and
t_{n-2} = t_n - 2T_eff. Solving for the time difference between two stages yields
$$T_{eff} = t_n - t_{n-1} = \frac{T}{1+s} \tag{4-2}$$
$$f_{vco} = \frac{1}{8T_{eff}} = \frac{1+s}{8T} \tag{4-3}$$
which is the effective delay and the frequency of the VCO in terms of the effective and
intrinsic delay of each stage. The factor of eight is needed because it takes two complete
cycles through four stages to equal one period of the VCO.
For s equal to 0, the effective delay is equal to the intrinsic delay of the stage. At the
other extreme, when s equals 1, the effective delay is one half of the intrinsic delay. This
makes sense because the system in this configuration has two stages rather than four. (4-2)
also shows that the Feed Forward CS VCO, where s is fixed at 0.5, has an effective delay
equal to (2/3)T, as was shown previously.
The benefits of the FFI VCO are numerous and represent many improvements over
the previously discussed designs. The use of feed forward techniques allows the VCO to
exceed the maximum frequency achievable by a simple four stage ring oscillator. This is
extremely important if a solid high speed eight phase VCO is required.
Fig. 4-4 shows a linear frequency response for control voltages from -0.2 V to 0.2 V. This
linear range is very important when designing phase locked loops, because linearity results in
simple closed form solutions. In addition, this VCO has a response with an obvious center and
with limits approaching an asymptotic minimum or maximum. In contrast, the CS VCO will stop
operating below a certain frequency. Although a control voltage would never be driven to
such extreme values as to cause malfunction in normal operation, this can happen in PLLs
during power up. Often an integrator, or a capacitor that is never guaranteed to have a specific
voltage, will be attached to the VCO control inputs. If it has a poor initial condition, which is
maintained by a non-oscillating VCO, then the system will become unstable. It is therefore
important to provide the largest control voltage range possible that will still allow the VCO to
oscillate.
Current through the FFI stage is linearly switched between the previous and feed
forward stages. This forces the total current running through the stage to remain constant.
This is important for keeping a constant voltage swing, which ensures consistent operation
in a system where a variation in voltage swing would cause a change in frequency. The
SNR is also dependent on the output voltage swing, which, if varying, can complicate the
analysis. This is the problem encountered with the CS VCO described in Chapter 3.
Differential signaling is used for the control input and throughout the rest of this
design. This is crucial when designing for low noise operation since differential wires have
strong common-mode rejection.
One exciting feature of the FFI VCO, which will be examined in detail in the next
section, is the extraordinary capacity for customization of this circuit. First, by controlling
the linearity through emitter resistors, different frequency gains can be used (Fig. 4-8).
Second, a capacitor at the top of the tree controls the center frequency point (Fig. 4-9).
Third, resistors exist to limit the frequency range and prevent stage decoupling (Fig. 4-10).
One minor drawback to this design is the slightly larger layout footprint. The cascode
amplifiers introduce four additional transistors, and if a large capacitor is necessary then a
large amount of space may be required.
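
The effective delay relation of this section can be checked numerically. The following is a minimal sketch, not part of the original design flow: it iterates the edge-time recurrence (4-1) and compares the steady-state edge spacing with T/(1+s) from (4-2). The function names and the value T = 32 ps are assumptions chosen for illustration; 32 ps is picked only because it places the s = 0 frequency near the 3.9 GHz quoted above.

```python
def simulate_edge_times(T, s, n_edges=200):
    """Iterate t_n = T + s*t_{n-2} + (1-s)*t_{n-1}, the recurrence in (4-1)."""
    # Arbitrary starting condition: two edges spaced by one intrinsic delay.
    t = [0.0, T]
    for _ in range(n_edges):
        t.append(T + s * t[-2] + (1.0 - s) * t[-1])
    return t


def effective_delay(T, s):
    """Steady-state edge spacing from (4-2)."""
    return T / (1.0 + s)


def vco_frequency(T, s):
    """VCO frequency from (4-3); eight effective delays make one period."""
    return (1.0 + s) / (8.0 * T)


if __name__ == "__main__":
    T = 32e-12  # assumed intrinsic stage delay, for illustration only
    for s in (0.0, 0.5, 0.8):
        t = simulate_edge_times(T, s)
        spacing = t[-1] - t[-2]
        print(s, spacing, effective_delay(T, s), vco_frequency(T, s))
```

The spacing error is multiplied by -s on every step, so the iteration settles for any s below 1. At s = 1 the spacing alternates instead of settling, which is consistent with the ring splitting into two independent two stage oscillators.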
4.4. Stage Decoupling
A serious problem exists in the FFI VCO if the weighting factor is pushed to the
maximum value of 1. In this case, each stage, n, uses only the signal from stage n-2,
as depicted in Fig. 4-3(b). The VCO now appears and operates as two completely
independent oscillators. The phase difference between consecutive stages is no longer
constant and may fluctuate wildly. This undesirable effect is called stage decoupling and
must be addressed in VCO design.
The model used to analyze this situation uses an ideal FFI VCO in which one stage
has a different delay. This modified delay represents the sum of maximum individual delay
excursions that may exist in the real VCO due to unbalanced loading effects, process
variations, and signal noise. The stage transfer functions are shown as
$$a_n = T + s\,c_{n-1} + (1-s)\,d_{n-1} + N \tag{4-4}$$
$$b_n = T + s\,d_{n-1} + (1-s)\,a_n \tag{4-5}$$
$$c_n = T + s\,a_n + (1-s)\,b_n \tag{4-6}$$
$$d_n = T + s\,b_n + (1-s)\,c_n \tag{4-7}$$
with stage a receiving the additional delay of N. The time at an output change for each stage
is represented by a letter and a subscript where the letter is the stage and the subscript is the
nth output change from that stage. The output edges appear in time order described by
$$a_0, b_0, c_0, d_0, a_1, \ldots, d_1, \ldots, d_n, \ldots \tag{4-8}$$
The next step is to look at the time between successive outputs from any one stage,
$$a_{n+1} - a_n = \frac{4T + N}{s+1} \tag{4-9}$$
which is simply the sum of the effective delays of the four stages. (4-9) is the same for all
stages, even though N only occurs in stage a, under the condition that stage decoupling has
not occurred. Solving for the time difference between the output of stage a and the output
of stage b using (4-4) through (4-9), yields
$$a_n - d_{n-1} = \frac{T}{s+1} + \frac{N}{1-s^4} \tag{4-10}$$
$$b_n - a_n = \frac{T}{s+1} - \frac{sN}{1-s^4} \tag{4-11}$$
$$c_n - b_n = \frac{T}{s+1} + \frac{s^2N}{1-s^4} \tag{4-12}$$
$$d_n - c_n = \frac{T}{s+1} - \frac{s^3N}{1-s^4} \tag{4-13}$$
which are the desired solutions.
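
As a cross-check on the closed forms (4-10) through (4-13), the stage equations (4-4) through (4-7) can be iterated directly until the inter-stage delays settle. This is a sketch under stated assumptions: times are normalized so that T = 1, the starting edge times are arbitrary, and the function names are hypothetical.

```python
def simulate_imbalanced_ring(T, N, s, cycles=400):
    """Iterate (4-4)..(4-7) and return the settled delays
    (a_n - d_{n-1}, b_n - a_n, c_n - b_n, d_n - c_n)."""
    c, d = 0.0, T  # arbitrary starting edge times for c_0 and d_0
    a = b = d_prev = 0.0
    for _ in range(cycles):
        d_prev = d
        a = T + s * c + (1.0 - s) * d + N   # (4-4): stage a carries the extra delay N
        b = T + s * d + (1.0 - s) * a       # (4-5): d is still d_{n-1} here
        c = T + s * a + (1.0 - s) * b       # (4-6)
        d = T + s * b + (1.0 - s) * c       # (4-7)
    return (a - d_prev, b - a, c - b, d - c)


def closed_form_delays(T, N, s):
    """Inter-stage delays from (4-10)..(4-13); valid for 0 <= s < 1."""
    base = T / (1.0 + s)
    k = N / (1.0 - s ** 4)
    return (base + k, base - s * k, base + s ** 2 * k, base - s ** 3 * k)


if __name__ == "__main__":
    for s in (0.0, 0.5, 0.9):
        print(s, simulate_imbalanced_ring(1.0, 0.1, s), closed_form_delays(1.0, 0.1, s))
```

The four closed-form delays sum to (4T + N)/(1 + s), which is (4-9).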
Figure 4-5 Delay versus weighting factor with single stage imbalance
With non-ideal delay stages used in the FFI VCO, stage decoupling (effective delay goes to zero) can occur when the weighting factor is too high. This is because the VCO acts as two independent 2 stage oscillators instead of one 4 stage oscillator. (Plot axes: normalized effective delay versus weighting factor; curves: ad and cb, dc and ba, N=0.)
These equations are in the form of the effective delay plus a factor for the unbalanced
delay N. The delay between stages c and b, and between a and d, increases rapidly as s
approaches 1, and the delay between stages d and c, and between b and a, decreases rapidly
under the same condition. This divergence is expected because the sum of the four delays
follows very closely with the effective delay curve when there is no unbalanced delay. This
effect is plotted in Fig. 4-5. Also shown is the curve for all inter-stage delays when no extra
delay is introduced. The divergence between the nominal curve and each of the unbalanced
curves can be clearly seen.
Each stage is affected by the additional delay, but when analyzing stage decoupling
it is only necessary to look at bn - an. The delay ba is the most seriously affected of all the
delays because it is relative to the output of the stage with the additional delay included.
Stage decoupling occurs when ba goes to 0 and the output of stage
b coincides with the output of stage a. Although the equations are continuous at this point,
reasonable operation dictates that stage output times should be sequential.
Figure 4-6 Decoupling versus delay injection
When an unbalanced delay is injected into a single stage, decoupling between stages occurs when the weighting factor reaches a specific value. (Plot axes: stage decoupling weighting factor s versus normalized additional delay N/T.)
Setting ba equal to 0 in (4-11) and solving for s yields the weighting factor at which stage
decoupling occurs for a specific value of N. Equivalently, decoupling occurs when
N/T = (1 - s)(1 + s^2)/s. This solution is shown in Fig. 4-6. As the injected
delay increases, the point at which stage decoupling occurs departs from the maximum
value of 1.
The effect of stage decoupling is clearly a problem and results in a VCO that operates
improperly. To avoid this problem, the weighting factor must be limited to a value less than
that given in Fig. 4-6, based upon a maximum expected delay injection from noise sources
and parameter variations. For example, if a maximum 10% deviation is expected
(an extremely large value), then s must be kept below approximately 0.95. In practice this
VCO has a very large operating range, part of which can be sacrificed to prevent stage decoupling.1
1. For the final implementation of this system s was kept below 0.8 to introduce a huge safety margin in which no decoupling will occur.
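
The decoupling limit can also be found numerically. Setting bn - an in (4-11) to zero gives N/T = (1 - s)(1 + s^2)/s, which falls monotonically from infinity to zero as s goes from 0 to 1, so a simple bisection recovers the limiting weighting factor for a given injected delay. The sketch below is illustrative only and the function names are hypothetical.

```python
def normalized_delay_at_decoupling(s):
    """N/T at which b_n - a_n in (4-11) reaches zero, for 0 < s <= 1."""
    return (1.0 - s) * (1.0 + s * s) / s


def decoupling_threshold(n_over_t, tol=1e-12):
    """Largest weighting factor s that still avoids stage decoupling
    for a normalized injected delay N/T (found by bisection)."""
    if n_over_t <= 0.0:
        return 1.0  # no imbalance: decoupling only at the s = 1 limit
    lo, hi = 1e-9, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if normalized_delay_at_decoupling(mid) > n_over_t:
            lo = mid  # this s tolerates more delay than required: move up
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for n_over_t in (0.05, 0.1, 0.5, 1.0):
        print(n_over_t, decoupling_threshold(n_over_t))
```

For a 10% injected delay the threshold lands close to 0.95, in line with the example given in the text.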
4.5. Circuit Implementation and Analysis
Figure 4-7 Schematic for FFI VCO element
This VCO element linearly interpolates, through the control voltage (c), the signals from the previous buffer (p) and the buffer previous to that (l). Rb limits the operating range of the VCO, Re adjusts the control voltage range, and Cc defines the center of the operating range.
The circuit shown in Fig. 4-7 represents one element of the FFI VCO. It is a three
input pseudo-buffer, with emitter follower outputs. The control signal, c, is common
between all stages and must be on level 3. The input l (leap) and p (previous) signals are on
level 2, which is matched to the output level. Collector resistors, Rc, are set to generate a
250 mV voltage swing. The current sources were chosen to maximize the fT of all
transistors.
Transistor sizing is a very important parameter when designing such circuits, and
further details are shown in Appendix D on page 173. Each stage in this VCO
drives two identical stages and the external circuitry, which typically consists of four
minimally sized buffers. For a VCO stage with x µm sized transistors, the external buffer
appears as a 1/x effective load, and is 1/(2x+1) of the total load driven per stage. If 1 µm
transistors are used, the buffer becomes 33% of the load. If, however, 10 µm transistors are
used, then the buffer becomes approximately 4.8% of the load. So for larger VCO stages, the
external buffer becomes less visible, but the stage uses more power and physical space. A
compromise using 4 µm transistors per gate was chosen, which has external loads of 11%
of the total.
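
The loading trade-off above follows directly from the 1/(2x+1) expression. A short sketch, with a hypothetical function name, tabulates the external buffer's share of the load for a few transistor sizes:

```python
def external_load_fraction(x_um):
    """Share of the per-stage load contributed by the external buffer,
    using the 1/(2x+1) relation from the text (x = transistor size in um)."""
    return 1.0 / (2.0 * x_um + 1.0)


if __name__ == "__main__":
    for x in (1, 2, 4, 10):
        print(f"{x} um: {100.0 * external_load_fraction(x):.1f}% of the load")
```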
Another design challenge, for maximizing frequency response, is to size the
differential amplifier transistors independently of the emitter follower transistors. Please
see Appendix D on page 173 for a detailed analysis. This approach was not
deemed necessary because design specifications of 5 and 10 GHz were easily met without
optimization.
4.5.1. Cascode amplifiers
Above the level 2 differential amplifiers are cascode, or common base, amplifiers.
They provide a low input load resistance to the common emitter differential amplifier and
act as an impedance transformer. Some delay is introduced by their presence, but this is offset
by an increase in driving ability and an isolation from the capacitor, Cc. This isolation helps
to ensure a linear relationship between the increase in Cc and the increase in delay. The
cascode amplifiers also help to reduce phase noise by providing a low impedance output,
which limits the effect noise has on the phase.
4.5.2. Emitter Resistor for linearity and gain adjustment
An ideal differential amplifier has infinite gain, is digital in nature, and requires only
that one input is greater than the other for switching. Real bipolar amplifiers are not ideal
and possess a high gain approaching 6 (see Appendix C.1 on page 164). High gain is
undesirable when designing PLLs because the VCO will generate more noise and loop
filters will require smaller bandwidths. Without modification, a small change in the control
voltage would cause a large change in current. The solution is to include emitter degeneration
resistors, Re, which reduce the gain and produce a more linear transfer function. A complete
analysis of a differential amplifier with emitter resistors is presented in Appendix C.1 on
page 164.
The value of Re was chosen based upon the desired control voltage range of ± 0.2 V,
the linearity across that range, and the frequency range. Fig. 4-8 shows the frequency
response of the VCO as a function of the emitter resistors. Values of Re below
approximately 300 Ω-µm are non-linear at the extremes and produce a gain which is
relatively large. Re values above 500 Ω-µm are quite linear but have a limited frequency
range and produce a small gain. In contrast to high gain, a small gain, and therefore a limited
frequency range, limits the ability of the PLLs to reach target frequency specifications
under all environmental and processing conditions. A trade-off exists between a high and
low resistor value, and the choice depends on the needs of the circuit.
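
The linearizing effect of the emitter resistors can be illustrated with the textbook model of a bipolar differential pair. This is a sketch under simplifying assumptions, not the analysis of Appendix C.1: the transistors are ideal exponential devices, the bypass resistors Rb are ignored, and the values for the tail current and Re are hypothetical absolute values rather than the size-normalized values used in this chapter. Under these assumptions the control voltage is vd = 2 vT atanh(id/Io) + Re id, and the gain at the center is 1/(2 vT/Io + Re).

```python
import math

V_T = 0.02585  # thermal voltage kT/q near 300 K, in volts (assumed)


def control_voltage(i_d, i_tail, r_e):
    """Differential input voltage needed for differential current i_d
    (ideal pair with emitter resistors r_e; requires |i_d| < i_tail)."""
    return 2.0 * V_T * math.atanh(i_d / i_tail) + r_e * i_d


def differential_current(v_d, i_tail, r_e, iterations=100):
    """Invert control_voltage by bisection (it is monotonic in i_d)."""
    # Stay strictly inside (-i_tail, i_tail) so atanh never sees +/-1.
    lo, hi = -i_tail * (1.0 - 1e-12), i_tail * (1.0 - 1e-12)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if control_voltage(mid, i_tail, r_e) < v_d:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def small_signal_gain(i_tail, r_e):
    """Voltage-to-current gain d(i_d)/d(v_d) at v_d = 0."""
    return 1.0 / (2.0 * V_T / i_tail + r_e)


if __name__ == "__main__":
    i_tail = 1e-3  # hypothetical 1 mA tail current
    for r_e in (0.0, 200.0, 400.0, 800.0):
        i_at_200mv = differential_current(0.2, i_tail, r_e)
        print(r_e, small_signal_gain(i_tail, r_e), i_at_200mv / i_tail)
```

A larger Re lowers the gain and leaves a smaller fraction of the current switched at the edge of the control range, which is the gain versus range trade-off described above.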
Figure 4-8 FFI VCO frequency versus emitter resistance
By adjusting the emitter resistor, Re, the gain of the VCO can be controlled. A higher resistance decreases the gain.
4.5.3. Center capacitor to control frequency range center
The capacitor, Cc, between the level 1 outputs is parasitic in nature and used only to
degrade the performance of the circuit. Increasing its size causes an increase in the delay
through the gate, which corresponds to a decrease in frequency. This component is very
useful in centering the frequency range to a given specification; simulation results are
shown in Fig. 4-9. The disadvantage of using this component arises when very low
frequencies are needed, because this requires a large capacitor. Large capacitive elements
require a significant amount of space, and because each of the four stages needs one, their size
can become prohibitive. Fortunately, for frequency centers from 2 GHz through 8 GHz the
component size is quite reasonable.
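
The centering effect of Cc can be estimated with the center frequency model given later as (4-16), fc = 3 / (16 (To + ln(2) 2 Rc Cc)). The sketch below assumes the nominal delay To = 21 ps quoted in Section 4.6.2; the function names are hypothetical, and the Rc and Cc values in the demo are placeholders rather than design values.

```python
import math


def center_frequency(t0, rc, cc):
    """Center frequency from (4-16): stage delay = t0 + ln(2)*(2*rc*cc)."""
    return 3.0 / (16.0 * (t0 + math.log(2.0) * 2.0 * rc * cc))


def rc_cc_for_center(t0, fc):
    """Rc*Cc product (seconds) needed to place the center frequency at fc."""
    return (3.0 / (16.0 * fc) - t0) / (2.0 * math.log(2.0))


if __name__ == "__main__":
    t0 = 21e-12  # nominal stage delay quoted in Section 4.6.2
    print("no capacitor:", center_frequency(t0, 250.0, 0.0) / 1e9, "GHz")
    print("Rc*Cc for a 6 GHz center:", rc_cc_for_center(t0, 6e9) * 1e12, "ps")
```

Because the added delay is linear in Rc Cc, each further reduction of the center frequency costs proportionally more capacitance, which is why very low centers become expensive in area.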
[Figure 4-8 plot: frequency (GHz) versus control voltage (V) for Re = 0, 200, 400, 600, and 800 Ω-µm. Note: resistor values are normalized to the size of transistors in µm.]
Figure 4-9 FFI VCO frequency versus centering capacitor
A frequency centering capacitor, Cc, is added to increase the delay of the stage in order to move the frequency range to within specifications. (Plot axes: frequency in GHz on a logarithmic scale versus control voltage in V; curves for Cc = 0, 25, 50, 100, 150, and 250 fF/µm. Note: capacitor values are normalized to the size of transistors in µm.)
4.5.4. Bypass resistor to prevent stage decoupling
The last and perhaps most important elements to be discussed are the bypass resistors,
Rb. Their purpose, discussed in Sec. 4.4 on page 40, is to prevent stage decoupling
by limiting a full switching of current in the tree. In addition to adding decoupling
stability to the VCO, these elements can also be used to limit the frequency range while
keeping the gain nearly constant. See Fig. 4-10 for the frequency response of the VCO
given different values of Rb.
The bypass resistor is tied to the collector of the control input transistors and the top
of the current source. Each node is kept at a nearly constant voltage because the bases from
the level above fix their emitter voltages. Since the voltage across the resistor is constant,
the current through it will also be constant. This ensures that some current from the active
current source will always flow through both branches of the tree and thus prevents a
complete depletion of current through either branch. A smaller resistor will allow more
current to flow and, in the limit, the control transistors will be completely bypassed and
both branches will receive exactly equal current. A complete analysis of this effect is
detailed in Section C.2 on page 166.
Figure 4-10 FFI VCO frequency versus bypass resistance
By adjusting the bypass resistor, Rb, the maximum current through each branch can be limited. This resistor prevents stage decoupling and allows frequency range control. (Plot axes: frequency in GHz versus control voltage in V; curves for Rb = 1.6, 2.4, 4.0, and 8.0 kΩ-µm. Note: resistor values are normalized to the size of transistors in µm.)
4.6. System Analysis
The frequency profile of the FFI VCO is a function of the various circuit parameters,
including the nominal stage delay To, Rb, Re, and Cc. If Rb is removed, Re is set to 0, and Cc is
set to center at 6.0 GHz, then Fig. 4-11 shows the frequency response. The range is from 3.9
GHz to 7.9 GHz, which is a one octave range. The period of the VCO is governed by (4-3),
which yields 8T when s = 0 and 4T when s = 1, thus the octave range. The addition of the
other circuit components only decreases this range.
A more comprehensive look at the total system response requires an analysis of the
modified differential amplifier and the relationship between the weighting factor s and the
current switching between branches. Fig. 4-12 shows a diagram of the VCO frequency
profile as a function of control voltage. The three primary curve parameters are the
frequency range, the center frequency, and the gain at the center frequency. Mathematical
models describing each of these parameters can be found in the following sections.
Figure 4-11 FFI VCO Frequency Range
This is the response when Rb is removed, Re set to 0, and Cc is set to give a 6 GHz center frequency. (Plot axes: frequency in GHz versus control voltage in V.)
Figure 4-12 FFI VCO System from control voltage to frequency
An analysis of the FFI system should incorporate a study of the circuit response and the dynamics of the top-level architecture. (Diagram labels: frequency range, gain, control voltage, center frequency.)
4.6.1. Branch current to frequency
Relating circuit parameters such as Rb and Re to the frequency profile involves a
circuit level description of the differential amplifier. Circuit level analyses are often
expressed in terms of differential branch current output and as such do not relate directly to
frequency. Relating branch current to frequency is necessary to achieve the final transfer function.
From (4-3) we can find the frequency relative to the weighting factor s, which is
directly related to the current by
$$s = \frac{1}{2}\left(1 + \frac{i_L - i_P}{I_o}\right) \tag{4-14}$$
$$f = \frac{1}{16T}\left(3 + \frac{i_d}{I_o}\right) = \frac{1}{8T}\,(s+1) \tag{4-15}$$
where T is the intrinsic stage delay, iL is the current through the branch that accepts input
from the "leaped" branch, iP is the current from the "previous" branch, Io is the total current
through the stage, and id = iL - iP is the differential current. This relationship is confirmed
in Fig. 4-13, where the simulated frequency versus current is shown along with the results
from (4-3) and (4-14).
Results at a weighting value of 0.5 show the largest slope difference between the
analytical model and simulation. This slope difference is important when analyzing the
frequency gain, and a factor, α, is introduced to compensate. Taken directly from Fig. 4-13,
α has a value of 1.3.
Figure 4-13 Simulated versus analytical response of the FFI Architecture
The gray, dashed lines represent simulated frequency response for varying branch currents, and the black continuous lines represent the analytical expectation. (Plot axes: frequency in GHz and effective delay in ps versus weighting factor s; curves: Simulated, Analytical.)
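
The mapping from branch currents to frequency in (4-14) and (4-15) is simple enough to express directly. The sketch below uses hypothetical function names, an assumed 1 mA stage current, and an assumed intrinsic delay of 32 ps; it does not include the correction factor α.

```python
def weighting_factor(i_leap, i_prev, i_total):
    """Weighting factor s from the branch currents, per (4-14)."""
    return 0.5 * (1.0 + (i_leap - i_prev) / i_total)


def frequency_from_currents(i_leap, i_prev, i_total, T):
    """VCO frequency from the differential current, per (4-15)."""
    i_d = i_leap - i_prev
    return (3.0 + i_d / i_total) / (16.0 * T)


if __name__ == "__main__":
    i_total = 1e-3  # assumed total stage current
    T = 32e-12      # assumed intrinsic stage delay
    for i_leap in (0.0, 0.25e-3, 0.5e-3, 0.75e-3, 1e-3):
        i_prev = i_total - i_leap
        s = weighting_factor(i_leap, i_prev, i_total)
        f = frequency_from_currents(i_leap, i_prev, i_total, T)
        print(s, f / 1e9, "GHz")
```

With equal branch currents the weighting factor is 0.5 and the frequency is 3/(16T), the center frequency used in the next section.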
4.6.2. Center frequency and intrinsic stage delay
The center frequency is directly related to the intrinsic stage delay by (4-3) when s is
set to 0.5. The intrinsic delay can be accurately modeled by the results presented in
Appendix C.3 on page 171. The center frequency is modeled as
$$f_c = \frac{3}{16\left(T_o + \ln(2)\,(2R_cC_c)\right)} \tag{4-16}$$
and is validated in Fig. 4-14. Intrinsic stage delay is also plotted in Fig. 4-14 because these
values are needed for the frequency gain and frequency range models. The nominal delay,
To, found through simulation, is 21 ps.
Figure 4-14 Center frequency simulation and model
The modeled and simulated intrinsic stage delay and VCO center frequency are shown here. The modeled results follow the simulated results closely. (Plot axes: frequency in GHz and intrinsic stage delay in ps, each versus normalized capacitance in fF/µm; curves: Simulated, Modeled.)
4.6.3. Frequency gain at the center frequency
The analytical model for current gain as a function of Rb and Re is solved in Appendix
C.1 on page 164 and Appendix C.2 on page 166. To find the input voltage to output
frequency gain, two elements are needed: the voltage to current gain and the current to
frequency gain. The former was solved in (C-12) on page 169, and the latter is determined by
differentiating (4-15) and substituting the intrinsic delay equation (C-14) on page 171. The
result is
$$\frac{df}{dv_d} = \frac{di_d}{dv_d}\cdot\frac{df}{di_d} = \left(\frac{1}{\dfrac{2\gamma v_T R_b}{R_b I_o - 2v_{be}} + R_e \parallel R_b}\right)\left(\frac{\alpha}{16\left(T_o + 0.7\,(2R_cC_c)\right) I_o}\right) \tag{4-17}$$
which includes all circuit parameters: Rb, Re, Rc, Cc, Io, and the nominal stage delay To. α
is also included to compensate for the weighting factor and frequency gain difference
between the simulated and analytical results.
4.6.4. Frequency Range
The frequency range of the FFI VCO is mainly governed by the bypass resistor and
partially governed by the emitter resistor. Appendix C.2 on page 166 describes how these
parameters limit the differential current through each branch in the VCO stage. This current
is related to the maximum frequency, fmax, through (4-15), where id is replaced with id,max,
which is found in (C-5) on page 167. Taking this value, subtracting the center frequency fc,
and multiplying by two yields the frequency range, frange. Using the intrinsic delay
relationship from (C-14) on page 171 and (4-15) yields
$$f_{range} = 2\,(f_{max} - f_c) = \frac{i_{d,max}}{8 I_o T} = \frac{I_o (R_e + R_b) - \left(v_{be} - \dfrac{v_d}{2}\right)}{8 I_o (R_b + 2R_e)\left(T_o + 0.7\,(2R_cC_c)\right)}. \tag{4-18}$$
vd should be set to the maximum differential voltage that is allowed during normal
operation of the VCO.
4.7. Phase Noise
The phase noise of an oscillator is an extremely important consideration during the
design phase. VCO phase noise and phase jitter directly affect system performance. In
serial communication circuits, a bit stream is generated with the time between transitions
defined by the jitter in the VCO and the PLL. The transport mechanism, which includes the
wire and buffering circuits, also introduces noise, which appears as phase jitter. The larger
the jitter at the receiver, the more difficulty the PLL will have tracking the data and,
consequently, data corruption will increase. It is therefore imperative to minimize jitter at
the source to ensure maximum data throughput [15].
4.7.1. The Impulse Sensitivity Function
Noise in circuits is typically related to thermal, device (shot and flicker), or external
effects. The relationship of these effects to phase noise can be quite complicated and difficult
to solve analytically. A straightforward method that involves an analytical foundation and
some simulation utilizes the impulse sensitivity function (ISF) [18]. It yields a closed form
solution relating circuit noise to phase noise.
Circuit noise appears as either amplitude or phase variations in the output of
oscillators. When dealing with "digital" ring oscillators, the amplitude variations are small
because of the limiting nature of the circuits. Phase variations, on the other hand, are
governed by
$$\Delta\phi = \Gamma(\omega_o, t)\,\frac{\Delta q}{q_{swing}} \tag{4-19}$$
where Δq is a charge step applied to a specific node, qswing is the nominal charge swing on
that node (qswing = Cnode Vswing), and Γ(ωo,t) is the ISF.
Γ(ωo,t) can be considered as the normalized phase response of the VCO given a
current pulse at a specific point in the output. The ISF is large when a current pulse causes
a large change in phase and small when the pulse causes a small phase change. Fig. 4-15
shows an example of the effect on phase for two current pulses of the same size but in
different positions. The case on the left applies the pulse during the rising edge, and
effectively increases the rise time and decreases the phase. The pulse applied to the flat
portion of the curve shows little or no phase change, because the circuit restores the initial
value before the edge arrives.
Figure 4-15 Current pulse effect on phase
A current pulse, or charge step applied to a node in the VCO will have a phase effect depending on the temporal location of the pulse.
Fig. 4-16 shows the simulated ISF for the FFI VCO and the values of the output at
the time that the current pulse is applied. The response appears as it should, with an increase
during the rising edge, a decrease during the falling edge, and a zero when the output is
constant. This form is very similar to the derivative of the waveform function. The
important values garnered from these results are the dc and rms values of the ISF. The rms
value of 0.077 is used to determine the phase noise, and the non-zero dc value of 0.001
shows the upconversion of low frequency noise to base band noise.
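
The dc and rms values of an ISF, and the phase step of (4-19), are straightforward to compute once the ISF has been sampled over one period. The sketch below is illustrative only: the sinusoidal test ISF, the node capacitance, and the charge step are hypothetical stand-ins, not the simulated FFI data.

```python
import math


def isf_statistics(samples):
    """Return (dc, rms) of an ISF sampled uniformly over one period."""
    n = len(samples)
    dc = sum(samples) / n
    rms = math.sqrt(sum(x * x for x in samples) / n)
    return dc, rms


def phase_shift(isf_value, delta_q, c_node, v_swing):
    """Phase step from (4-19), with q_swing = C_node * V_swing."""
    return isf_value * delta_q / (c_node * v_swing)


if __name__ == "__main__":
    n = 1000
    # Hypothetical ISF: a sinusoid of amplitude 0.1 sampled over one period.
    isf = [0.1 * math.sin(2.0 * math.pi * k / n) for k in range(n)]
    dc, rms = isf_statistics(isf)
    print("dc:", dc, "rms:", rms)
    # Hypothetical 1 fC charge step on a 10 fF node with a 250 mV swing.
    print("phase step (rad):", phase_shift(max(isf), 1e-15, 10e-15, 0.25))
```

A perfectly symmetric ISF such as this test sinusoid has zero dc value; the small non-zero dc value reported above reflects the asymmetry of the real waveform.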
The rms value of the ISF is only meaningful when compared against other similar
ring oscillators. Fig. 4-17 shows various oscillators and their associated rms values. The
single ended and differential points are CMOS rings tuned to maintain a constant frequency
that is independent of the number of stages. Their values drop with increasing N because
each stage's transitions represent a smaller fraction of the total period and thus have smaller
effects on the ISF. The CS (Current Starving) oscillator shows a reasonable match with the
other differential oscillators. The FFI oscillator, on the other hand, shows a much lower ISF
when compared to systems with the same number of stages. This has important
ramifications in the total phase noise and is discussed further in Section 4.7.3.
[Figure 4-15 panel labels: current pulse has large phase effect; current pulse has small phase effect]
Figure 4-16 Simulated ISF for FFI VCO and output waveform
The FFI VCO ISF is shown here along with the waveform at the point that the pulse is applied. (Plot axes: ISF and waveform voltage in V versus normalized time in rad/T.)
Figure 4-17 ISF rms values for various ring oscillators
Shown in this plot are the rms values for the FFI, CS (Current Starving), CMOS differential (DE), and CMOS single ended (SE) ring oscillators. (Plot axes: rms value of ISF versus number of stages N.)
4.7.2. Solving for phase noise
Using the superposition integral, the phase response for any injected noise current i(t)
is equal to
$$\phi(t) = \int_{-\infty}^{t} \frac{\Gamma(\omega_o, \tau)}{q_{swing}}\, i(\tau)\, d\tau. \tag{4-20}$$
The single-sideband phase-noise spectrum due to a white-noise current source is given by
[18]
$$L\{\omega_{off}\} = \frac{\Gamma_{rms}^2}{4\omega_{off}^2}\cdot\frac{\overline{i_n^2}/\Delta f}{q_{swing}^2} \tag{4-21}$$
where Γrms is the rms value of the ISF, $\overline{i_n^2}/\Delta f$ is the single-sideband power spectral density
of the noise current source, and ωoff is the offset from the carrier.
Noise in the FFI circuit element shown in Fig. 4-7 is generated primarily by HBT shot
noise and resistor thermal noise. The nodes of interest, those generating the most noise and
the most sensitive to current fluctuations, are the level one outputs, z10 and z11. The level
2 outputs do introduce twice the shot noise but are less susceptible to current induced phase
variations because of their low output resistance and strong restoring force.
The single-sideband power spectral density (PSD) for the resistor noise and the
collector shot noise is
$$\frac{\overline{i_n^2}}{\Delta f} = 4kTG + 2q_eI_c \tag{4-22}$$
where G is the conductance of the pull-up resistors, and Ic is the current through the collector,
which is half the tail current. Further refinement of (4-21) and (4-22), and substitution of
values for temperature, resistance, and current for optimal operation, yields
$$L\{\omega_{off}\} = \frac{N\,l}{2\omega_{off}^2}\cdot\frac{\Delta\phi_{rms}^2}{\Delta q^2}\cdot 161\times10^{-24}\ \frac{A^2}{Hz} \tag{4-23}$$
where N is the number of stages, l is the length of transistors in µm, and Δφrms is the rms
phase deviation with a simulated charge injection of Δq.
Using (4-23) at a frequency offset of 1 MHz, the FFI VCO has a phase noise value of
-93.0 dBc/Hz and the CS VCO has a phase noise value of -79.1 dBc/Hz. If cascode
amplifiers are added to the CS VCO to achieve a more accurate comparison, its phase noise
decreases to -85.1 dBc/Hz. Both VCOs have about the same center frequency¹ and both
consume the same amount of power.
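As a rough numerical check of (4-22), the sketch below evaluates the noise current density at
one level one output using Rc = 100 Ω and Iee = 3.2 mA from Table 4-1, with Ic taken as half
the tail current. The temperature of 300 K is an assumption made for illustration, since the
operating temperature is not stated here, and the function names are hypothetical. The second
function evaluates (4-21) in dBc/Hz for caller-supplied Γrms and qswing, because those values
are not listed in this section.

```python
import math

K_BOLTZMANN = 1.380649e-23      # J/K
Q_ELECTRON = 1.602176634e-19    # C


def noise_current_psd(r_pullup_ohm, i_collector_a, temp_k=300.0):
    """Eq. (4-22): 4kTG + 2*q*Ic in A^2/Hz. temp_k = 300 K is an assumed value."""
    conductance = 1.0 / r_pullup_ohm
    thermal = 4.0 * K_BOLTZMANN * temp_k * conductance
    shot = 2.0 * Q_ELECTRON * i_collector_a
    return thermal + shot


def phase_noise_dbc(gamma_rms, q_swing_c, psd_a2_per_hz, f_off_hz):
    """Eq. (4-21) expressed in dBc/Hz at an offset of f_off_hz from the carrier."""
    omega_off = 2.0 * math.pi * f_off_hz
    ratio = (gamma_rms ** 2 / (4.0 * omega_off ** 2)) * psd_a2_per_hz / q_swing_c ** 2
    return 10.0 * math.log10(ratio)


# Table 4-1 values: Rc = 100 ohm, Iee = 3.2 mA, so Ic = Iee / 2.
psd = noise_current_psd(r_pullup_ohm=100.0, i_collector_a=3.2e-3 / 2)
print(f"noise current density: {psd:.3e} A^2/Hz")
```

With these inputs the shot noise term is roughly three times the thermal noise term, so under
the assumed temperature the collector current dominates the injected noise.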
4.7.3. Phase noise comparison between the FFI and CS VCOs
The benefit achieved by using the FFI architecture for VCO design, rather than a
standard ring VCO, is about 8 dB of noise reduction. This improvement is quite
compelling because it comes without the need for additional power.
There are two main factors which contribute to the noise reduction. First, the FFI VCO has
a higher frequency because of its novel architecture. This higher
frequency can be traded off for an increase in level one capacitance. Capacitance was added
to each stage to reduce its speed and bring it in line with the speed of a standard ring
oscillator. Additional capacitance helps to absorb current noise by decreasing the
bandwidth on the outputs. It essentially softens the voltage spike caused by an injection of
charge at the output node. The CS VCO, for example, has a level one capacitance of 28 fF
and the FFI VCO has a capacitance of about 180 fF.
The second effect is a result of the averaging that occurs between the two inputs to
each gate. Any noise disturbance on one input is offset by averaging and results in a change
of 66% from the unaveraged expected result. At first it would appear that the effect should
be only 50%, but because the effect propagates through multiple averages, the
progression leads to a 66% change. This factor of two thirds corresponds to a 2.2 dB
decrease in the overall phase noise.
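The size of the improvement follows directly from the phase noise values quoted in Section
4.7.2. The short sketch below simply subtracts those figures and converts the gap to a linear
power ratio; the constant and function names are illustrative only.

```python
# Phase noise at a 1 MHz offset, as quoted in Section 4.7.2 (dBc/Hz).
FFI_DBC = -93.0
CS_DBC = -79.1
CS_CASCODE_DBC = -85.1


def improvement_db(reference_dbc, candidate_dbc):
    """Noise reduction, in dB, of the candidate relative to the reference."""
    return reference_dbc - candidate_dbc


gap_plain = improvement_db(CS_DBC, FFI_DBC)              # FFI vs. plain CS VCO
gap_cascode = improvement_db(CS_CASCODE_DBC, FFI_DBC)    # FFI vs. cascoded CS VCO
power_ratio = 10.0 ** (gap_cascode / 10.0)               # linear noise power ratio
print(f"{gap_plain:.1f} dB, {gap_cascode:.1f} dB, ratio {power_ratio:.1f}")
```

The gap against the cascoded CS VCO is 7.9 dB, which is roughly a factor of six in noise power.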
1. The center frequency of the CS VCO is actually about 70% that of the FFI VCO. If properly matched, the
noise value gap between the two will only widen because of the larger capacitor required by the FFI.
4.8. Jitter
Jitter in a ring VCO is generated by four primary noise sources within each variable
delay element: thermal noise from the collector resistors, tail current noise, sampling of
input noise by switching of differential pairs, and noise at the VCO input [17], [18]. κ is
used as a time domain figure of merit relating the standard deviation of a transition, σt, to
the fixed amount of time, ∆T, over which it is measured:

$$\kappa = \frac{\sigma_t}{\sqrt{\Delta T}} . \qquad \text{(4-24)}$$

Each noise source contributes to the total κ as described in detail in [19]. This equation is
valid for all time in the open loop case and for times less than the loop time constant
in the closed loop PLL case.
In this VCO, the noise generating sources in the delay element are frequency
independent due to the nature of the frequency control. Thermal noise from the collector
resistors remains constant because the capacitance and resistance remain constant. Noise
introduced by the degenerated tail current source also remains fixed. The input differential
pair noise is dependent on the amount of current through the pair, which is linearly
switched between the inputs. Since the total current remains constant, the total noise
contribution from each pair will remain approximately constant. For these reasons, the
jitter introduced by one stage remains constant over all frequencies.
Although noise induced jitter per stage remains the same, the total jitter per transition
depends strongly on the transition interpolating ability of the VCO. When the VCO is
operating in the four stage mode, the jitter in one period is a result of the jitter from all
four stages. However, as the weighting factor is shifted to favor the feed-forward signal,
the jitter introduced during a full period is only from two stage elements rather than four.
The result after including (4-2) is that κ varies according to

$$\kappa = \frac{\sigma_t}{\sqrt{(1+s)\,\Delta T\,\dfrac{\omega}{\omega_o}}} . \qquad \text{(4-25)}$$

The factor of ω/ωo is added to normalize in terms of transitions independent of the
frequency.
Using (4-3) and solving for s as a function of the frequency fraction gives

$$s \approx \frac{3\omega}{2\omega_o} - 1 \qquad \text{(4-26)}$$

and substituting (4-26) in (4-25) yields

$$\kappa \approx \sqrt{\frac{2}{3}}\;\frac{\omega_o}{\omega}\;\kappa_o \qquad \text{(4-27)}$$

where κο is the nominal jitter constant for an identical ring oscillator without feed-forward
interpolation, ωo is the center frequency and ∆T is the time over which the open loop jitter
is being measured. This equation is graphed in Fig. 4-26.
Using the derivation in [19] and the data in Table 4-1 yields a κο of 18 n√s. Through
calculation and simulation it was found that the largest contributors to overall jitter were
the input differential pairs and the emitter followers.
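To make the frequency dependence concrete, the sketch below evaluates (4-26) and (4-27),
taking (4-27) in the form κ ≈ √(2/3)·(ωo/ω)·κo. The value κo = 18 comes from the text; the
5.4 GHz center frequency in the example is the simulated value with parasitics from Section
4.11.1 and is only an illustrative choice, as are the function names. κ is returned in the
same unit as κo.

```python
import math


def interpolation_weight(f_hz, f_center_hz):
    """Eq. (4-26): feed-forward weighting factor s at a given frequency."""
    return 3.0 * f_hz / (2.0 * f_center_hz) - 1.0


def kappa(f_hz, f_center_hz, kappa_o):
    """Eq. (4-27), read here as sqrt(2/3) * (w_o / w) * kappa_o."""
    return math.sqrt(2.0 / 3.0) * (f_center_hz / f_hz) * kappa_o


F_CENTER = 5.4e9   # Hz, assumed center frequency for this example
KAPPA_O = 18.0     # nominal jitter constant quoted in the text

for s_target in (0.0, 0.5, 1.0):
    f = (2.0 / 3.0) * F_CENTER * (1.0 + s_target)    # inverse of (4-26)
    print(f"s = {interpolation_weight(f, F_CENTER):.2f}  "
          f"f = {f / 1e9:.2f} GHz  kappa = {kappa(f, F_CENTER, KAPPA_O):.1f}")
```

In this form κ is inversely proportional to frequency, so it halves between the four stage
limit (s = 0) and the two stage limit (s = 1).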
4.9. Interconnect Parasitic Simulations
Interconnect parasitics are increasing in importance in the design of high speed
circuits. In slower, larger circuits the capacitance and resistance of the interconnect were
dwarfed by device parameters. Now, with very small devices, this is no longer true and
interconnect parameters are as large as, or larger than, device parameters. Also, with an
increase in operating frequencies, speed of light propagation time becomes a larger fraction
of the overall cycle time.
In general, the effect of non-ideal interconnect is an increase in delay through the
wires. This is crucial for ring oscillators, since the operation of the circuit requires stringent
control over the delay. An underperforming VCO can be avoided if these effects are properly
simulated and accounted for, which requires an oscillator that achieves significantly higher
“ideal” speeds than specified. It is not uncommon for interconnect to degrade speeds by as
much as 10% to 20%.
To ensure operation at 5 GHz, the FFI VCO was designed with a 20% safety margin.
To do this, the circuit was designed to run at 6 GHz without interconnect effects included.
This safety margin, in addition to the already large frequency range, assures proper
operation at 5 GHz. Only with a 20% interconnect effect and a 20% decay from other
negative effects will the VCO fail to meet the specifications.

Table 4-1 Circuit parameters for calculating jitter.
Parameter                        Value
Re                               100 Ω
Rc                               100 Ω
Iee                              3.2 mA
Ko                               5.5 GHz/V
en(vco)                          4.6 nV/√Hz
Rbase (4 inputs, 4 followers)    152 Ω x 8
Fig. 4-18 shows the effect on the frequency response before and after adding
interconnect capacitance.¹ The performance drops by a uniform 12%. Larger effects were
seen in the Current Starving VCO because of its smaller transistor size and the resulting larger
ratio of interconnect to total capacitance.
1. The IBM 1999B SiGe design kit does not account for interconnect resistances correctly and typically
shows a faster response than with capacitance only. Resistance values are also very small for these localized
wires. For these reasons, only capacitance was included.

Figure 4-18 FFI with capacitive interconnect parasitics
The introduction of interconnect parasitics reduces the performance of high speed circuits.
When designing a ring oscillator it is absolutely necessary to include these effects.
[Plot: frequency (GHz) versus control voltage (V), with no parasitics and with capacitive parasitics.]

4.10. HDL Model
A transistor level model of this ring oscillator includes 60 active devices and 12
devices for the required balancing loads. If a frequency divider is needed, such as the 1/8
divider in the transmitter frequency synthesizer, 54 additional devices are needed to represent
the entire VCO. The processing and time demands of simulating 126 devices are
prohibitive and limit design iteration.
The solution to this problem was to create an analog Hardware Description
Language (HDL) model of the VCO [40]. Spectre HDL, a Cadence package, was used
because it is tightly integrated into Cadence and is very similar to VerilogA, which is the
leading analog HDL. The code for the VCO, shown in Appendix E.1 on
page 179, was modeled after the simulation data in Fig. 4-18. Input loading effects were
included in the model so that no additional circuitry needed to be added. The output was
modeled, somewhat inaccurately, as a sine wave and was passed through a small buffer to
transform the signal into something more representative of a real signal.
The time associated with simulating the transmitter PLL was reduced by about 60%
with very little effect on accuracy. The time saved allowed more frequent design iterations,
since each highly accurate simulation usually takes hours to run. Another benefit of the
HDL model is the ability to extract parameter values such as instantaneous frequency and
phase, which was extremely helpful in analyzing the PLL. With a transistor level
simulation these values are hidden.
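The idea behind such a behavioral model can be sketched in a few lines of Python. This is not
the Spectre HDL code from Appendix E.1. It is a minimal illustration that assumes a linear
tuning curve built from the 5.4 GHz simulated center frequency of Section 4.11.1 and the
5.5 GHz/V gain, Ko, of Table 4-1, whereas the thesis model follows the simulated curve of
Fig. 4-18. The class name, parameter names, and the 125 mV output amplitude are hypothetical.
Frequency is integrated into phase at each time step, the output is a sine wave, and the
instantaneous frequency and phase remain directly observable.

```python
import math


class BehavioralVco:
    """Phase-accumulator VCO model with a linear (assumed) tuning curve."""

    def __init__(self, f_center_hz=5.4e9, gain_hz_per_v=5.5e9, amplitude_v=0.125):
        self.f_center_hz = f_center_hz
        self.gain_hz_per_v = gain_hz_per_v
        self.amplitude_v = amplitude_v   # assumed sine amplitude
        self.phase_rad = 0.0             # accumulated phase, directly observable
        self.freq_hz = f_center_hz       # instantaneous frequency

    def step(self, v_control, dt_s):
        """Advance the model by dt_s seconds and return the output voltage."""
        self.freq_hz = self.f_center_hz + self.gain_hz_per_v * v_control
        self.phase_rad += 2.0 * math.pi * self.freq_hz * dt_s
        return self.amplitude_v * math.sin(self.phase_rad)


vco = BehavioralVco()
dt = 1e-12                        # 1 ps time step
for _ in range(1000):             # simulate 1 ns at a fixed control voltage
    vco.step(v_control=0.02, dt_s=dt)
cycles = vco.phase_rad / (2.0 * math.pi)
print(f"f = {vco.freq_hz / 1e9:.2f} GHz, cycles in 1 ns = {cycles:.2f}")
```

Because the model only updates two state variables per step, a PLL simulation built around it
can read the phase and frequency at any time instead of inferring them from waveforms.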
4.11. Final Implementation
The final implementation of the FFI VCO was used in revision two (Serdes II) of the
transmitter and receiver and in the FFI VCO test chip. The specifications were based on the
goal of a 20 Gb/s communication system, and the architecture for that system.
4.11.1. Circuit Parameters
Both the transmitter (4-1 multiplexer core) and receiver (twice oversampling core)
required a quarter-frequency clock, thereby forcing a VCO with a 5 GHz center frequency.
To remain conservative and ensure that the 5 GHz specification would be reached, 4 µm
transistors were used, and a centering capacitor, Cc, of 19 fF/µm, or 76 fF, was chosen (see
Fig. 4-14 on page 51). Under ideal simulation conditions this put the center frequency at 6.2
GHz, and when parasitics are included, at 5.4 GHz.
The specification for frequency range was partially dictated by the uncertainty of
achieving the 5 GHz center. Process variations, interconnect parasitics, model inaccuracies,
and other simulation difficulties necessitated a large range to ensure that 5 GHz could
still be reached despite any deviation of the center frequency. In addition, because the
bypass resistors intended to control stage decoupling also affect the frequency range, their
effect must be considered. The decision was made to maximize the frequency range (see Fig. 4-10 on
page 48) while having a conservative response to the stage decoupling problem. The value
of Rb was chosen to be 6.4 kΩ-µm, yielding a VCO possessing a large range and strong
decoupling prevention.
The gain of the VCO was chosen based upon the input control voltage swing and the
need to provide a linear response across all control values. Since a reasonable voltage swing
for CML circuits is 250 mV, as noted in Appendix C, a range corresponding to this swing
was chosen for the VCO. This yielded a value of Re equal to 400 Ω-µm.
In addition to the 5 GHz VCO, a high speed 10 GHz VCO was also designed for test
within the Serdes II chip. It had no centering capacitor so that a maximum frequency could
be achieved. The ultimate goal was to see if this faster 10 GHz VCO could be used to design
a 40 Gb/s communication system.
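The per-micron values quoted above can be converted to component values for the 4 µm devices
by simple scaling. The sketch below assumes that capacitance scales in proportion to emitter
length and resistance in inverse proportion, which is consistent with the 76 fF centering
capacitor and with Re = 100 Ω in Table 4-1. The resulting Rb value is not stated in the text
and is only what this assumed scaling implies; all names are illustrative.

```python
EMITTER_LENGTH_UM = 4.0     # transistor length used for the 5 GHz VCO

CC_FF_PER_UM = 19.0         # centering capacitance, fF per um
RB_OHM_UM = 6.4e3           # bypass resistance, ohm-um
RE_OHM_UM = 400.0           # emitter degeneration resistance, ohm-um


def scaled_capacitance_ff(c_ff_per_um, length_um):
    """Capacitance assumed proportional to emitter length."""
    return c_ff_per_um * length_um


def scaled_resistance_ohm(r_ohm_um, length_um):
    """Resistance assumed inversely proportional to emitter length."""
    return r_ohm_um / length_um


cc_ff = scaled_capacitance_ff(CC_FF_PER_UM, EMITTER_LENGTH_UM)
rb_ohm = scaled_resistance_ohm(RB_OHM_UM, EMITTER_LENGTH_UM)
re_ohm = scaled_resistance_ohm(RE_OHM_UM, EMITTER_LENGTH_UM)
print(f"Cc = {cc_ff:.0f} fF, Rb = {rb_ohm:.0f} ohm, Re = {re_ohm:.0f} ohm")
```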
4.11.2. Layout Considerations
A poor layout can result in an underperforming circuit; consequently, layout
preparation is an extremely important design concern. Proper layout of a ring oscillator
minimizes noise and interconnect parasitic effects. In addition, because these oscillators
generate considerable “digital” noise, it is crucial to isolate them from nearby analog
circuits.
The first goal in the FFI layout (see Fig. 4-19) was to minimize the number of inter-stage
wires and make them symmetrical to guarantee uniform phase spacing. The solution
was to design a single compact stage and position four copies of it around a center, with
inputs and outputs in the middle. This provided perfect symmetry and minimal interconnect
but required four unique orientations of the devices. Differing orientation introduces
directional process variations into the design, but symmetry appeared to be the more
important factor.
Substrate coupling¹ and power supply noise, although partially offset by the
differential nature of the circuit topology, are important to address. Substrate noise can
originate from external as well as internal circuits. Minimizing external substrate noise, and
the effects of internal switching on external circuits, involved the design of a deep trench
moat with a substrate contact ring along the inside, as shown in Fig. 4-20. This provided a ground
return path for the enclosed circuitry to the substrate contacts and minimized coupling
outside the ring due to the long path around the deep trench. This is critical for this VCO
because of its high frequency, multi-phase digital signals that are often near low-noise
analog loop filters in PLLs. The compact design also forces substrate noise to appear as one
common mode source, thus minimizing its influence.
1. Substrate noise in this SiGe technology is of particular importance because of the substrate’s lightly
doped nature.

Figure 4-19 FFI Layout
Shown here is the final layout of the FFI VCO. Outputs can be taken from the center or the
edges of the block.
[Layout: 225 µm by 171 µm block. Labels: deep trench moat, substrate ring (grounded), power
and ground rails, centering capacitor.]

Figure 4-20 Reducing substrate coupling
By using a deep trench moat and substrate contacts, substrate coupling can be minimized.
[Diagram labels: silicon surface, deep trench (DT), internal circuitry, external circuitry,
substrate contact, short ground return path.]
Minimizing the length of the supply lines to the pads provides a low resistance ground
return path. As with substrate noise suppression, a compact design forces the supplies to
appear as one common mode source. When laying out routes to external circuits where phase
uniformity was important, the signals were taken from the center of the VCO to ensure
constant length wires. In addition, dummy buffers were included when a VCO phase output
was not needed, to maintain consistent loading.
4.12. Experimental Results
A test chip implementing a 5 GHz (Cc = 76 fF) and a 10 GHz (Cc = 0 fF) FFI VCO
was designed alongside the Serdes 2 chip. It placed the two VCOs in an environment that
is identical to that found in the transmitter and receiver. Two input pads with capacitor
bypass provided a differential input for each VCO. The remaining four high-frequency
pads were dedicated to a buffered and a 1/8 divided output of each VCO.
The slower VCO was used in the Serdes 2 transmitter and receiver and had a center
frequency target of 5 GHz. The higher speed VCO was designed to be used in the Serdes 3
project with a center frequency at 10 GHz.
Figure 4-21 FFI waveform at 5 GHz
This waveform was captured with a control voltage set to generate a 5 GHz output. The
peak-to-peak swing is approximately 300 mV.
4.12.1. Frequency Response
The shape of the measured frequency response in Fig. 4-22 is nearly identical to the
simulated response. It is smooth, linear around zero, and monotonically increasing. The
differences are found in the frequency range and center. The center frequency, at 0 mV
control voltage, was expected to be 5.33 GHz but was measured about 11% lower at 4.72 GHz.
The frequency range dropped 17% from 2.72 GHz to 2.27 GHz. In addition, the gain at
center decreased from 5.57 GHz/V to 4.98 GHz/V.
The measured offset between simulation and test results is likely due to a capacitance
on the level 1 nodes of the ring stages that was larger than anticipated. Base capacitance
modeling has always been a difficult issue, as capacitance can have a considerable effect
on the frequency. A capacitance increase of 50 fF yields a frequency change that would
match the measured decrease.
Another possibility is the poor modeling of fT, which has a very dramatic effect on
frequency. Part of the effect can be seen in Fig. 4-24, where the supply voltage was
increased beyond the nominal voltage. This increased the current and, up to a point, increased
the frequency. Although the CML trees were designed for maximum fT, clearly
more collector current results in a better response.
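Figure 4-22 FFI VCO measured results
This plot shows results simulated with interconnect parasitics, and measured results for the
FFI VCO. The target of 5 GHz for the slower VCO was achieved at a control voltage of 60 mV
rather than the expected -50 mV.
[Plot: frequency (GHz) versus control voltage (mV), simulated (parasitics) and measured, for
the VCO with the centering capacitor and the VCO without it.]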
4.12.2. Common Mode Gain (5 GHz VCO)
The common mode gain represents the gain associated with a common mode change
in the input while the differential voltage is kept the same. As the common mode voltage is
decreased, the level 3 differential pair begins to press into the active current source below
it. Although the current should ideally remain constant as the source’s collector voltage
moves, the Early effect produces a slight slope in the response (see Fig. A-3 on page 158).
This has the effect of decreasing the current as the collector to emitter voltage is decreased.
At some point the source transistor begins to saturate and the collector current drops more
rapidly.
With higher common mode voltages the level 3 transistors are pulled away from the
active sources, which causes the same current effect discussed above. Although the level 3
transistors are pressing into the level 2 transistors, there is little effect because the active
source is maintaining a constant current. With a gain of 5 GHz/V from Fig. 4-22, and a
common mode gain of 0.5 MHz/mV, the common mode rejection ratio, CMRR, is 20 dB.
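The CMRR figure can be checked by expressing both gains in the same unit and taking the
ratio in decibels. The sketch below uses the two gain values quoted above; the function name
is illustrative.

```python
import math


def cmrr_db(differential_gain_hz_per_v, common_mode_gain_hz_per_v):
    """CMRR as 20*log10 of the ratio of differential to common mode gain."""
    return 20.0 * math.log10(differential_gain_hz_per_v / common_mode_gain_hz_per_v)


DIFF_GAIN = 5e9              # 5 GHz/V, from Fig. 4-22
CM_GAIN = 0.5e6 / 1e-3       # 0.5 MHz/mV expressed in Hz/V

print(f"CMRR = {cmrr_db(DIFF_GAIN, CM_GAIN):.1f} dB")
```

The two gains differ by a factor of ten, which corresponds to the 20 dB stated in the text.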
Figure 4-23 FFI common mode response
The common mode response of the FFI is quite flat with only a 1% deviation in frequency when
the common mode is swept through ±100 mV.
[Plot: frequency (GHz) and common mode gain (MHz/mV) versus common mode control voltage (mV).]
4.12.3. Response versus supply voltage (5 GHz VCO)
The frequency of the VCO continues to increase as the supply voltage is made more negative,
down to -4.3 V. This can be attributed to an increasing transistor fT as the collector current
increases. Below that voltage the transistors begin to experience high current effects and
fT drops. At the peak frequency supply voltage of -4.3 V the collector current is
approximately 1.1 mA, which is higher than the 0.8 mA expected for fastest operation. The
power supply gain at the nominal -3.3 V power supply is -600 kHz/mV.
Figure 4-24 FFI response versus supply voltage
At the nominal supply voltage of -3.4 V the center frequency is 4.6 GHz. Lower voltages show a
quick decrease in frequency, while higher voltages show an increase in frequency until -4.5 V.
Above -4.5 V the frequency drops quickly.
[Plot: center frequency (GHz) versus supply voltage (V).]
4.12.4. Phase noise measurements
Phase noise measurements, shown in Fig. 4-25, are very close to the ISF predictions
in Section 4.7.2 on page 56. At a 1 MHz offset from the carrier, the phase noise was
measured at -90 dBc/Hz and was calculated to be -93 dBc/Hz. The difference can best be
attributed to testing effects, probe and wiring losses, and higher temperatures than
anticipated.
Because of the high noise testing environment, a special differential input filter was
built to suppress signal noise on the differential input. The filter consisted of a differential
RC filter, with a very low bandwidth, and a non-electrolytic capacitor. In addition, because
supply noise was an important contributor to noise, batteries were used to supply power to
the chip.
Figure 4-25 Open loop phase noise of FFI VCO
This plot shows the phase noise versus the carrier offset frequency. The data was collected
using a LabView program in conjunction with a spectrum analyzer and special software supplied
with the equipment.
[Plot: phase noise (dBc/Hz) versus offset frequency (kHz), from 100 kHz to 100000 kHz.]
4.12.5. Jitter measurements
The jitter versus frequency plot is shown in Fig. 4-26. The data was
collected with an open loop VCO circuit using an HP 11801C sampling oscilloscope with
∆T set to 50 ns. The model described by (4-27) accurately described the end points of the
jitter function, but the results were off by as much as 20% in between. This can be attributed
to the fact that when the VCO operates more like a four stage oscillator it exhibits fast rise
times. During interpolation, however, the VCO favors a sine-wave output and the edges
become slower, increasing the jitter. As s is increased, and a two stage oscillator is approached,
the rise time is more representative of that indicated in the model. At the target operating
frequency of 5 GHz, κ is equal to 14.2, which is 36% lower than κ when operating as a
normal four stage oscillator.
Figure 4-26 FFI VCO analytical and measured jitter
This plot shows how jitter is related to the frequency of oscillation. The fact that the jitter
improves at higher frequencies is a result of the system operating with fewer stages.
[Plot: κ versus frequency (GHz), analytical and measured.]
5. Design of the Transmitter
5.1. Project History
The first transmitter was submitted to IBM for fabrication in February 1999 as a
stand-alone chip. It generated all 16 parallel data bits internally and had no mechanism to
accept externally supplied data. The bit rate specification of 20 Gb/s operating speed was
not achieved due to a VCO load imbalance.
The second prototype, submitted to Sierra Monolithics Inc. in April 2000, was a
unified transmitter-receiver chip. It contained improvements made to the first prototype
and was designed to be a fully working chip capable of being packaged or wafer tested. The
transmitter in this implementation easily hit the 20 Gb/s target data rate.
An invention disclosure record for the symmetric multiplexer was submitted in
February 2000. RPI has subsequently stated that it is going to pursue a U.S. patent for
this invention.
5.2. Top Level Architecture Overview
The goal of the transmitter is to accept low speed parallel data and multiplex it to high
speed serial data. In some cases, it must first encode the data by adding extra bits for error
correction, byte alignment, word framing, or channel synchronizing. The encoded data is
then multiplexed from n parallel bits to a single bit stream. An additional stage, driven by
a very low noise PLL, may then be used to retime the data [42] to remove accumulated
noise. Finally, an amplifier is used to drive the external channel that carries the signal.
This Serdes project did not investigate data encoding due to limited time and
resources. Although a full-featured chip may include data encoding, a system of this type
can still operate without it. Presumably the role of the encoder would be off-loaded to the
next level of hardware or software.
A 16-to-1 multiplexer was implemented as four 4-stage registers and one 4-1
multiplexer. The design revolved around a unique multiplexing scheme that required four
inputs and could run with a quarter frequency clock. The output data rate was 20 Gb/s,
but the oscillator ran at 5 GHz. Since 16 external bits were to be supplied to the chip
and the multiplexing scheme required four bits, a front-end register that could be expanded
to meet a parallel data word of any width was designed.
Instead of adding an additional stage to perform symbol retiming, the retiming
function was pushed into the multiplexer. This necessitated a complete redesign of the
standard multiplexing CML gate, so that it could handle the stringent timing requirements
for transmission. The symmetric multiplexer evolved from this redesign process.
Like the retiming circuit, the channel amplifier was also incorporated into the
multiplexer. This involved increasing transistor sizes and making a change in the output
stage of the multiplexer.
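Figure 5-1 Transmitter and multiplexer architecture
The top level transmitter design consists of a 16-1 multiplexer driven by a 5 GHz PLL. Four
4-stage shift registers capture 16 bits of data every 800 ps. These then feed the 4-1
multiplexer in order to serialize the data.
[Block diagram: 16 parallel inputs at 1.25 Gb/s feed the 16-1 multiplexer, which is clocked by
the VCO PLL and produces the 20 Gb/s output. Inside the 16-1 multiplexer, four 4-bit shift
registers (A, B, C, D) feed the 4-1 multiplexer.]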
5.3. 16-1 Multiplexer
Fig. 5-1 depicts the core of the transmitter, the multiplexer. It is divided into a 4 x 4
shift register bank and a 4-to-1 multiplexer, also shown in the same figure. The shift
register bank captures 16 bits of data every 800 ps, and the 4-to-1 multiplexer serializes
them into a single stream of bits. The width of each bit at 20 Gb/s is 50 ps.
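The timing figures quoted for the 16-1 multiplexer all follow from the 20 Gb/s serial rate.
The sketch below derives them; the variable names are illustrative.

```python
BIT_RATE_BPS = 20e9      # serial output rate
WORD_WIDTH_BITS = 16     # parallel input word
MUX_INPUTS = 4           # inputs of the final 4-to-1 multiplexer

bit_period_ps = 1e12 / BIT_RATE_BPS                     # width of one serial bit
word_period_ps = WORD_WIDTH_BITS * bit_period_ps        # time between register loads
parallel_rate_gbps = BIT_RATE_BPS / WORD_WIDTH_BITS / 1e9
clock_ghz = BIT_RATE_BPS / MUX_INPUTS / 1e9             # quarter-frequency clock
clock_period_ps = 1e3 / clock_ghz

print(f"bit {bit_period_ps:.0f} ps, word {word_period_ps:.0f} ps, "
      f"parallel {parallel_rate_gbps:.2f} Gb/s, clock {clock_ghz:.0f} GHz "
      f"({clock_period_ps:.0f} ps)")
```

Each parallel input therefore runs at 1.25 Gb/s, and one period of the 5 GHz clock spans four
serial bits.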
The shift registers consist of four cascaded MS-latches, each with a 2-to-1
multiplexer front-end. By selecting different inputs, the array of four latches can either load
external data or accept data from the previous latch. Clocking the select line assures that
after 3 bits are shifted through, the next “shift” results in a load. Each load pulse is
separated by 16 times the bit width, or 800 ps. The tail bit of the register shifts in a zero
because new data overwrites it before it ever makes it out of the head latch.
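The load and shift behavior described above can be modeled in a few lines. This is a
behavioral sketch only, not the CML implementation: the class name is hypothetical, and the
choice that the head latch holds the first bit of each loaded word is an assumption about bit
ordering.

```python
class LoadShiftRegister:
    """Four latches, each preceded by a 2-to-1 mux that chooses load or shift."""

    def __init__(self, stages=4):
        self.stages = stages
        self.latches = [0] * stages   # latches[0] is the head (serial output)
        self.count = 0                # clock counter that drives the select line

    def clock(self, parallel_word):
        """Apply one clock edge and return the bit now held in the head latch."""
        if self.count % self.stages == 0:
            # Load: every latch takes its external input.
            self.latches = list(parallel_word)
        else:
            # Shift: each latch takes its neighbor's bit; the tail takes a zero.
            self.latches = self.latches[1:] + [0]
        self.count += 1
        return self.latches[0]


reg = LoadShiftRegister()
words = [[1, 0, 1, 1], [0, 1, 1, 0]]
serial = [reg.clock(words[i // 4]) for i in range(8)]
print(serial)
```

The zeros shifted into the tail never reach the output, because a new word is loaded after
every three shifts.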
Figure 5-2 Data timing for the 4-1 multiplexer
The multiplexer interleaves the incoming data by using a multi-phase, quarter frequency clock.
Timing of this circuit is critical because this circuit also has the responsibility to retime
the data.
[Timing diagram: register outputs A, B, C, and D, the 0° and 90° clock phases, the interleaved
signals BA and CD, and the output CBAD, shown from 0 ps to 400 ps.]

The unique nature of the multiplexer requires data in registers A and D to be offset
by 100 ps from data in registers B and C. This offset was accomplished by clocking the
registers with two in-quadrature phases of the PLL.
Each of the four registers is connected to the 4-to-1 multiplexer as an input. A special
“shuffling” clocking scheme is used to multiplex the data. This eliminates the need for a 10
GHz clock that would typically be required to convert the final two 10 Gb/s signals into one
20 Gb/s signal. One single-frequency clock can control the shift registers and clock the
multiplexers.
Multiplexing is accomplished by offsetting registers A and D by 90° from registers
B and C (see Fig. 5-2). This creates the basic interleaving data sequences, BA and CD,
which are synchronized with the first stage of 2-to-1 multiplexers. Interleaving was not
necessary to create the sequences, but without it, coincident edges and timing glitches could
have been introduced.
Signals BA and CD arrive at the final multiplexer in phase with each other. The
phase of the select signal of this multiplexer is shifted exactly 90° from the previous
multiplexer’s select signal. This effectively cuts both BA and CD in half and combines
them to form a CBAD signal. Therefore, final output edges are created from two sources:
the final multiplexer select and the change of inputs during selection.
The phase difference between the 90° and 0° signals is critical in determining any
output transition offsets. Any mismatch between the phases directly correlates to a phase
offset between consecutive transitions in the bit stream. To guarantee a 90° phase
difference, a delay which exactly matches the delay of the two 2-to-1 multiplexers is
introduced. The easiest way to do this involves using a matched multiplexer whose a input
is set to 0 and b input is set to 1. Although this technique consumes some power, its use is
necessary to significantly reduce phase mismatch.
5.3.1. The Case for the Symmetric Multiplexer
The 2-to-1 multiplexer is the final non-amplifying stage in most serial transmitter
circuits. It is, therefore, of utmost importance to study and understand the performance of
this gate and how its performance affects the data stream.
A typical 2-to-1 CML multiplexer utilizing levels 1 and 2 is shown in Fig. 5-3. Data
inputs a and b are on level 1 and the select input, s, is on level 2. In a clocked circuit the
important performance parameter is the delay from the input transition to the output
transition. The largest delay is taken over all of the possible combinations of inputs and
outputs. This parameter, in conjunction with other gate delays, ultimately determines the
maximum speed at which the circuit can be clocked.
The multiplexer performance metric, however, is very different in a transmitter in which
the multiplexers perform the retiming. Delay through the gate is of
secondary importance, whereas the shape and aperture of the eye diagram are of critical
importance. Bit widths must remain consistent, and bit amplitudes must remain large
enough to be received when noise is present.
Figure 5-3 CML Two Level Multiplexer
The level difference between the inputs a and b, and the select input s, produces a phase
mismatch when a, b, and s are aligned by 90°.
[Schematic labels: inputs a0, a1, b0, b1, s0, s1; outputs z0, z1; transistors Q1 through Q6.]
The data and select signals arriving at the multiplexer are forced to a phase difference
of 90° by the VCO and overall circuit architecture. It is questionable whether an exact 90°
difference is appropriate for this gate because the inputs arrive on different levels. Is there
any inherent difference between their respective delays? Perhaps a better choice of phase
exists such that a more uniform output is generated? How does the difference in levels
affect the loading and driving from previous gates?
The circuit in Fig. 5-4 was designed and simulated in order to analyze and answer
these questions. Signals a and b are complements of each other and the select signal’s
phase is varied around 90° by an offset ∅. Ideally, the average value of the output will
coincide with the median when ∅ is equal to zero. This condition corresponds to an output
with a 50% duty cycle, in which each bit is of equal width.
The results of the analysis are shown in Fig. 5-5, and indicate that a phase offset of
13.5°, or 7.5 ps, is needed to maintain a 50% duty cycle. This effect is a result of the data
existing on level 1 and the select lines being on level 2. For a select change to propagate to
the output it must travel through two levels of logic, whereas a data change only needs to travel
through one. There is also a loading difference between the two logic levels. The collectors
on level 1 see the pull-up resistors and the base of the following gate. On level 2 the
collectors see two emitters from the level above.
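Figure 5-4 Simulation Testing of CML 2:1 Multiplexer
By varying the select phase relative to the data phase and averaging the output signal over
time, a measurement showing ideal select and data phase offsets can be made.
[Test setup: inputs a at 0°, b at 180°, and s at 90° + ∅ drive the 2:1 MUX, whose output z
feeds a load and an averaging block. Waveforms of a, b, s, and z are shown from 0 ps to 400 ps.]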
A 50% duty cycle is desired when the phase difference between the data and select signals
is 90°, since both are driven off the VCO. The multiplexer, however, requires a 103.5°
phase difference for symmetric output. A delay element could be introduced to the data
lines to add 7.5 ps, but a better solution was invented: the symmetric multiplexer.
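The relation between the 13.5° offset and the 7.5 ps delay can be checked with a short sketch. It assumes the select clock runs at 5 GHz (a 200 ps period, matching the PLL clock output quoted elsewhere in this chapter); the function and variable names are illustrative only.

```python
def phase_to_delay_ps(phase_deg: float, clock_hz: float) -> float:
    """Convert a phase offset in degrees to a time delay in picoseconds."""
    period_ps = 1e12 / clock_hz
    return period_ps * phase_deg / 360.0


CLOCK_HZ = 5e9       # assumed select clock frequency (200 ps period)
OFFSET_DEG = 13.5    # phase offset read from the simulation in Fig. 5-5

delay_ps = phase_to_delay_ps(OFFSET_DEG, CLOCK_HZ)
required_phase_deg = 90.0 + OFFSET_DEG  # phase needed for a 50% duty cycle

print(f"delay = {delay_ps:.1f} ps, required phase = {required_phase_deg:.1f} deg")
```

Under that assumption the offset corresponds to the 7.5 ps and 103.5° figures quoted in the text.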
The symmetric multiplexer accepts all inputs on the same level, has the same loading
per input, and ensures that any input (data or select) will propagate to the output in the same
amount of time. An implementation of the gate is shown in Fig. 5-6. The left hand side of
the multiplexer represents the OR condition $a \cdot s + b \cdot \bar{s}$, which generates the
high output, and the right hand side represents the inverse condition
$(\bar{a} + \bar{s}) \cdot (\bar{b} + s)$, which generates the low output. The four transistors,
Q1-Q4, in the center act as a shared differential amplifier. During all static conditions one
branch will have a high and a low level transistor and the other branch will have both
transistors in an intermediate state. The branch with the high level will carry all of the
current and produce the z output.
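The two logic conditions can be checked against each other with a small truth-table sketch. It assumes that s = 1 selects a and s = 0 selects b, which is the polarity implied by the Z column of the state table in Fig. 5-7; the function names are hypothetical.

```python
from itertools import product


def mux(a: int, b: int, s: int) -> int:
    """Sum-of-products form: a when s = 1, b when s = 0."""
    return (a & s) | (b & (1 - s))


def mux_inverse(a: int, b: int, s: int) -> int:
    """Product-of-sums form of the complement (De Morgan's theorem)."""
    return ((1 - a) | (1 - s)) & ((1 - b) | s)


# Z column of the state table in Fig. 5-7, keyed by (a, b, s).
FIG_5_7_Z = {
    (0, 0, 0): 0, (0, 0, 1): 0, (0, 1, 0): 1, (0, 1, 1): 0,
    (1, 0, 0): 0, (1, 0, 1): 1, (1, 1, 0): 1, (1, 1, 1): 1,
}

results = {abs_: mux(*abs_) for abs_ in product((0, 1), repeat=3)}
complements_ok = all(
    mux_inverse(a, b, s) == 1 - mux(a, b, s)
    for a, b, s in product((0, 1), repeat=3)
)
```

Both sides of the gate therefore describe the same function, one producing the high output and the other its complement.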
Figure 5-5 Simulation Results for CML 2:1 Multiplexer. The crossing point, or 50% duty cycle point, occurs at 13.5°, or 7.5 ps. This shows an asymmetry between the select and data inputs.
[Plot: average output voltage (V), -1.3 to -0.9, versus phase (degrees), -180 to 180.]
Figure 5-6 CML Single Level Symmetric Multiplexer. A novel implementation of a multiplexer with inputs all on level 1, identical loading per input, and completely symmetric response.
[Schematic labels: two input stages (inputs a0, a1, b0, b1, s0, s1) on either side of an output stage formed by Q1-Q4, with outputs z0 and z1.]
Fig. 5-7 shows the state of each transistor based upon the input values. “H” represents
a high state, or the highest voltage and indicates which transistor will carry the current. The
Medium level falls halfway between the High and Low levels. To ensure proper noise
margins the voltage difference between the high and low levels is increased to 500 mV.
This places a 250 mV difference between the two top voltage levels.
Each of the transistors in the central tree of the multiplexer is driven by two
differential pairs. This allows for a reduction in the size of the 12 input transistors without
any loss of signal integrity, and also directly compensates for the doubled loading on each
input. A drawback is that each input requires a minimum of 2 µm of load, no matter the
output driving ability.
Power requirements for this circuit are also four times higher than those for a typical
level 1 output CML multiplexer. On the other hand, since this circuit only requires one level
of logic, the negative power supply can be reduced by at least 25%.
Figure 5-7 Symmetric multiplexer transistor states. The states of transistors Q1-Q4 are defined to be high (H), low (L), and middle (M). The transistor in the high state carries the current and dictates the output value.

a  b  s  |  Q1  Q2  Q3  Q4  |  Z
0  0  0  |  M   M   L   H   |  0
0  0  1  |  M   M   H   L   |  0
0  1  0  |  L   H   M   M   |  1
0  1  1  |  M   M   H   L   |  0
1  0  0  |  M   M   L   H   |  0
1  0  1  |  H   L   M   M   |  1
1  1  0  |  L   H   M   M   |  1
1  1  1  |  H   L   M   M   |  1

5.3.2. Final Implementation and Simulation
Serdes I did not utilize the symmetric multiplexer and had a 15% phase error in
alternating edges, shown in the simulation in Fig. 5-8. Figure (a) shows the eye diagram of
the standard CML multiplexer. The inputs were designed to exercise the circuit as much as
possible, i.e. using 50 ps input pulses and differing a and b inputs when the select input
changes. At the center voltage of 125 mV, two distinct crossings can be seen, which result
from the input to output delay imbalance in the CML circuit. The time for a select transition
to reach the output is about 10 ps longer than for an a or b input to reach the output.
Figure (b) shows a much cleaner eye diagram for the symmetric multiplexer. The
reason for this improvement lies in the circuit architecture, which was designed with
symmetry to ensure that any input changes propagate to the output in the same amount of
time. The ramifications of this are obvious. The transmitter output will benefit from a clean,
low phase noise multiplexer signal.
The 4-to-1 multiplexer with symmetric architecture in Serdes II also plays the role of
the line driver by driving the pads directly. The reasoning behind this design feature was
removing the noise that would be introduced by an additional line driver. By integrating the
two components, the total phase noise is smaller. In order to accomplish this, larger 12 µm
transistors, capable of sinking 9.6 mA, were used in the final multiplexer. In addition, a
cascode amplifier was added to the output stage to limit the loading on the differential pair.
Driving the final 12 µm output stage required ramping up of transistor sizes so that
the input stage of the final multiplexer was not loaded down. Starting with a 1 µm input
stage, two intermediate emitter followers were added of sizes 2 µm and 4 µm. This enabled
an output stage with 8 µm transistors, each capable of driving transistors of their own size
or larger. This output stage drives the final multiplexer which has an input of 4 µm. Once
again, two 6 µm and 8 µm emitter followers were added, followed by the 12 µm output
stage. This technique required a total current of 63 mA as compared to a 15.4 mA current
requirement for the standard CML multiplexer and the associated pad driver.
Figure 5-8 Multiplexer Eye Diagrams. These plots are output eye diagrams for the standard CML multiplexer (a) and the symmetric multiplexer (b). Both circuits received identical 20 Gb/s inputs and identical loading.
[Plots (a) and (b): output voltage (V), -0.30 to 0.00, versus time (ps), 0 to 100.]
Figure 5-9 Multiplexer Layout for Serdes I and II. The transmitter 16-1 multiplexer consists of a 4x4 shift register and a 4-1 multiplexer. The layouts for Serdes I (a) and Serdes II (b) are shown here.
[Layout panels: (a) CML multiplexer with 4x4 registers; (b) 3 symmetric multiplexers with 4x4 registers.]
5.4. Phase Locked Loop (Frequency Synthesizer)
When reducing phase noise in the transmitter becomes the most
important design factor, the transmitter phase locked loop, PLL,
becomes the most important circuit in the system. Its role is to
generate a high frequency, extremely low noise clock from a low
frequency, noisy, externally supplied reference clock. For the transmitter PLL in this
design, the external reference is at 625 MHz, and the PLL clock output is at 5 GHz.
The standard linear model of a PLL, shown in Fig. 5-10, has a phase detector (PD),
a loop filter (LF), and a VCO. The phase detector subtracts the phase of the input signal
from the phase of the output signal. This gives a measure of the phase offset of the two
signals and is the mechanism that allows the phases to be locked together. The loop filter
filters the output of the phase detector in order to meet certain feedback characteristics,
such as output noise, pull-in range¹, and pull-in time². The VCO acts as an integrator,
converting a control signal to an oscillating signal represented as a phase. Finally, a 1/8
frequency divider is used to match the internal frequency to the external input frequency,
as required by the PD.
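As a hedged illustration of the linear model described above, the closed-loop response can be written in the usual textbook form. The forward gain G(s) and the placement of the divide-by-N block in the feedback path are assumptions based on the description of the loop; the expression is not reproduced from the thesis.

```latex
G(s) = \frac{K_d \, F(s) \, K_o}{s},
\qquad
\frac{\theta_o(s)}{\theta_i(s)} = \frac{G(s)}{1 + G(s)/N}
```

Inside the loop bandwidth, where |G(s)/N| is much greater than 1, the output phase follows N times the input phase. With N = 8 this corresponds to the 625 MHz reference being multiplied up to the 5 GHz clock.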
Figure 5-10 Linear model of PLL. The PLL used in the transmitter consists of three primary parts: phase detector, loop filter, and VCO. An input filter is added to reduce the noise levels of the input signal.
The transmitter’s frequency synthesizer went through three major revisions during its
evolution. These revisions are depicted in Fig. 5-11. During the rapid development of the
first transmitter prototype, a PLL was designed that had minimal functionality and poor
performance. The goal was to quickly develop a clock multiplier without concern for phase
noise and jitter performance.
1. Pull-in range is the maximum range of frequencies for which the PLL can eventually acquire lock. This PLL parameter is primarily a function of the PD implementation, but is also determined by the frequency range of the VCO.
2. Pull-in time, or acquisition time, is the amount of time it takes the PLL to achieve lock from an initial frequency deviation that is within the pull-in range.
[Figure 5-10 block diagram labels: input filter (v i), phase detector (Kd), loop filter F(s), VCO (Ko/s), frequency divider; input θi, output θo to transmitter.]
With more time and results from Serdes I, a highly improved Serdes II PLL evolved.
It possessed a 3-state PD, which improved the lock-in range¹ and acquisition time; an active
op-amp style LF, further improving key characteristics; and the FFI VCO, which reduced
noise and increased performance. What was still missing from this design was an optimized
bandwidth driven by previous results and specifications. Measuring the noise
characteristics of the VCO and gathering information about the noise spectrum of the input
noise source were key to bandwidth optimization.
Test data from the first two prototypes, better simulation techniques, and further
research yielded the final PLL design. VCO noise spectra allowed for a much better
bandwidth design, further minimizing PLL output phase noise. A smaller bandwidth
required frequency detection in the PD because of the much longer pull-in time. Another
improvement replaced the clumsy op-amp integrator with a high performance specialized
integrator which is also used in the receiver PLL.
1. The lock-in range, a function of the PD and the PLL bandwidth, is defined as the maximum frequency deviation for which the PD will remain in lock, where the PD is in its linear range and does not slip.
Figure 5-11 Frequency synthesizer evolution. The transmitter’s frequency synthesizer went through three major evolutionary steps. The first had the most basic components and provided minimal functionality. The second incorporated better components to minimize noise and improve the acquisition range and time. The third, unfabricated version added advanced PLL components and optimized key design variables based upon simulations and measurements from the other prototypes.
[Block diagram summary]
Serdes I:   XOR PD; type I passive LF (RC low pass filter); CS Simple VCO
Serdes II:  input filter; 3-state PD; type II active LF (op-amp filter); FFI VCO
Serdes III: 3-state PD with frequency detector; type II active LF (specialized integrator, optimized bandwidth); FFI VCO
5.4.1. Input Filter
An effective technique in reducing PLL phase noise is to drive it with a very clean
reference source¹. The PLL has the ability to lock a noisy VCO to a clean reference and
reduce the total output noise to a level below that of the VCO. With this in mind, an input
bandpass filter was designed and implemented in order to reduce the out-of-band noise of
the signal source. This technique was added to the Serdes II design but removed in the
subsequent design because a better input signal generator was acquired.
1. The signal source used in the Frisc testing lab is very old and very noisy. In practice, a very well controlled low phase noise signal generator would be used as a reference and an input filter would not be needed.
Figure 5-12 Schematic for input filter. The input filter is a bandpass filter centered around the reference frequency. It is intended to filter out low and high frequency noise associated with this signal.
Fig. 5-12 depicts the schematic of the input filter, which consists of an input
attenuator and an active bandpass filter. The active component of the filter is simply a
high-gain two-stage buffer with level one and level two outputs. The first stage does not
affect the voltage gain of the amplifier and has Darlington pair inputs to reduce the input
current by a factor of β. Twenty-five percent larger pull-up resistors were used to increase
the total gain to approximately 5. The input resistor tree attenuator compensates for the
large total gain of the bandpass filter by reducing the input amplitude by 78%.
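The 78% attenuation and the resulting near-unity net gain can be reproduced from the component values labeled in Fig. 5-12 (R1 = 800 Ω, R2 = 224 Ω). The sketch below assumes the attenuator behaves as a simple resistive divider with ratio R2 / (R1 + R2); that topology is an assumption made for illustration, not a statement of the actual circuit.

```python
R1_OHMS = 800.0       # series resistor of the attenuator (Fig. 5-12 label)
R2_OHMS = 224.0       # shunt resistor of the attenuator (Fig. 5-12 label)
AMPLIFIER_GAIN = 5.0  # approximate gain of the two-stage buffer

# Assumed topology: simple resistive divider, Vout/Vin = R2 / (R1 + R2).
divider_ratio = R2_OHMS / (R1_OHMS + R2_OHMS)
attenuation_percent = 100.0 * (1.0 - divider_ratio)
net_gain = divider_ratio * AMPLIFIER_GAIN

print(f"attenuation = {attenuation_percent:.1f}%, net gain = {net_gain:.2f}")
```

Under this assumption the divider removes about 78% of the input amplitude and the overall gain lands slightly above unity, consistent with the response described for Fig. 5-13.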
The frequency transfer function for the input filter is shown in Fig. 5-13. The peak
was designed to be at precisely 625 MHz with a bandwidth large enough to account for
parameter mismatches and frequency adjusting.
Because the final effect of this filter on the output phase noise of the PLL was not
known, a multiplexer was added after this circuit so that it could be bypassed if necessary.
This makes it possible to determine the filter’s actual usefulness.
[Figure 5-12 schematic labels: attenuator, CML amplifier, bandpass filter. Component values: R1 = 800 Ω, R2 = 224 Ω, R3 = 2 kΩ, C1 = 500 fF, C2 = 500 fF.]
5.4.2. Phase Detector
A phase detector produces a signal that yields information about the difference
between the phases of its two inputs. Ideally it produces a perfectly linear response for all
phase differences and has an arbitrary gain. For real circuits, however, we must settle for
non-linear responses that may have regions where the gain becomes negative, where the
function is periodic in π/2 or π rather than 2π, and where the gain varies across the range.
5.4.2.1. Phase detector (Serdes I)
Figure 5-13 Input filter frequency response. At the reference frequency of 625 MHz the input filter achieves a slightly greater than unity gain. All other frequencies are attenuated.
Two different phase detectors were investigated in Serdes I and Serdes II: the XOR,
or Gilbert Multiplier, and the 3-state, respectively. The schematic for the XOR PD, shown
in Fig. 5-14, consists of a single tree CML gate with emitter followers. At one extreme, the
inputs are in phase and the average value of the output is 0. At the other extreme, when the
inputs are 180° apart, the output is 1. For the 3-state detector the output is taken
differentially across its two internal signals, VU and VD. The rising edges of these signals,
which are outputs from the two resettable MS-latches, coincide with the rising edges of the
input signals, Vi and Vo. The falling edges, on the other hand, are triggered together after
both have risen. This creates a wider pulse on the signal, VU or VD, whose associated input
arrives first.
[Figure 5-13 plot: gain (dB), -60 to 0, versus frequency (MHz), 1 to 10000 on a logarithmic axis.]
The output of the XOR PD, shown in Fig. 5-15, has a linear response from -180° to
180°. Outside that range the output slope is negative and produces a temporarily unstable
PLL response before the phase detector output enters a positively sloped region again. The
gain is about 0.53 V/rad, which is relatively high. It is set by the large input control range
of the VCO used in Serdes I, the Simple Current Starving version of the VCO.
Figure 5-14 Phase detector schematics. The XOR detector (a) uses an XOR logic cell to perform phase detection. The 3-state detector (b) utilizes two resettable MS latches and an AND gate.
5.4.2.2. Phase detector (Serdes II)
Fig. 5-15 also shows the output of the 3-state PD. Its response is greatly improved
over that of the XOR PD. First, the slope is always positive and it extends across the entire
input phase difference range. This greatly improves the response of the PLL during lock
acquisition. This response will be discussed in Section 5.4.6. Another important
improvement appears when phase error is continuously increased above 180°, which is
common with larger frequency offsets. Although the plot shows that the output is -120 mV
above 180°, the output will step to 0 mV, and continue to rise beyond that phase. This effect
increases the pull-in range.
In order to implement the 3-state PD, one significant hurdle related to the reset
feedback through the AND gate had to be resolved. Proper operation occurs when the
second output edge from the latches causes the AND to go high, reset both latches, and bring
the AND low again. In simulation, however, the very thin reset pulse was failing to
reset one of the latches. The problem was traced to the non-uniform loading of the output
latches and the asymmetry in the AND gate inputs. The solution was to use a single-ended
AND gate to provide symmetric loading and matched input levels for both latches. This
ensured that both latches were uniformly reset, and alleviated all timing issues.
Figure 5-15 Simulated phase detector responses. Plotted above is the average of the signal output of the two phase detectors. The XOR phase detector has a valid range between 0° and 180°, and the 3-state detector output is valid for any phase difference.
These PDs are used in a frequency synthesizer which includes a divide-by-8
component. The nature of the PLL gain K, and the 3 dB bandwidth is such that they are
both reduced by a factor of N. This factor is incorporated into the PD gain which gives the
XOR PD an adjusted gain of 66.3 mV/rad and the 3-State PD an adjusted gain of 5.25
mV/rad.
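The adjusted gains follow directly from dividing the raw phase detector gain by N. The short sketch below repeats that arithmetic; the raw 3-state gain is not stated in the text and is back-calculated here from the adjusted value, so it should be read as an inferred figure.

```python
DIVIDE_RATIO_N = 8  # divide-by-8 in the synthesizer


def adjusted_gain(kd_v_per_rad: float, n: int = DIVIDE_RATIO_N) -> float:
    """Fold the divider ratio N into the phase detector gain."""
    return kd_v_per_rad / n


# XOR PD: raw gain quoted as 0.53 V/rad.
xor_adjusted_mv = 1e3 * adjusted_gain(0.53)

# 3-state PD: raw gain inferred from the quoted adjusted gain of 5.25 mV/rad.
three_state_raw_mv = 5.25 * DIVIDE_RATIO_N

print(f"XOR adjusted gain = {xor_adjusted_mv:.2f} mV/rad")
print(f"3-state raw gain (inferred) = {three_state_raw_mv:.1f} mV/rad")
```

The XOR result of roughly 66 mV/rad agrees with the quoted adjusted gain, and the inferred raw 3-state gain comes out near 42 mV/rad.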
The lock-in range of the PLL is (π/2)K using the XOR PD and πK using the 3-state PD.
The larger range of the 3-state PD provides higher resistance to cycle slips and yields a
shorter pull-in time when used with a frequency detector. The pull-in time of the XOR PD
is about four times larger than that of the 3-state PD with the same PLL bandwidth. The
pull-in range is also four times larger for the 3-state PD. The simulated figure of merit¹, M,
for the XOR gate is quite high, approaching 1 million. This was expected, because of the
very simple nature of the XOR gate. The 3-state PD, on the other hand, has a value of about
22, which is appropriate for a circuit of this complexity.
1. The figure of merit, M, for a PD is Vdo/Kd, where Vdo is the mean value of the PD output and Kd is the gain. A low M value for a PD yields a small pull-in range.
[Figure 5-15 plot: 3-state phase detector output (mV), -150 to 150, and XOR phase detector output (V), -1.0 to 1.0, versus phase difference (degrees), -270 to 270.]
5.4.2.3. Phase detector (Serdes III)
Research into Serdes III necessitated a decreased bandwidth in order to further
suppress spurious noise introduced by the PD. Side effects of a decrease are a reduction in
the pull-in range and an increase in the pull-in time. A very effective way to counter these
negative effects is to add a frequency detector, FD, to the 3-state PD. This circuit is able to
detect cycle slips and provide a strong pull-in signal in response. A cycle slip occurs when
the phase error exceeds the bounds of the PD (0, 2π) and the output steps (see Fig. 5-15 on
page 88). This is indicative of a large frequency error, and if the proper circuitry is added to
sense this event then a large change can be made to the loop filter integrator.
Figure 5-16 PLL frequency detector. A frequency detector detects cycle slips from the PD and performs large control voltage changes. This allows a much wider pull-in range and smaller pull-in times.
[Block diagram labels: 3-state PD (inputs vi, vo; outputs vU, vD), two slip detectors (outputs vU’, vD’), loop filter; slip detector detail with latches, a delay element, and step output vs.]
The schematic in Fig. 5-16 shows the implemented frequency detector that was added
to Serdes II’s design. The detector compares the input to the output of the PD. When a cycle
slip occurs, an output edge normally created on vU by vi’s rising edge is missing, and this is
sensed by the slip detector. The detector will then add or remove a fixed amount of charge
from the charge pump integrator. This causes a step change on the output of the integrator.
The key to implementing the FD is to ensure that the induced frequency step, ∆ωc,
does not exceed twice the lock-in range, ωL; a larger step would force the frequency to
oscillate around ωL and never acquire lock. Typically ∆ωc is conservatively set to ωL so
that pull-in time is minimized and PLL lock is ensured.
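The step-size rule can be expressed as a small check. The sketch below uses the lock-in expressions quoted for the two detectors, (π/2)K and πK, and an illustrative loop gain of 6 MHz (the value used for Serdes III elsewhere in this chapter); the helper names are hypothetical.

```python
import math


def lock_in_range_hz(k_hz: float, detector: str) -> float:
    """Lock-in range for loop gain K, per the expression quoted for each PD."""
    factors = {"xor": math.pi / 2.0, "3-state": math.pi}
    return factors[detector] * k_hz


def step_is_safe(step_hz: float, lock_in_hz: float) -> bool:
    """A slip-induced frequency step must not exceed twice the lock-in range."""
    return step_hz <= 2.0 * lock_in_hz


K_HZ = 6e6  # illustrative loop gain, assumed for this sketch
w_l = lock_in_range_hz(K_HZ, "3-state")
conservative_step = w_l  # the conservative choice: step equal to the lock-in range

print(f"lock-in range = {w_l / 1e6:.2f} MHz, safe = {step_is_safe(conservative_step, w_l)}")
```

With K at 6 MHz the 3-state lock-in range evaluates to roughly 18.8 MHz, and a step equal to that range sits comfortably inside the limit of twice the lock-in range.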
5.4.3. The VCO
Serdes I utilized the Simple CS VCO with a gain of approximately 0.5 GHz/V. Its
highly variable gain and non-linear frequency response made analytical modeling of the
PLL difficult. The second and third prototypes used the FFI VCO, which has a consistent
gain of 6 GHz/V. Its linear response made analytical modeling much easier to perform.
5.4.4. Loop Filter
The loop filter in a PLL plays a critical role in determining the PLL bandwidth.
Usually the gains of the PD and the VCO are fixed, and therefore the loop filter is the only
component available to control the bandwidth. A high bandwidth corresponds to a strong
ability to track the input phase at high frequencies. This would be very useful for a receiver
that needs to track an incoming signal plagued with transmitter and line noise. This ability
will be discussed further in the following chapter. A small PLL bandwidth, on the other
hand, ignores phase variations on the input and performs very slow tracking. This is the
necessary situation for a transmitter since it needs to generate a very clean VCO signal,
independent of the noise introduced by the input reference signal and from the VCO.
Reducing the bandwidth too much, however, prevents the PLL from tracking out the VCO
phase noise. An optimum bandwidth for minimum total output phase noise does exist and
should be determined.
5.4.4.1. Serdes I Loop Filter
The transmitter PLL in the first prototype utilized a passive low pass filter¹. The filter
is a two stage RC ladder and has two poles, but for the purpose of analysis the higher
frequency pole can be ignored, since it only helps to reduce spurious modulation². The loop
type is considered a two pole loop: one pole in the loop filter and one pole in the VCO. The
poles are at 30 MHz (ωn) and 207 MHz when the capacitance and resistance values are 2
pF and 1 kΩ, respectively. The decision was made to use two RC stages rather than one to
increase the high frequency signal rejection.
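The two pole frequencies can be reproduced from R and C with a short calculation. The sketch assumes an unbuffered two-stage ladder with equal resistors and equal capacitors and no external loading, for which the transfer function is 1 / (1 + 3sRC + (sRC)^2); this idealized model is an assumption, and the function name is illustrative.

```python
import math


def rc_ladder_poles_hz(r_ohms: float, c_farads: float) -> tuple[float, float]:
    """Pole frequencies of an unbuffered two-stage RC ladder with equal R and C.

    The denominator 1 + 3*x + x**2 (with x = s*R*C) has roots at
    x = (-3 +/- sqrt(5)) / 2, so the pole magnitudes are those roots
    scaled by 1 / (R*C).
    """
    disc = math.sqrt(3.0**2 - 4.0)
    x_low = (3.0 - disc) / 2.0   # about 0.382
    x_high = (3.0 + disc) / 2.0  # about 2.618
    w0 = 1.0 / (r_ohms * c_farads)
    return x_low * w0 / (2.0 * math.pi), x_high * w0 / (2.0 * math.pi)


f_low, f_high = rc_ladder_poles_hz(1e3, 2e-12)
print(f"{f_low / 1e6:.1f} MHz, {f_high / 1e6:.1f} MHz")
```

For R = 1 kΩ and C = 2 pF this idealized model places the poles near 30 MHz and 208 MHz, in line with the values quoted above.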
Figure 5-18 Tx PLL passive loop filter. A second order low pass filter utilizing a two stage RC ladder configuration.
The resistor and capacitor component values were maximized, for low bandwidth as
discussed above, based primarily on the proper operation of the PLL and on layout
limitations (capacitors consume large amounts of area). Since the PD output is differential
in nature, symmetric loading requires a duplication of the RC ladder. The four capacitors
were therefore limited to about 2 pF because they take up a large amount of layout space.
Resistor sizes, on the other hand, were reasonably small, but values larger than 1 kΩ
introduced considerable loading effects because this RC circuit had to drive the VCO aVref
control circuit.
1. The design time for this critical Serdes I component was very limited, and effort was only put into the PLL’s proper operation rather than optimization. In the end it worked well enough to drive the transmitter and allow collection of all desired data.
2. A common problem in frequency synthesizers is called spurious modulation and is a result of the normally much higher frequency output of the PD. A result of the frequency divider, these lower frequency signals are not adequately attenuated by the loop filter and are passed on to the VCO as unwanted phase noise.
[Figure 5-18 labels: two-stage RC ladder F(s) with R = 1 kΩ and C = 2 pF; response |F(jω)| (dB) versus log f with the corner at ωn.]
5.4.4.2. Serdes II Loop Filter
Further research and design allowed for a much improved loop filter to be used in
Serdes II. The first important enhancement was the move to an active rather than passive
filter. The use of an integrator allowed a loop filter dc gain, F(0), approaching infinity to
be used, in contrast to a passive filter’s dc gain of unity. From this, the PD static phase error,

$$\theta_{eo} = -\frac{V_{do}}{K_d} + \frac{V_{co}}{K_d F(0)} \qquad (5\text{-}1)$$

becomes approximately zero when the PD offset voltage¹, Vdo, is zero, where Kd is the gain
of the PD and Vco is the static control voltage² of the VCO. Under these conditions
the input phase difference is kept near zero when the PLL is in lock, which improves the
purity of the synthesized frequency [41] and aids acquisition.
1. Vdo is the free running, or offset, phase detector voltage. It represents the DC output voltage offset for the PD and is a property of the PD alone.
2. The static control voltage, or Vco, is the control voltage applied to the VCO which matches the input and output frequencies. It is related to the input signal and VCO properties.
Figure 5-19 Tx PLL active loop filter. This active loop filter incorporates a low pass front-end followed by an integrator. The op-amp has a FET input stage to minimize loading, a high gain NPN stage, and a low impedance output stage.
[Schematic labels: low pass filter (R1/2, R1/2, C3) followed by the integrator (R2, C) and op-amp; op-amp stages: FET front-end (high input impedance), gain stage (NPN differential amplifier), output stage (low output impedance).]
Resistors R1 and R2, capacitor C, and the amplifier in Fig. 5-19 form the core
of the filter. These elements form an integrator with a zero at

$$\omega_2 = \frac{1}{R_1 C} \qquad (5\text{-}2)$$

and a gain of

$$K_h = \frac{R_2}{R_1} \qquad (5\text{-}3)$$

at frequencies above ω2. This choice of 6.4 MHz for the loop bandwidth was based loosely
on comparisons with other similar loops which have bandwidths of approximately 1 MHz
[41]. These similar loops, however, utilize a much cleaner LC VCO, so a larger bandwidth
was needed to compensate.
The final design of the loop filter yielded values for R1, R2, and C equal to 16.7 kΩ,
6.67 kΩ, and 14.1 pF, respectively. ω2 was 1.7 MHz, Kh was 0.4, and the total loop gain and
bandwidth was 6.4 MHz. In addition, the low frequency gain, which is governed by the gain
of the amplifier, is about 5.
Figure 5-20 Active loop filter transfer function. The active loop has a 1.7 MHz zero which forces a high DC gain. A pole at 21 MHz attenuates high frequencies to reduce spurious modulation.
[Plot: gain (dB), -100 to 20, versus frequency (Hz), 1 kHz to 1 GHz, with ω2 and ω3 marked.]
The addition of a low pass filter, or pole, to minimize spurious modulation is realized
through element C3 in Fig. 5-19, with a cut-off frequency at ω3. The frequency of the pole
is at 21 MHz and yields a capacitor value of 1.8 pF.
The open loop frequency response is plotted in Fig. 5-20. A zero at
ω2 produces a -20 dB/dec slope which is not realized at low frequencies due to the finite
gain of 13.5 dB of the op-amp. Above ω2, the gain is Kh until the pole at ω3, where
the curve drops off at -20 dB/dec. An additional pole at approximately 100 MHz exists
within the op-amp for loop stability.
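The quoted Serdes II filter parameters can be cross-checked from the component values with the sketch below. Two assumptions are made and should be treated as such: the zero is computed from the feedback branch as 1/(R2 C), which is the usual form for this integrator topology and reproduces the quoted 1.7 MHz (the product R1 C would instead give about 0.68 MHz, even though equation (5-2) is written with R1); and the pole assumes C3 sees the two R1/2 halves in parallel.

```python
import math

R1 = 16.7e3    # ohms
R2 = 6.67e3    # ohms
C = 14.1e-12   # farads
C3 = 1.8e-12   # farads

# High frequency gain, equation (5-3).
k_h = R2 / R1

# Assumption: zero set by the feedback branch, w2 = 1 / (R2 * C).
f_zero_hz = 1.0 / (R2 * C) / (2.0 * math.pi)

# Assumption: C3 is driven by R1/2 in parallel with R1/2, i.e. R1 / 4.
f_pole_hz = 1.0 / ((R1 / 4.0) * C3) / (2.0 * math.pi)

print(f"Kh = {k_h:.2f}, zero = {f_zero_hz / 1e6:.2f} MHz, pole = {f_pole_hz / 1e6:.1f} MHz")
```

Under these assumptions the sketch returns a gain near 0.4, a zero near 1.7 MHz, and a pole near 21 MHz, matching the values stated for the final design.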
5.4.4.3. Serdes III Loop Filter
The implementation of the Serdes III loop filter utilizes a negative impedance
amplifier, NIA, charge pump [27]. Fig. 5-21 shows that the circuit has an RC filter which is
balanced, or floated, between a pull-up resistance and a pull-down negative resistance. As
long as the sum of these resistances equates to zero, the filter nodes are allowed to float.
Any deviation from zero will result in a drift in the differential output voltage to infinity
or to zero. To ensure a reasonable initial condition, the pull-up resistors should be slightly
smaller than the NIA resistance so that the differential voltage is slowly pulled toward zero.
The negative resistance is generated through a linearized CML feedback tree that is
very similar to the storage mechanism in a MS-latch. The current through one branch is

$$i_a = \frac{I_o}{2} - \frac{v_0 - v_1}{R} \qquad (5\text{-}4)$$

where Io is the total current through the tree, R is the value of the pull-up resistors, and v0
and v1 are the outputs and the nodes of the capacitor. Technically, the circuit acts as a
negative impedance

$$\frac{v_0 - v_1}{i_0 - i_1} = -R_n \qquad (5\text{-}5)$$

which is based upon a differential voltage and current. The end result is that the differential
voltage, v1-v0, is allowed to float at any value less than RIo. The resistance value of the NIA,
Rn, is the sum of the linearizing resistors and the emitter resistance, as described in
Appendix C.1.
Figure 5-21 Receiver III integrator. The integrator used in Serdes III consists of a negative impedance amplifier which essentially “floats” a capacitor, and current trees to move charge on and off each end.
The striking benefit of this negative impedance charge pump is that it allows charge
to be removed from either end of the capacitor while the differential center voltage is
maintained. Removal of capacitor charge through a CML tree causes a differential voltage
change, and when a constant current is drawn, the voltage will ramp accordingly, thus
showing the integration.
There are two methods for affecting the differential output voltage; each method is
handled by its own circuit. The first is a standard current source which uses a linearized
CML tree with inputs int0, and int1 to draw current from either side of the filter. The
amplifier gain, Ka, is approximately 1 mA/V. This value can be derived from the linearized
CML tree plot found in Fig. C-3 on page 165. The constant includes a factor of 1/2 because
the current is split between two paths, one directly through the pull-up resistor and one
through the filter.
The second method is a step input used in conjunction with the frequency detector in
the PD. In the case of a 3-state PD, a cycle slip detected by the FD will pulse one step input
or the other and cause a large charge change on the capacitor. The size of the step current
source dictates the amount of change.
[Figure 5-21 schematic labels: negative impedance amplifier; inputs int0, int1, step0, step1, ref; outputs z0, z1; tree current Io; pull-up resistors Rp; filter nodes v0, v1 with R, C1, C2.]
Serdes III was the first design with a loop gain that was optimized for minimal output
phase noise based on measured and simulated phase spectra data from the FFI VCO
discussed in Section 4.12.4 on page 69. With this information and phase noise data on the
reference source, the noise spectrum plot shown in Fig. 5-22 can be created. It shows the
voltage spectral density for the FFI VCO and for a very low noise reference source. The
frequency at the point of intersection indicates the ideal value for loop bandwidth. Values
lower than this allow more VCO noise to propagate to the output, while values higher than
this allow more reference noise to propagate to the output and increase the spurious
modulation from the reference.
Figure 5-22 Voltage spectral density for optimal loop bandwidth. Shown above is the voltage spectral density of the VCO and the reference source. The point where they intersect is, to first order, the optimal place to define the loop bandwidth.
[Plot: Φ (dBc/Hz), -140 to -40, versus frequency, 1 kHz to 100 MHz; curves for the VCO (20 dB/dec slope), the reference source, and the effective reference offset by 18 dB; the optimum loop BW for minimum noise is marked at the intersection.]
The reference source to be used is quoted as having a noise spectral density of -140
dB at frequencies below 1 GHz. This level must then be raised by the PLL multiplication
factor of 8, or the equivalent of 18 dB. The VCO voltage spectral density was found
through simulation, analytical, and measurement results, and has a value of -90.2 dBc/Hz at
1 MHz.
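The 18 dB figure is the standard 20 log10(N) penalty for frequency multiplication by N. The sketch below repeats that arithmetic; it treats the quoted -140 dB reference floor as a level in dBc/Hz, which is an assumption made for illustration.

```python
import math


def multiplication_penalty_db(n: int) -> float:
    """Phase noise increase when a reference is multiplied by N: 20*log10(N)."""
    return 20.0 * math.log10(n)


REFERENCE_DBC_HZ = -140.0  # quoted reference noise floor (assumed to be in dBc/Hz)
N = 8                      # PLL multiplication factor

penalty_db = multiplication_penalty_db(N)
effective_reference = REFERENCE_DBC_HZ + penalty_db

print(f"penalty = {penalty_db:.2f} dB, effective reference = {effective_reference:.2f} dBc/Hz")
```

The penalty evaluates to just over 18 dB, which places the effective reference level near -122 dBc/Hz under the stated assumption.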
The relatively high noise content of the VCO and the low noise content of the reference source placed the optimal loop bandwidth, K, at 33 MHz. Suppressing spurious modulation requires placing a pole at 4K, 132 MHz, far enough above K that the PLL response will not be affected. At a reference frequency of 625 MHz, this results in a 13 dB suppression of spurious noise, which by

$$\sigma_t = (50\,\mathrm{ps})\,\frac{\pi}{4}\,\frac{\pi \delta N K^2}{f_r^2} \tag{5-6}$$

is equivalent to a data rms jitter of 5 ps. The PD minimum duty cycle, δ, is approximately 0.03. σt is one tenth of a bit width, which is unacceptable. Clearly the suppression of spurious modulation is critical in minimizing jitter. A loop bandwidth of 6 MHz was therefore used instead of 33 MHz. This yields an rms jitter due to spurious modulation of 0.14 ps, which is considerably lower.
With a K at 6 MHz, the PLL zero (ω2) is placed at K/4, or 954 kHz, to give a 13% response overshoot, and the pole (ω3) at 4K, or 24 MHz. For a VCO gain, Ko, of 34.5 Grad/s/V, a PD gain, Kd, of 5.25 mV/rad, and a loop filter gain, Kt, of 1 mA/V, the high frequency gain Kh must be set to 208 for K = KoKdKtKh = 2π(6 MHz). Solving for the loop components from

$$F(s) = K_h \frac{s + \omega_2}{s\left(1 + \dfrac{s}{\omega_3}\right)} \tag{5-7}$$

$$K_h = \frac{C_1 R}{C_1 + C_2} \tag{5-8}$$

$$\omega_2 = \frac{1}{C_1 R} \tag{5-9}$$

$$\omega_3 = \frac{C_1 + C_2}{C_1 C_2 R} \tag{5-10}$$

yields C1 = 802 pF, C2 = 53 pF, and R = 208 Ω.
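The gain budget K = KoKdKtKh and the zero placement ω2 = 1/(C1R) of Eq. (5-9) can be checked numerically. The following is a minimal Python sketch using only the values quoted in the text; the function names are illustrative and not from the thesis.

```python
import math

def required_high_frequency_gain(k_rad_s: float, ko: float, kd: float, kt: float) -> float:
    """Solve K = Ko*Kd*Kt*Kh for Kh."""
    return k_rad_s / (ko * kd * kt)

def zero_frequency_hz(c1_farads: float, r_ohms: float) -> float:
    """Zero of the loop filter, omega_2 = 1/(C1*R), expressed in hertz."""
    return 1.0 / (2.0 * math.pi * c1_farads * r_ohms)

K = 2.0 * math.pi * 6e6   # loop bandwidth, rad/s
KO = 34.5e9               # VCO gain, rad/s/V
KD = 5.25e-3              # PD gain, V/rad
KT = 1e-3                 # loop filter gain, A/V

kh = required_high_frequency_gain(K, KO, KD, KT)
f2 = zero_frequency_hz(802e-12, 208.0)
print(f"Kh = {kh:.1f}")
print(f"zero frequency = {f2 / 1e3:.0f} kHz")
```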
The size of the stepping transistors can be found using

$$I \le \frac{2 C \omega_L}{K_o} f_c \tag{5-11}$$

where C is the capacitor size, ωL is the lock-in range (πK = 18.8 MHz), Ko is the VCO gain (34.5 Grad/s/V), and fc is the reference frequency (625 MHz). For this implementation the calculated current is 3.4 mA, corresponding to a transistor size of 4 µm. The ref input is used in conjunction with the step inputs and allows them to be driven single ended to save power.
5.4.5. PLL Loop Response
The value of the PLL gain, K, is directly related to the 3 dB point, and its design is based on two factors: the VCO noise response and the input noise level. Small values of K yield strong input noise immunity, as the PLL is very slow to respond to input deviations, but transmit all of the low frequency VCO noise to the output. A small bandwidth is also effective at reducing spurious modulation. A large value of K, on the other hand, allows the PLL to track the input very closely and attenuate a considerable portion of the low frequency VCO noise, but means that any input noise is passed on to the output. K, as a frequency, also has a directly proportional effect on the pull-in range and an inverse relationship with the pull-in time. Put simply, a larger K allows the PLL to lock in more quickly over a larger frequency range.
The process of choosing K is affected by the output noise specifications for the PLL, but no noise specifications were given for the design of this PLL, as it was meant for short-haul communications, where noise does not play a crucial role. So instead, K was chosen small enough to limit the effects of the input noise, but not so small as to adversely affect the layout with large component sizes. Ensuring proper operation was also important, so design limits were not pushed and instead a “center road” approach was taken.
The step response for the passive loop of Serdes I and the active loop of Serdes II is shown in Fig. 5-23. Both responses are very clean and non-oscillatory, which indicates adequate choices for pole locations. Serdes II has a longer settling time due to the smaller bandwidth and does not undershoot. From [41], the damping factor, ζ, is calculated to be 0.47 and 0.65 for the PLLs in Serdes I and Serdes II, respectively.
Figure 5-23 PLL simulated step responses. The above plots, simulated in MATLAB, show the step responses for both PLLs in Serdes I and II. The longer settling time of PLL 2 corresponds to the smaller bandwidth. PLL 3 has nearly the same response as PLL 2.
PLL phase noise in this case is realized as output phase noise of the transmitter. For
this reason, no direct PLL phase noise can be measured. Section 5.10. details the noise
results for the two transmitter designs. No simulation of phase noise in the PLL was done
for this particular design.
5.4.6. Lock Acquisition
Lock acquisition can be described by two factors: the pull-in time, Tp, and the pull-in range, ωp. The pull-in time represents the maximum amount of time the PLL takes to acquire lock and track the input phase when started out of lock. The pull-in range is the largest frequency error for which the PLL will acquire lock. Both items are important metrics in describing the usefulness of the PLL, and ideally Tp will be zero, and ωp will cover the entire frequency range of the VCO.
[Figure 5-23 plot: PLL step output (rad), 0 to 0.14, versus time (ns), 0 to 200. Traces: step input, Serdes I, and Serdes II.]
Figure 5-24 PLL I simulated acquisition plots. The above plots show the PLL in Serdes I during simulated acquisition, which is ideal and not equivalent to real life. This is also known as the jellyfish plot.
[Figure 5-24 plot: control voltage (V), -1.7 to -1.1, versus time (ns), 0 to 160. Traces labeled 660, 670, 680, 690, 700, 710, 720, and 730 MHz.]
5.4.6.1. Serdes I Simulated Acquisition
Since Serdes I used a passive loop filter, the pull-in range is restricted by, and equal to, the frequency of the dominant pole ω3 at 30.3 MHz. This is a result of the -π/2 angle shift introduced by the pole, which effectively nulls the pull-in voltage. If, for example, a -π angle shift were introduced, then the PD output would be inverted, push-out would occur, and the PLL would move further away from lock. The pull-in time is a complicated parameter to derive; an expression and its derivation are presented on pages 186-187 of [41]. A rough approximation for pull-in time from simulation is 100 ns.
5.4.6.2. Serdes II Simulated Acquisition
Serdes II’s PLL simulated response is shown in Fig. 5-25. The pull-in time is about four times that of Serdes I due to the smaller loop bandwidth and different phase detector characteristics. With similar loop bandwidths and similar loop filters, the pull-in time for a PLL with a 3-state PD versus an XOR PD is about 4 times smaller, and the pull-in range is about 4 times larger. This is primarily due to the negative slope that exists in the XOR response but not in the 3-state response, as shown in Fig. 5-15 on page 88.
Figure 5-25 PLL II simulated acquisition plots. The above plots show PLL II during simulated acquisition, which is fairly representative of actual acquisition. However, Spice has an advantage in setting initial conditions, which can show a better response than in real life. Here is the squid plot.
[Figure 5-25 plot: loop filter output (V), -0.25 to 0.25, versus time (ns), 0 to 500. Traces labeled 600, 650, 700, 750, 800, 850, and 900 MHz.]
The simulated pull-in time for the Serdes II implementation is about 400 ns, and the pull-in range is approximately 75% of the full range of the VCO (600 to 900 MHz). The addition of the 3-state PD has greatly enhanced the pull-in range at the expense of pull-in time. This is a very favorable trade-off, since typical pull-in time specifications are on the order of microseconds.
5.4.6.3. Serdes III Simulated Acquisition
The third prototype has characteristics very similar to the second prototype, including similar loop bandwidth, pole and zero locations, phase detectors, VCOs, and gains. Acquisition plots are, therefore, nearly identical to those shown in Fig. 5-25. See Section 5.4.6.2. for pull-in times and pull-in ranges. The FLL used in this implementation does not have a considerable effect, but it does reduce the pull-in time by about 10%.
5.4.7. 20 / 40 Gb/s Implementation
One area that was pursued in the development of the second prototype was the ability to run the transmitter at either 20 or 40 Gb/s. Adding a second, higher speed VCO, multiplexers on the outputs, and an additional multiplexed divide-by-two circuit was rather straightforward, as shown in Fig. 5-26. The primary difficulty arose when designing the loop bandwidth to be appropriate for both VCOs. In the 5 GHz mode, the detector gain is Kd/8, and in the 10 GHz mode it is Kd/16. This requires halving the loop pole frequency so that stable operation is guaranteed in both modes. This reduction has negative implications for the pull-in time, because pull-in time has an inverse relationship to the pole frequency. Halving the frequency doubles the pull-in time.
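The scaling argument above can be written out in a few lines. The following is a minimal Python sketch, assuming the divide ratio is simply the VCO frequency over the 625 MHz reference and that pull-in time is exactly inversely proportional to the pole frequency, as the text states; the function names are illustrative, and the Kd value is used only as a sample number.

```python
REFERENCE_HZ = 625e6  # reference frequency quoted in the text

def divide_ratio(vco_hz: float) -> int:
    """Feedback divide ratio needed to bring the VCO down to the reference."""
    return round(vco_hz / REFERENCE_HZ)

def effective_detector_gain(kd: float, vco_hz: float) -> float:
    """Detector gain as seen by the VCO phase: Kd divided by the divide ratio."""
    return kd / divide_ratio(vco_hz)

def relative_pull_in_time(pole_scale: float) -> float:
    """Pull-in time relative to nominal when the pole frequency is scaled."""
    return 1.0 / pole_scale

KD = 5.25e-3  # V/rad, sample value taken from the loop design in this chapter
gain_5g = effective_detector_gain(KD, 5e9)
gain_10g = effective_detector_gain(KD, 10e9)
print(f"divide ratios: {divide_ratio(5e9)} and {divide_ratio(10e9)}")
print(f"gain ratio between modes: {gain_5g / gain_10g:.1f}")
print(f"pull-in time with halved pole: {relative_pull_in_time(0.5):.1f}x")
```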
Figure 5-26 5/10 GHz PLL implementation. Creating a 5 and 10 GHz PLL involved the addition of a 10 GHz VCO and various multiplexers to select the correct phases and the proper division circuit.
[Figure 5-26 block diagram: 625 MHz reference, 3-state PD, loop filter, 5 GHz VCO, 10 GHz VCO, divide-by-2, divide-by-8, and 4 output phases.]
5.5. Clock Distribution
Clock distribution in the transmitter involves delivering the PLL signal outputs to the shift registers, to the external circuitry for data loading, and to the multiplexers, with maximum phase alignment. All prototype transmitters utilized the same scheme for clocking.
A chain of buffer delays, whose inputs are the PLL 0° and PLL 90° signals, constitutes the majority of the clock distribution system (see Fig. 5-27). It ensures that data and clock travel in the same direction and that delays in the shift registers, buffers, and multiplexers are matched to delays in the delay chain.
The most critical path in the clock distribution circuitry is found between the PLL and the 4-to-1 multiplexer. Here the PLL 0° and the PLL 90° signals must stay phase matched to ensure alignment of bit edges on the output. Offsets in these signals directly translate to phase jitter and more difficult signal reception. To ensure alignment, the delay chain was designed to be symmetrically loaded, of minimal length, and perfectly balanced. Because the 4-to-1 multiplexer was designed as a two stage multiplexer, and because of the critical timing required by its architecture, a precise delay of one multiplexer was added to the 90° line, guaranteeing perfect clock alignment at the multiplexers. Consequently the SEL 0° and SEL 90° signals are offset by exactly one multiplexer gate delay.
The next most important timing event is the clocking of the four shift registers. The 90° branch of the delay chain and its inversion handle all four registers. Since loading from the 8 latches (4 MS latches) was a concern, a driver buffer was added to the front of each register. This forced the addition of an equivalent delay into the delay chain. The difference in the total number of gate delays between the CLK AD input and the SEL 0° signal was designed to be zero, to ensure maximum noise margin. The timing diagram, Fig. 5-28, clearly depicts the precise relationship between the signals.
Loading the 16 bits of parallel data requires a clock edge every 800 ps (50 ps × 16 bits), a period four times longer than the PLL period, thus necessitating a load counter, depicted in Fig. 5-29, which is essentially a frequency divider. Not only does the load counter have to divide by four, it also has to create two load signals separated by 100 ps because of the clock offset on registers A and D versus B and C. The load signals switch the multiplexer on each bit to load mode rather than shift mode. When the next rising clock edge arrives, data is latched into the register.
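The load timing described above can be captured in a small behavioral model. The following is a minimal Python sketch, not the circuit itself: it assumes 200 ps wide load pulses (the width given in the Fig. 5-29 caption) and assumes that LOAD AD leads LOAD BC, which the text does not state. All names are illustrative.

```python
BIT_PERIOD_PS = 50     # one bit at 20 Gb/s
WORD_BITS = 16         # parallel word width
PLL_PERIOD_PS = 200    # 5 GHz PLL clock
LOAD_PULSE_PS = 200    # pulse width from the Fig. 5-29 caption
LOAD_OFFSET_PS = 100   # separation between the two load signals

def load_period_ps() -> int:
    """Time between parallel loads: one full word of serial bits."""
    return BIT_PERIOD_PS * WORD_BITS

def load_divide_ratio() -> int:
    """Number of PLL periods per load period."""
    return load_period_ps() // PLL_PERIOD_PS

def load_signals(t_ps: int) -> tuple[bool, bool]:
    """Return (LOAD_AD, LOAD_BC) at time t_ps; which one leads is an assumption."""
    phase = t_ps % load_period_ps()
    load_ad = phase < LOAD_PULSE_PS
    load_bc = LOAD_OFFSET_PS <= phase < LOAD_OFFSET_PS + LOAD_PULSE_PS
    return load_ad, load_bc

print(f"load period: {load_period_ps()} ps, divide ratio: {load_divide_ratio()}")
for t in range(0, 900, 100):
    print(t, load_signals(t))
```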
The final aspect of clock distribution is the generation of the signal that informs the external circuitry that the transmitter is ready for new parallel data. The straightforward solution is to use the LOAD AD signal. This guarantees that when both loads have completed, the data has had a maximum amount of time to settle.
Although the use of a delay chain makes clock distribution straightforward and very
reliable, it does have one serious drawback. Since it lies between the PLL and the output
multiplexer, it contributes to the overall phase noise and jitter of the circuit. This noise is a
result of shot noise, thermal noise in the chain of buffers, fabrication mismatches between
the 0° and 90° phase lines, and coupling between the lines and substrate. Minimizing these
noise effects involved designing a symmetric and tight layout of the delay chain.
Figure 5-27 Clocking scheme for transmitter. The top level schematic for the transmitter clocking circuitry includes the PLL as the clock generator, a delay chain for distribution, the registers, and the 4-1 multiplexer.
[Figure 5-27 schematic: PLL with 0° and 90° outputs, delay chain, load counter, four 4 bit shift registers (A, B, C, D) built from D flip-flops with select inputs and fed by externally supplied parallel data, intermediate multiplexer outputs BA and CD, and serial output SO. Labeled signals: LOAD AD, CLK AD, LOAD CLK, SEL 0°, and SEL 90°.]
Figure 5-28 Transmitter clock timing. The timing of the transmitter revolves around the delay chain, which ensures that the data and the clock flow in the same direction. The bottom three signals clearly show how the 4-1 multiplexer interleaves to produce the output.
[Figure 5-28 timing diagram: traces PLL 0°, PLL 90°, LOAD AD (pulse every 4th CLK 0° edge), CLK AD, SEL 0°, SEL 90°, A,D, B,C, BA, CD, and SO, with two 3 gate delays marked. Time (ns) axis from 0 to 800.]
Figure 5-29 Load counter. The load counter divides the PLL signal by four and generates two 200 ps load pulses offset by 100 ps from each other.
[Figure 5-29 schematic and waveforms: four D flip-flops clocked by LOAD CLK produce LOAD BC and LOAD AD. Marked intervals: 800 ps, 200 ps, and 100 ps.]
5.6. Data Encoding
Data encoding is a general term for such techniques as: encryption, compression, improved transition density, error detection, channel alignment, byte alignment, DC voltage balance, simplified clock recovery, and frame detection. Typically, improved transition density and channel alignment are performed on-chip, although all could potentially be performed off-chip. No encoding was performed in either Serdes I or Serdes II. See Section 5.11.1. on page 118 for a brief study and recommendation of the 8B/10B encoding scheme.
5.7. Line Driver
The purpose of the line driver is to amplify the transmitter signal and drive the 50 Ω output line. Depending on the specifications, this can be either a single-ended or a differential circuit [48], [36], [37]. At these speeds differential is usually the optimum choice. The bandwidth of the circuit must be large enough that it will not attenuate the high frequency components and close the signal eye. Noise is also an issue, since any phase noise introduced by the line driver will be directly realized on the output.
The line driver in the Serdes I circuit utilized a simple pad driver circuit which was not optimized for this purpose. In Serdes II, however, the line driver was integrated into the final output multiplexer, which limited the introduction of noise. The output voltage swing was designed to be 400 mV.
5.8. Internal Testing Circuitry
5.8.1. Serdes I
Serdes I was designed without the ability to accept external parallel data. Instead, the data was generated pseudo-randomly on chip, through a 16 bit linear feedback shift register (LFSR).
Designing a true maximal length 16 bit LFSR would create a sequence 65,535 bits long, and because 16 bits are transmitted, then followed by a single shift and repeated, the serialized length is greater than 1 million bits. This was determined to be too long, for the simple reason that it would be very difficult to determine during testing whether the transmitter was working correctly. An oscilloscope can only capture so much information, and it would be nearly impossible to find the exact position within the sequence.
Instead, a four bit maximal length LFSR followed by a 12 bit shift register was implemented. The circuit, shown in Fig. 5-30, has 16 MS-latches clocked through a buffer tree, an XNOR gate for feedback, and an AND gate to create a synchronizing signal. The synchronizing signal, SYNC, senses all zeros in the LFSR and was placed on an output pad in order to detect the start of the sequence. The ZBIT is the final bit of the generator and was also placed on a pad to analyze the operation of the circuit. A 4 input AND gate, not shown in the figure, determines whether the LFSR contains all ones and, if so, inverts the output of the XNOR to force proper oscillation.
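The behavior of a four bit XNOR-feedback LFSR, including the all-ones lock-up state, can be illustrated with a short model. The following is a minimal Python sketch, assuming feedback taken from the two oldest bits; the actual tap positions used on the chip are not given in the text, so this is one valid maximal length choice rather than the fabricated circuit.

```python
def xnor_lfsr_step(state: int) -> int:
    """Advance a 4 bit shift register with XNOR feedback (assumed taps: bits 3 and 2)."""
    b3 = (state >> 3) & 1
    b2 = (state >> 2) & 1
    feedback = 1 ^ (b3 ^ b2)        # XNOR of the two oldest bits
    if state == 0b1111:             # all ones would otherwise repeat forever
        feedback ^= 1               # invert the feedback to escape the lock-up state
    return ((state << 1) & 0b1111) | feedback

def sequence_period(start: int) -> int:
    """Count steps until the register returns to its starting state."""
    state, steps = xnor_lfsr_step(start), 1
    while state != start:
        state = xnor_lfsr_step(state)
        steps += 1
    return steps

period = sequence_period(0b0000)    # all zeros is the state sensed by SYNC
serial_bits = period * 16           # each register state is sent as a 16 bit word
full_16_bit_serial = (2**16 - 1) * 16
print(f"period: {period} states, serialized pattern: {serial_bits} bits")
print(f"16 bit maximal LFSR would serialize to {full_16_bit_serial} bits")
```

With these taps the all-zeros state is part of the normal cycle and the all-ones state is excluded, which matches the roles of the SYNC and lock-up detection gates described above.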
Figure 5-30 Serdes I LFSR. A 16 bit, on-chip pseudo-random pattern generator consists of a 4 bit LFSR and a 12 bit shift register. The circuit used in the transmitter is capable of generating a 240 bit serial stream.
[Figure 5-30 schematic: 16 latches numbered 0 to 15 driven by CLOCK, arranged as a 4 bit LFSR followed by a 12 bit shift register, with SYNC and ZBIT outputs. The serial stream shown in the figure is:
000011101100101010000111011001010100001110110010101000011101100101010000111011000010100001110110100101000011101111001010000111010110010100001110101100101000011111011001010000111110110010100001011101100101000000111011001010000001110110010100]
5.8.2. Serdes II
Off-chip testing of this serial communication system requires test equipment that operates at the bandwidth of the transmitter and receiver. At the rates being designed for, no such equipment exists, and comprehensive testing must be done on-chip. The testing scheme that was implemented feeds the transmitter serial output directly to the receiver and the parallel data received back into the transmitter, as shown in Fig. 5-31 [43]. A single bit offset between the receiver outputs and the transmitter inputs allows data input on Tx pin 0 to travel through the loop 16 times and then be output on pin 15 of the Rx. By generating a pseudo-random sequence (see Fig. 5-30) at the input and verifying that sequence at the output, the bit error rate (BER) can be measured. The verifying circuit generates a pulse every time a good sequence is measured. A missing pulse indicates a bit error. A divider was added at the output so that high BER measurements could be made without high bandwidth test equipment.
With a 12 bit maximal length LFSR, a 4095 bit sequence can be generated. Since the total sequence must traverse the loop 16 times, a minimum BER of 10^-5 can be detected with this method. The maximum time is determined by the time length of the test.
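The order of magnitude of the detectable BER follows from counting the bits checked in one verification cycle. The following is a minimal Python sketch, assuming that one error in one full traversal is the smallest nonzero error rate a single cycle can resolve; the function names are illustrative.

```python
def bits_per_verification(lfsr_stages: int, loop_passes: int) -> int:
    """Bits checked per cycle: a maximal length sequence sent through the loop N times."""
    return (2**lfsr_stages - 1) * loop_passes

def single_cycle_ber_resolution(bits_checked: int) -> float:
    """Smallest nonzero BER one cycle can resolve: one error in all bits checked."""
    return 1.0 / bits_checked

checked = bits_per_verification(12, 16)
resolution = single_cycle_ber_resolution(checked)
print(f"bits per cycle: {checked}, BER resolution: {resolution:.2e}")
```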
Figure 5-31 True error rate detector. The TERD operates by feeding the transmitter output back into the receiver and feeding the deserialized data back into the transmitter. A one bit offset with an LFSR and verifier determines the BER.
The TERD requires proper channel alignment, which is accomplished through data encoding and decoding. Since these circuits were not included in the second prototype, the bit pattern generator was configured to feed directly into the transmitter through the pin mapping shown at the top of Fig. 5-32. Various bits had to be duplicated, but after inversion and separation the data is still sufficiently random.
[Figure 5-31 block diagram: a bit pattern generator (LFSR) feeding the transceiver, Tx pins 0 to 15 and Rx pins 0 to 15 connected with a one bit offset, and a bit pattern verification block on Rx bit 15 with reset and good pattern signals.]
Figure 5-32 Serdes II bit pattern generator. A 12 stage LFSR with feedback to three stages yields a maximal length LFSR. A reset line was needed for use in the bit pattern verifying circuit.
[Figure 5-32 schematic: 12 stage LFSR (stages 0 to 11) with CLOCK and reset inputs, and a mapping from the LFSR output pins to the 16 Tx input pins.]
5.9. Implementation and Fabrication
5.9.1. Serdes I
A -4.5 V power supply was chosen for this chip. This left plenty of room for the three levels of logic and the active current sources. Power minimization was not a design goal, so this voltage was not optimized. Fig. 5-33 shows the artwork and a photograph of the fabricated first transmitter design, and Table 5-1 shows the pad connections.
The chip has two inputs: the 625 MHz reference clock and a full/half rate frequency selector. Three outputs were included to diagnose problems with the PLL and delay chain. Two pads output the LFSR sequence, and another pad outputs a pulse when the LFSR is reset.
5.9.2. Serdes II
The goal for the second Serdes chip was to correct problems from the first iteration, combine the transmitter and receiver into one chip, and make the chip packageable. Correcting the problems involved redesign of the VCO and PLLs to meet the 20 Gb/s specification. Combining the two systems allowed the development of an on-chip testing circuit (TERD), which could perform full feedback testing. A drawback was that fewer probe pads were available in the larger chip. Designing for packageability involved the use of an array of C4 pads for flip-chip packaging. Pad drivers and receivers were developed to accept and drive the 16 bits of parallel input and output data.
The east half of the chip comprised the transmitter, as shown in Fig. 5-34. High frequency probe pads T4 and T5 were used for the differential serial out signals. The 625 MHz reference input pad, T8, and the PLL clock output pad, T9, were required for testing. An on-chip LFSR, which was part of the test system, could be selected through a DC pad, C8, to drive the transmitter. Bit 3 of the LFSR was routed to output pad T1 to verify the proper functioning of the test system. The transmitter utilized two VCOs, which could be multiplexed through pad C11 into the clock synthesizer PLL. A selectable divide-by-2 circuit, driven by pad C10, was added to the output of the PLL for half frequency operation of the transmitter. An input filter to help suppress high frequency phase noise from the reference could be activated by pad C9.
Table 5-1 Pin-out of Serdes I transmitter
Pin   I/O         Description
S0    -           not used
S1    RF input    reference clock (625 MHz)
S2    DC input    frequency select (20 Gb/s or 10 Gb/s)
S3    RF output   PLL output (5 GHz)
S4    RF output   delay chain output (/8) (625 MHz)
S5    RF output   delay chain output (5 GHz)
S6    -           not used
S7    RF output   LFSR: sequence reset pulse
S8    RF output   LFSR: sequence
S9    RF output   transmitter out
S10   -           not used
S11   -           not used
Figure 5-33 Serdes I transmitter layout and photograph. On the left is the final artwork for the first transmitter design. On the right is a microphotograph of the fabricated part.
[Figure 5-33 images: artwork (left) and fabricated chip (right), with pads S0 to S11 and blocks labeled LFSR, mux, driver, delay chain, PLL, and test.]
The receiver, located on the west side of the chip, accepts differential serial data on the two high frequency pads R4 and R5. The recovered clock, important for lock verification, was routed to pad R8. By using pads C3 and C4, four different demultiplexed bits could be analyzed on pad R9 for proper operation. The test source built into the receiver was controlled through C1 and C2, enabling three different test patterns. The true error rate detector circuit pulsed pad R0 when a bad packet was seen and toggled R1 when a good packet was detected.
In order to reduce chip power, the circuits were optimized around a supply voltage of -3.3 V. This represents a 25% power savings when compared to the Serdes I -4.5 V supply.
Table 5-2 Bondpad pin-out of Serdes II chip
Pin  I/O     Description                      Pin  I/O     Description
R0   RF out  TERD: bad packet seen            T0   RF out  duplicated data into Rx
R1   RF out  TERD: toggle every full packet   T1   RF out  LFSR: bit 3 into Tx
R2   Power   Vee (-3.3 V)                     T2   Power   Vee (-3.3 V)
R3   Power   Gnd                              T3   Power   Gnd
R4   RF in   differential serial in           T4   RF out  differential serial out
R5   RF in   differential serial in           T5   RF out  differential serial out
R6   Power   Gnd                              T6   Power   Gnd
R7   Power   Vee (-3.3 V)                     T7   Power   Vee (-3.3 V)
R8   RF out  receiver clock                   T8   RF in   ref clock (625 MHz)
R9   RF out  selected demuxed data            T9   RF out  PLL out (divided by 8)
C0   DC in   Rx test source control voltage   C6   Power   Vee (-3.3 V)
C1   DC in   Rx test source select A          C7   -       not used
C2   DC in   Rx test source select B          C8   DC in   select Tx input source
C3   DC in   TERD: select A test bit          C9   DC in   enable Tx input filter
C4   DC in   TERD: select B test bit          C10  DC in   enable Tx PLL divide-2
C5   Power   Gnd                              C11  DC in   select VCO (5/10 GHz)
Figure 5-34 Serdes II chip layout and microphotograph. Shown here is the full Serdes II chip, including a microphotograph in the bottom left corner. The testing pads are located around the perimeter.
[Figure 5-34 layout: Rx and Tx halves with perimeter pads R0 to R9, T0 to T9, and C0 to C11, plus the 16 bit input data and 16 bit output data pad arrays.]
5.10. Testing Results
5.10.1. Serdes I (transmitter test results)
An output waveform captured directly from the oscilloscope is shown in Fig. 5-35(a). It shows the bit pattern expected from the on-chip LFSR testing circuitry. The ability of the PLL to achieve lock was very poor, and a narrow pull-in range of 420 MHz to 460 MHz was measured. The hold-in range was larger, from 393 MHz to 490 MHz, equivalent to a data bit rate of 12.6 Gb/s to 15.7 Gb/s. At a bit rate of 15.3 Gb/s, the rms phase jitter was measured (see footnote 1) to be 6.3 ps, or about 10% of the bit width.
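The conversion between reference frequency, bit rate, and jitter as a fraction of the bit width can be reproduced directly. The following is a minimal Python sketch, assuming the PLL multiplies the reference by 8 and that the 4-to-1 output multiplexer yields four bits per VCO period, which is inferred from the 5 GHz VCO and 20 Gb/s target; the names are illustrative.

```python
PLL_MULTIPLIER = 8        # reference to VCO frequency ratio
BITS_PER_VCO_CYCLE = 4    # inferred: 5 GHz VCO corresponds to 20 Gb/s

def bit_rate_from_reference(f_ref_hz: float) -> float:
    """Serial bit rate produced by a given reference frequency."""
    return f_ref_hz * PLL_MULTIPLIER * BITS_PER_VCO_CYCLE

def jitter_fraction_of_bit(jitter_s: float, bit_rate_hz: float) -> float:
    """RMS jitter expressed as a fraction of one bit period."""
    return jitter_s * bit_rate_hz

hold_in_low = bit_rate_from_reference(393e6)
hold_in_high = bit_rate_from_reference(490e6)
fraction = jitter_fraction_of_bit(6.3e-12, 15.3e9)
print(f"hold-in range: {hold_in_low / 1e9:.1f} to {hold_in_high / 1e9:.1f} Gb/s")
print(f"jitter: {fraction * 100:.1f}% of the bit width")
```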
Figure 5-35 Transmitter waveform (Serdes I). (a) The output waveform of the transmitter running at 15 Gb/s with a 350 mVp-p swing. The pseudo-random pattern matches the expected pattern from simulations. (b) An eye diagram at 15 Gb/s showing the relatively large phase noise and its effects on the closing of the eye.
Although the transmitter was designed to operate at 20 Gb/s, it reached only 15 Gb/s, 25% below the target, which can be attributed to two important factors. The first was the VCO loading environment, which ideally consists of equal loading with four minimum sized buffers. It was instead loaded with two buffers on one stage, one on each of two others, and none on the fourth (see footnote 2). The effect was a reduction in speed, probably due to the double load on one stage, and a non-quadrature phase mismatch between stages. The second factor was simulations that did not adequately account for interconnect parasitics. Resistive and capacitive effects at these frequencies can have a profound effect on the overall speed of the chip. Lack of time and understanding for these simulations produced slower than expected results.
1. Performing a true phase noise and jitter measurement requires a spectrum analyzer capable of an absolute reading. A time domain oscilloscope, such as the one used to collect this data, merely measures the jitter between the signal and the trigger. If the trigger signal is correlated in time to the measurement signal, then the jitter measurement can be quite a bit less than the absolute jitter.
2. This was an oversight and was definitely not intended. The receiver, which was designed a few weeks after this, had ideal loading characteristics. This improved its response and left the transmitter and receiver with two non-overlapping frequency ranges.
Both of the issues discussed were addressed and solved in Serdes II. The loads on the
transmitter and receiver VCOs were carefully checked to make sure loading was balanced
and minimal. Interconnect simulations produced better designs in critical circuits such as
the VCO and PLL. A wide margin was introduced in the design of the VCO to account for
unknown effects.
5.10.2. Serdes II (transmitter test results)
The Serdes II design was successful in attaining the 20 Gb/s target bit rate. The relevant eye diagram is shown in Fig. 5-36. The output voltage swing is 350 mV, and the eye is 30 ps wide and 200 mV high. This represents a big improvement over the original design, which failed to meet the specifications. The eye diagram is also much cleaner and more symmetric, with less total rms jitter.
Figure 5-36 Serdes II transmitter eye diagram. Shown here is an eye diagram at the target 20 Gb/s. It shows an opening 30 ps wide and 200 mV high.
The PLL has a wide pull-in range from 3.6 to 5.3 GHz (14.27 to 21.58 Gb/s), which is more than 75% of the total frequency range of the FFI VCO. The hold-in range is identical to the pull-in range, indicating a well balanced and nearly optimal PD. When using the higher speed VCO, the pull-in range changed to 5.4 to 7.6 GHz, yielding an upper data rate of 30 Gb/s.
Jitter measures the accumulation of transition offsets over a given length of time. For an open loop, without a PLL, a clock will have exponentially increasing jitter with respect to time. When placed in a PLL, the jitter levels off and becomes constant after one bandwidth time constant. For the Serdes II PLL, the jitter was measured with the time domain oscilloscope at 4.3 ps with the reference signal and 2.9 ps without. This indicates that considerable jitter was being introduced by the signal source.
Fig. 5-37 shows the phase noise spectra of the open loop VCO, the open loop reference, the open loop reference plus 18 dB, and the closed loop PLL. The reference plus 18 dB is the effective phase noise seen at the input to the PLL. The PLL closed loop phase noise behaved as expected. First, at low frequencies the phase noise approached that of the reference. This was expected, since these frequencies are well below the loop bandwidth of 6.2 MHz, where the PLL is able to track out the VCO noise, leaving just the reference noise on the output. The difference between the PLL and reference phase noise is likely from noise introduced in the loop filter. Close to the loop bandwidth of 6.2 MHz, both the reference and VCO noise contributed to the total noise. Above the loop bandwidth, the phase noise should follow the VCO phase noise more closely, and that is what was seen.
A more accurate way to measure jitter is in the frequency domain. This enables the removal of the in-band low frequency jitter, which is easily removed by the receiver PLL, from the rms jitter measurement. Integrating the PLL phase noise plot from 100 kHz to 100 MHz gives an rms jitter of 1.4 ps. This value is lower than the 4.3 ps found with the time domain oscilloscope, which indicates that a larger amount of low frequency jitter can be found in the reference signal.
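The frequency domain integration described above can be sketched as follows. This is a minimal Python illustration, assuming the plotted values are single-sideband phase noise in dBc/Hz, so that the phase variance is twice the integrated linear power, and using trapezoidal integration between sample points. The flat -100 dBc/Hz profile is a placeholder for demonstration only; it is not the measured data of Fig. 5-37.

```python
import math

def rms_jitter_from_phase_noise(points: list[tuple[float, float]],
                                carrier_hz: float) -> float:
    """Integrate SSB phase noise (offset_hz, dBc/Hz) pairs into rms jitter in seconds."""
    area = 0.0
    for (f0, l0), (f1, l1) in zip(points, points[1:]):
        p0 = 10.0 ** (l0 / 10.0)            # dBc/Hz to linear power ratio per Hz
        p1 = 10.0 ** (l1 / 10.0)
        area += 0.5 * (p0 + p1) * (f1 - f0)  # trapezoid between adjacent offsets
    phase_rms_rad = math.sqrt(2.0 * area)    # both sidebands contribute
    return phase_rms_rad / (2.0 * math.pi * carrier_hz)

# Placeholder profile: flat -100 dBc/Hz from 100 kHz to 100 MHz on a 5 GHz clock.
placeholder = [(1e5, -100.0), (1e8, -100.0)]
jitter = rms_jitter_from_phase_noise(placeholder, 5e9)
print(f"rms jitter for the placeholder profile: {jitter * 1e12:.2f} ps")
```

Raising the lower integration limit excludes the low frequency jitter that a receiver PLL would track, which is the reason given in the text for preferring this measurement.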
The preliminary specification for OC-192 SONET indicates that the maximum acceptable jitter must be less than 0.09 UI (Unit Interval) for 10^12 bits. Finding the associated rms jitter involves integrating the Gaussian probability density function (pdf) from x to infinity and setting the result equal to the bit error rate of 10^-12. The value of x is about 7.5 standard deviations, yielding an rms jitter specification of 1.2 ps at 10 Gb/s.
Although the transmitter jitter of approximately 1.4 ps is larger than the SONET specification of 1.2 ps, this circuit was not designed with SONET in mind. For short-haul communications higher jitter is more acceptable.
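The conversion from a peak jitter limit in UI to an rms figure can be reproduced with the Gaussian tail integral. The following is a minimal Python sketch using the values quoted in the text (0.09 UI, 10 Gb/s, 7.5 standard deviations); the one-sided tail is evaluated with the complementary error function, and the function names are illustrative.

```python
import math

def gaussian_tail(x_sigmas: float) -> float:
    """Probability that a standard Gaussian exceeds x (integral of the pdf from x to infinity)."""
    return 0.5 * math.erfc(x_sigmas / math.sqrt(2.0))

def rms_jitter_limit(ui_fraction: float, bit_rate_hz: float, x_sigmas: float) -> float:
    """RMS jitter whose x-sigma excursion equals the allowed fraction of a unit interval."""
    peak_limit_s = ui_fraction / bit_rate_hz
    return peak_limit_s / x_sigmas

limit = rms_jitter_limit(0.09, 10e9, 7.5)
tail = gaussian_tail(7.5)
print(f"rms jitter limit: {limit * 1e12:.2f} ps")
print(f"one-sided tail probability at 7.5 sigma: {tail:.2e}")
```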
Figure 5-37 Tx PLL measured phase noise spectra. The PLL closed loop behaved as expected, with the PLL tracking out the VCO noise at low frequency and following the VCO noise at high frequency.
[Figure 5-37 plot: phase noise (dBc/Hz), -140 to -60, versus frequency (MHz), 0.1 to 100. Traces: VCO open loop, PLL closed loop, reference open loop, and "ref - 18 dB".]
5.11. Future Design
The extremely large scope of this project left a number of areas of research untouched and undeveloped in the first two fabricated designs and the third simulated design. The basic elements of the transmitter were designed with optimizations and research performed only in specific areas. The remainder of this section describes key areas that are recommended for future effort in order to establish these designs as highly functional, useful, production-worthy designs.
5.11.1. 8B/10B Encoding
8B/10B encoding solves such issues as transition density imbalance, error detection, command insertion, and DC balancing [26], [35]. It does so by adding two bits of additional information for every eight bit input, which requires a 25% increase in speed for the same information throughput.
The frequency of transitions in the data is a very important factor in the design of the receiver. In general, the more transitions provided to the receiver, the better the PLL’s ability to lock onto the serial stream. 8B/10B encoding guarantees a maximum run length of five bits and a lowest transition density of 30 transitions per 100 bits. Defining a minimum density makes it easier to model the data stream arriving at the receiver.
Another feature of the encoded stream is an equal number of ones and zeros. This
allows all single bit errors to be detected. In addition, because of the much larger 10 bit
word space, the decoder can detect undefined words and flag them as errors.
The DC balance is the average of the number of ones and the number of zeros. For high speed optical links, it is very desirable to have a DC balance of 0.5, which corresponds to an equal number of ones and zeros. This stabilizes effects such as heating in the optical circuits, which can be a function of the sign of the bits being sent. 8B/10B guarantees a DC balance of 0.5 because it forces an equal number of ones and zeros per character.
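The three stream properties discussed above (run length, transition density, and DC balance) are easy to measure on any bit string. The following is a minimal Python sketch; it does not implement 8B/10B encoding itself. The sample stream is the pair of K28.5 comma code groups from the standard 8B/10B code tables, chosen here only as a convenient test pattern, and the function names are illustrative.

```python
def max_run_length(bits: str) -> int:
    """Longest run of identical consecutive bits."""
    longest = current = 1
    for prev, cur in zip(bits, bits[1:]):
        current = current + 1 if cur == prev else 1
        longest = max(longest, current)
    return longest

def transition_count(bits: str) -> int:
    """Number of 0-to-1 or 1-to-0 changes between adjacent bits."""
    return sum(1 for prev, cur in zip(bits, bits[1:]) if prev != cur)

def dc_balance(bits: str) -> float:
    """Fraction of ones in the stream; 0.5 means equal ones and zeros."""
    return bits.count("1") / len(bits)

# K28.5 in both running disparities, sent back to back.
stream = "0011111010" + "1100000101"
print(f"max run length: {max_run_length(stream)}")
print(f"transitions: {transition_count(stream)} in {len(stream)} bits")
print(f"DC balance: {dc_balance(stream)}")
```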
Since data encoding occurs at the parallel data rate of 1.25 Gb/s, the necessary circuitry can be designed completely in CMOS. This reduces power and space consumption, and allows the use of powerful EDA tools for layout and design.
An additional role for 8B/10B encoding is channel alignment, which guarantees that bit 0 of the Tx is connected to bit 0 of the Rx. This requires a 16 bit rotator with a detection mechanism to rotate the streams until they match.
5.11.2. Transmitter data retiming
A technique that can be used to reduce the output phase jitter of the transmitter is to
clock the output signal directly from the PLL through an MS-latch. This retiming circuit
alleviates all the noise introduced by the multiplexers and provides the minimum signal
path between the transmitter serial output and the PLL.
A significant source of jitter on the output data is called deterministic jitter. It is the
result of non-periodic, data-induced noise. Pull-up resistors at the top of CML trees are a
common source because they heat up as current flows through them; warmer resistors
produce higher rms noise. The ultimate effect is that the noise becomes dependent on the
data stream. A stream with a large number of zeros will have a higher noise component than
one with an equal number of ones and zeros.
The problem with data retiming is that it requires a latch that can operate at the
functional speed of the transmitter. In this case, that speed is 20 GHz, and if some encoding
is introduced then it can be as high as 25 GHz. Simulations show maximum operation of a
latch to be unreliable above 15 GHz. This is a result of the large delay through the two CML
tree gates and the feedback that is inherent in these circuits.
Although direct data retiming is unattainable unless a much faster latch is found,
other improvements can be made. Since the final 4-to-1 (symmetric) multiplexer defines
the output jitter, an improvement would be to drive the multiplexer directly by the PLL
rather than through the timing delay chain. This adds to design difficulty because the
timing of the entire transmitter is running opposite to the timing of the data. The primary
benefit of this method is the removal of five buffers of phase noise introduced by the delay
chain.
Figure 5-38 Data and clock timing
By moving the PLL to the input of the multiplexer (b), the clock must run opposite the data. This creates timing difficulties but decreases the output phase noise of the transmitter. Panel (a) shows the current method and panel (b) the proposed method.
5.11.3. LC Oscillator
The primary drawback to using the FFI ring oscillator in the transmitter is its very
poor phase noise characteristics. LC oscillators have much higher quality factors and
considerably less phase noise and jitter [21],[22],[44],[45]. One problem with typical LC
VCOs is that they only produce a single phase clock, but the transmitter architecture in this
research requires a clock and its quadrature. A possible option, and an area for further
research, is the multiphase LC oscillator [46],[47], which offers the best of both worlds: low
phase noise and quadrature outputs.
6. Design of the Receiver
6.1. Project History
The first receiver (Serdes I) was designed for fabrication
in February 1999 and only had a 1-to-4 demultiplexer and clock
extractor. Various improvements and optimizations yielded
Serdes II, which was a more efficient design, capable of full 16 bit demultiplexing and
external data input.
6.2. Receiver Architecture
Figure 6-1 Top level receiver architecture
The receiver is a PLL with a PD, called a transition detector, a PI loop filter, a VCO, and a demultiplexer to extract the NRZ bits from the serial data.
The receiver is a PLL and demultiplexer that locks an internal VCO to externally
supplied data and extracts the non-return-to-zero (NRZ) bits from the data. Data arrives
serially as a differential signal and is buffered in preparation for driving the PD. The
information collected about transition phases is combined and fed into a proportional and
integral loop filter. The filtered signal is used to drive the VCO to a frequency which
matches the frequency of the external data. In addition to collecting timing information, the
PD also performs a 1-4 non-aligned demultiplexing of the data. Another circuit, also driven
by the VCO, finishes the demultiplexing and generates 16 bits of parallel data.
6.3. Receiver PLL
The receiver PLL is considered a clock and data recovery
(CDR) circuit and has the primary role of extracting the data bits
from the serial signal and ensuring that the extracted bits are not
corrupted. The process is made more difficult than in a standard
PLL because random or pseudo-random data has no guaranteed
transition times. The 3-state and XOR PDs used in the transmitter
PLLs, for example, can only operate with periodic signals. A specialized PD that can
handle non-periodic information and allow a VCO to lock to the fundamental frequency of
the data is required. Merely locking the VCO to the data’s frequency is only half the
problem. The system must also sample, or extract, the information contained within the data
stream, using the recovered clock.
The receiver designs for Serdes I through III all utilize a transition detector (TD) PD.
It twice oversamples the data signal and generates a digital measure of the phase difference
between this signal and the clock. It essentially indicates whether the clock is too fast or
too slow relative to the data. With this information, lock can be acquired and, because of the
nature of the sampling, data can easily be extracted. The problem with this PD, which was
addressed in the third prototype, is the very small pull-in range of the PLL. Without an
analog measure of phase difference, the clock and data frequencies have to be very close
for the PLL to pull in.
Fig. 6-2 depicts block diagrams for the three receiver prototypes. The first and second
designs differ in the integrator design and the VCO. The third integrates into the PLL an
entirely new loop that is very good at acquiring frequency lock but poor at extracting the
data [14], [30], [51]. Together with the TD PD, the PLL’s pull-in range is greatly increased
without any sacrifice in performance.
Figure 6-2 Receiver PLL evolution
The receiver PLL has gone through two major improvements. The first design utilized a FET charge pump which was replaced with a negative impedance charge pump in the second design. The third prototype added a referenced frequency detector which greatly improved the pull-in range of the loop.
(Block diagrams for Serdes I, Serdes II, and Serdes III. Block labels: transition detector (PD), FET charge pump, negative impedance charge pump, 3-state PD, gain block, gain block 1, gain block 2, VCO; inputs: data, reference.)
6.3.1. Phase Detector
6.3.1.1. Transition Detector (PD)
Data transitions provide the only means to measure the phase of the incoming serial
data. If the data were periodic then we could be assured of a transition at a specific time and
directly compare it with a coincident VCO transition, similar to the clock synthesizer PLL
in the transmitter. However, data, by definition, is non-periodic and transition locations
cannot be assured at any time. For example, data containing ten ones followed by twelve
zeros, containing only two transitions, could be received. Since a transition between bits
cannot be guaranteed, there must be no action when no transitions are received and tracking
must be performed when transitions are received.
The aspect of the clock recovery circuit that had critical implications on its
development was the use of the same eight phase ring oscillator used in the transmitter. It
was felt that by matching the oscillators in the transmitter and receiver, they could be
ensured to operate at the same speeds and the development of only one VCO would be
required.
Running at 5 GHz, either the CS or FFI VCO generates eight unique phases (0°, 45°,
90°, 135°, ...; see footnote 1), each separated by 25 ps. Serial data, arriving at 20 Gb/s, can
be broken up into bits 50 ps wide. Taking complete advantage of the multi-phase clock, the
data is sampled on every clock phase, resulting in a twice oversampling receiver scheme. In
other words, for every bit, two samples of the signal will be taken.
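The timing figures in this paragraph follow directly from the clock frequency, the number of phases, and the bit rate. A short sketch of that arithmetic is given here; the function names are hypothetical and chosen only for this illustration.

```python
def phase_spacing_ps(f_vco_hz: float, num_phases: int) -> float:
    """Time between consecutive VCO phases, in picoseconds."""
    return 1e12 / (f_vco_hz * num_phases)


def bit_width_ps(bit_rate_bps: float) -> float:
    """Width of one serial bit, in picoseconds."""
    return 1e12 / bit_rate_bps


def samples_per_bit(f_vco_hz: float, num_phases: int, bit_rate_bps: float) -> float:
    """Oversampling ratio when the data is sampled on every clock phase."""
    return bit_width_ps(bit_rate_bps) / phase_spacing_ps(f_vco_hz, num_phases)


# Values quoted in the text: 5 GHz VCO, eight phases, 20 Gb/s serial data.
print(phase_spacing_ps(5e9, 8))       # spacing between phases
print(bit_width_ps(20e9))             # bit width
print(samples_per_bit(5e9, 8, 20e9))  # oversampling ratio
```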
Sampling is handled by eight MS-latches whose clock inputs are tied to one of the
eight clock phases (see Fig. 6-3). In the locked and stable condition, four of the latches
sample at the center of the bits and return data information while the other four sample on
the transition and return timing information only. If the latches are labeled consecutively
by their clock phase inputs, W, X, Y, Z and their inverses, then the data latches are W, Y,
and their inverses, while the timing latches are X, Z, and their inverses.
1. Although the VCO has only four unique outputs, the inverse of each of them yields the remaining four phases.
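The sampling scheme described above can be sketched behaviorally: eight samples are taken per VCO period, adjacent samples are compared with an XOR, and a difference is reported as FAST or SLOW. In this sketch the FAST/SLOW assignment follows the labels of Fig. 6-3 (a data sample followed by a timing sample gives FAST, a timing sample followed by a data sample gives SLOW). The function names and the list representation are assumptions made for illustration and do not model the latch-level circuit.

```python
from typing import List, Optional

# Sample order within one VCO period; a trailing apostrophe marks an inverted phase.
PHASES = ["W", "X", "Y", "Z", "W'", "X'", "Y'", "Z'"]


def transition_commands(samples: List[int], next_first: int) -> List[Optional[str]]:
    """XOR each sample with the following one and map differences to FAST or SLOW.

    samples    -- eight 0/1 samples in phase order for one VCO period
    next_first -- the W sample of the next period, needed for the last comparison
    """
    extended = list(samples) + [next_first]
    commands: List[Optional[str]] = []
    for i in range(len(PHASES)):
        if extended[i] != extended[i + 1]:
            # Even index: data sample then timing sample. Odd index: the reverse.
            commands.append("FAST" if i % 2 == 0 else "SLOW")
        else:
            commands.append(None)
    return commands


def data_samples(samples: List[int]) -> List[int]:
    """Samples taken by the data latches (W, Y, and their inverses)."""
    return samples[0::2]


print(transition_commands([1, 0, 0, 0, 0, 0, 0, 0], 0))
print(data_samples([1, 0, 1, 0, 0, 1, 1, 1]))
```

When no transition occurs in a period, every entry is `None`, which corresponds to the requirement that the loop takes no action without transitions.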
Figure 6-3 Receiver topology
The receiver is made up of eight MS-latches, each tied to a unique phase of the VCO. Since each phase is separated by 25 ps, the data is twice oversampled, and thus, able to extract transition timing information from all edges. FAST or SLOW in the diagram is a command to the VCO.
(Diagram labels: sampling latches W, X, Y, Z and their inverses at 0 ps, 25 ps, 50 ps, 75 ps, 100 ps, ... across the 200 ps clock period; transition location detectors W⊗X = FAST, X⊗Y = SLOW, Y⊗Z = FAST, Z⊗W = SLOW, repeated for the inverse phases; data outputs dataA, dataB, dataC, dataD.)
Fig. 6-4 shows a detailed look at the transition detector used in Serdes I. Data is
latched with L1 using Φn, the n-th buffered phase of the VCO. Φn and Φn+1 are consecutive
phases of the VCO, separated by 25 ps, or 45°, and Φn is equal to Φn+8. The sampled data,
sn, is XORed with the sample from the previous detector, sn-1, and retimed with L2. The
clock input to this latch comes six phases later, or after 150 ps, in order to allow the output
of the XOR to settle to the correct value. tn, the output of L2, indicates whether a transition
has occurred during this phase slice. The total time that the tn signal remains high is
dependent on the period of the VCO and whether additional transitions are detected in this
phase slice. With the VCO running at 5 GHz, the minimum time that tn is high is 200 ps.
This circuit is then repeated eight times to collect transition information from every
transition.
Figure 6-4 Transition detector in Serdes I
The first iteration of the transition detector had a latch to sample the data. This sample and the sample from the previous detector are XORed together and latched again to produce the transition detector signal.
The phase plot in Fig. 6-4 shows a transition detector on the X (45°) phase. It uses
samples from itself and from the previous detector to detect transitions within the shaded
region. The XOR of these signals is clocked six phases later.
One of the issues that defines the performance of this circuit is the time between when
the data is sampled and when the detected-transition signal changes. Assuming a 20 ps gate
delay, the approximate time is 170 ps. Since the detected-transition signal is high for
200 ps, the effect of a single transition lasts for a total of 370 ps after the sample, which is
equivalent to about 7 bits. This is important because during lock it is desirable to have the
frequency of the VCO adjust as quickly as possible after a transition is detected. The digital
nature of this circuit results in discrete changes to the VCO output, so oscillations are
natural when in lock. If the PD delay is large then these oscillations will also increase, as
the VCO’s frequency continuously overshoots and undershoots. A further analysis of this
phenomenon can be found in Sec. 6.3.2. on page 130.
The motivating factor in the design of Serdes II’s TD, shown in Fig. 6-5, was to
reduce the delay through the detector. In the first prototype this time was 170 ps, which
directly affected the ability of the PLL to maintain and acquire lock. In order to improve on
that design, a look at the timing requirements of the XOR was required.
The two level nature of the XOR gate requires the level 2 input to precede the level
1 input by approximately 10 ps. The time between sampled data sn-1 and sn is equal to 25
ps, and with the additional 5 ps of delay introduced by the level 2 output of the MS-latch a
total of 30 ps is found between the level 2 input to the XOR gate and the sn-1 signal. When
40 ps of buffer delay is added to the sn-1 signal, a time delta of 10 ps between the inputs of
the XOR gate is realized.
Figure 6-5 Transition detector in Serdes II
Optimization of the transition detector allowed the removal of the second MS-latch and reduced the total delay by 75%. This circuit is simplified and requires a less complicated layout.
When the timing is optimized to this extent, the necessity of the second MS-latch, L2,
is removed. The same 200 ps pulse is created, but the total transition detector delay has been
reduced from 170 ps to 40 ps. An additional benefit is in the simplified layout of this circuit;
only one clock phase is required. In the Serdes I circuit, a complex routing scheme was
required because two phases were necessary.
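The delay figures quoted for the two transition detectors can be reproduced with simple arithmetic. The sketch below only restates the numbers given in the text (25 ps phase step, 20 ps gate delay, 200 ps pulse, 50 ps bit width, and the 5 ps and 40 ps delays in the Serdes II XOR timing); the variable names are hypothetical.

```python
PHASE_STEP_PS = 25    # spacing between consecutive VCO phases
GATE_DELAY_PS = 20    # assumed gate delay quoted in the text
PULSE_WIDTH_PS = 200  # time the detected-transition signal stays high
BIT_WIDTH_PS = 50     # one bit at 20 Gb/s

# Serdes I: the retiming latch is clocked six phases after the sample.
serdes1_delay_ps = 6 * PHASE_STEP_PS + GATE_DELAY_PS
serdes1_effect_ps = serdes1_delay_ps + PULSE_WIDTH_PS
serdes1_effect_bits = serdes1_effect_ps / BIT_WIDTH_PS

# Serdes II: the second latch is removed and the XOR inputs are skewed by buffers.
serdes2_delay_ps = 40
level2_lead_ps = PHASE_STEP_PS + 5       # sample spacing plus MS-latch level 2 delay
xor_input_skew_ps = 40 - level2_lead_ps  # buffer delay added to the s(n-1) path
delay_reduction = 1 - serdes2_delay_ps / serdes1_delay_ps

print(serdes1_delay_ps, serdes1_effect_ps, serdes1_effect_bits)
print(xor_input_skew_ps, round(delay_reduction, 2))
```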
The gain of the transition detector is not clearly defined because of the digital nature
of the circuit. When the phase difference is greater than zero, it will generate a slow pulse
and when less than zero, it will generate a fast pulse. There is no linear relationship
between phase and output. Instantaneous gain must therefore be defined to be infinite. The
average gain, however, is not infinite and can be found when a statistical distribution of
transitions or jitter is introduced.
A real data signal does not have perfect transition separation but instead has
transitions separated according to a constant plus a random Gaussian variable. This jitter
acts as “transition fuzz” which effectively gives the PD gain. The process of calculating this
gain is shown in Fig. 6-6 for both a uniform and Gaussian distribution. Fundamentally, it
comes down to subtracting the two areas created by splitting the probability density
function (pdf) around zero, after setting a specific mean and standard deviation. For
Gaussian jitter, an approximation is made: the gain is assumed linear, based upon a line that
passes through the point at one standard deviation.
Figure 6-6 Gain of transition detector with data jitter
Solving for the gain of the transition detector must take into account the fact that the data has jitter. This jitter spreads out the transitions producing an average PD output.
(Panels: instantaneous PD output, switching between -A and A, versus phase error θe; uniform transition distribution (pdf), STD = σ, with average PD gain Kd' = 0.58 A/σ; Gaussian transition distribution (pdf), STD = σ, with average PD gain Kd' = 0.68 A/σ.)
In order to include the effect of the transition density (tpb = transitions per bit), Kd is
multiplied by tpb. A factor of four must also be included to account for the fact that a
slow/fast pulse is carried across 4 bit widths. This yields the final transition detector gain:

$$K_d = V_p \,\frac{0.68}{\sigma}\,(4\,\mathrm{tpb}), \qquad \sigma = \sigma_t \,\frac{2\pi\ \mathrm{rad}}{100\ \mathrm{ps}} \tag{6-1}$$

In the Serdes I implementation with a pulse size, Vp, of 300 mV, a transition density of 1/4,
and an rms jitter value, σt, of 4 ps, the detector gain equals 811 mV/rad. In the Serdes II
transition detector, the pulse size was reduced to 40 mV, yielding a smaller gain of 108
mV/rad.
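The detector gains quoted with (6-1) can be checked numerically. The sketch below evaluates (6-1) for the two pulse sizes, and also shows that the 0.58 and 0.68 slope factors are numerically consistent with 1/√3 (a uniform pdf with standard deviation σ) and erf(1/√2) (a Gaussian evaluated at one standard deviation). That interpretation of the two factors is an assumption made here; the function names are hypothetical.

```python
import math


def uniform_slope_factor() -> float:
    """Slope factor for a uniform pdf of standard deviation sigma: 1/sqrt(3)."""
    return 1.0 / math.sqrt(3.0)


def gaussian_slope_factor() -> float:
    """Average output at one standard deviation for a Gaussian pdf: erf(1/sqrt(2))."""
    return math.erf(1.0 / math.sqrt(2.0))


def jitter_to_rad(sigma_t_ps: float, period_ps: float = 100.0) -> float:
    """Convert rms jitter in ps to radians using the 100 ps period in (6-1)."""
    return sigma_t_ps * 2.0 * math.pi / period_ps


def detector_gain(v_pulse: float, sigma_t_ps: float, tpb: float) -> float:
    """Transition detector gain in V/rad according to (6-1)."""
    return v_pulse * 0.68 / jitter_to_rad(sigma_t_ps) * (4.0 * tpb)


print(round(uniform_slope_factor(), 2), round(gaussian_slope_factor(), 2))
print(detector_gain(0.300, 4.0, 0.25))  # Serdes I, in V/rad
print(detector_gain(0.040, 4.0, 0.25))  # Serdes II, in V/rad
```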
6.3.1.2. NRZ Phase/Frequency Detector (PD/FD) (Hogge)
The digital nature of the transition detector PD and its phase response yields a very
poor pull-in range. When lock is acquired, however, this PD has very strong noise
immunity and an inherent ability to extract data from the signal. The Hogge PD helps the
poor pull-in range but has no net effect on the TD PD properties. Its use, in conjunction with
the transition detector PD, was evaluated but not implemented for Serdes III.
The schematic of the Hogge PD is shown in Fig. 6-7 [52], [53]. It operates on the
NRZ data and generates an analog signal based upon the difference between it and the
VCO. Data, vi, must arrive at half the frequency of the clock, vo, for the PD to operate
correctly. This is accomplished by dividing the input data signal down by 4. This has the
negative effect of removing three out of every four edges. The two latches and the va XOR
gate retime the data by creating pulses based on data transitions but timed to the clock
transitions. The vb XOR gate, on the other hand, has a similar waveform but the edges are
timed with the data transitions. The dc component of the difference between these two
signals yields a measure of the phase difference.
Figure 6-7 Phase detector for NRZ data
This circuit shows one technique for detecting phase for NRZ data in a PLL. The bit rate of the data and frequency of the clock must be the same. The output is taken differentially and yields a continuous analog signal as a function of phase difference.
(Diagram labels: data input vi, clock input vo, latch outputs Q1 and Q2, XOR outputs va and vb, critical delay; plot of vd versus Δθ from −π to π for 50% transition density.)
The most important aspect in implementing this PD was maximizing the figure of
merit. In this case it is defined by the range of pulse widths expressed in vb against the
constant width of va pulses. Ideally, the widths of vb would range from 0 to twice the width
of a va pulse. Finding this solution required a fine adjustment of the critical delay, which is
approximately the delay through an MS-latch. By minimizing the integral of the vd versus
Δθ plot over a full 2π radians, the figure of merit can be maximized.
The gain of this PD is a function of the transitions per bit (tpb) for the incoming data
stream. For a 11001100... stream, the tpb is equal to 0.5. From simulation, the gain was
found to be 80 mV/rad/tpb, which includes the divide-by-4 circuit.
Ultimately this PD was not used because it was exceedingly difficult to optimize the
delays in the circuit. Slowing down the clock and data was the only way to correct the
problem, and as a result the pull-in range suffered. The Serdes III implementation addressed
the small pull-in problem by using an external reference signal.
6.3.2. The Loop Filter
The purpose of the loop filter is to take the digital
transition information from the eight transition detectors and
create an appropriate VCO signal. The transition detectors yield
relative information in regards to data and clock phase offset, so
an integrator is required. An integrator alone is insufficient in the
loop, so a proportional factor is summed with the integrator
output. Together the proportional and integral control comprise the PI loop filter.
Although the loop filter in Fig. 6-8 is expressed as an integral and proportional gain, it
can also be expressed by the pole-zero equation

$$K_h \,\frac{s + \omega_2}{s}, \qquad K_h = K_P, \qquad \omega_2 = \frac{K_I}{K_P} \tag{6-2}$$

where ω2 is the loop zero and Kh is the high frequency gain.
Unlike the frequency synthesizer in the transmitter, the integrator and proportional
gain components must operate at the frequency of the clock and accept four faster and four
slower signals. This necessitates the use of specialized circuits able to handle the much
higher frequency. The Serdes III design, although slightly more complicated, still contains
the basic components shown in Fig. 6-8.
Figure 6-8 Receiver loop filter
The receiver loop filter accepts eight “digital” signals from the transition detectors and produces an analog control signal for the VCO.
6.3.2.1. FET Charge Pump / Proportional Control (Serdes I)
The charge pump integrator shown in Fig. 6-9 utilizes four field effect transistor
(FET) pairs to place and remove charge from the capacitor. Each FET can act
independently of the others, so one could be adding charge while another is removing it.
Careful consideration assured that the nFET and pFET sizes were chosen to have matching
currents.
Each FET draws on average 60 µA during one complete period of the clock. With a
300 mV input from the PD this corresponds to a 0.0002 1/Ω gain from the FETs. With Cf
equal to 4 pF, a slow/fast pulse will change the capacitor voltage by ± 3 mV. Dividing the
FET gain by the capacitance yields the integrator gain KI = 50 Mrad/s.
Proportional control, on the other hand, is handled through eight differential
switches, one for each fast and slow PD output, with one branch tied together to form a
single-ended “analog” signal (Fig. 6-10). By default, without any fast or slow signals, all
fast trees will pull 0.75 mA through the pull-up resistor Rcc and all slow trees will pull 0
mA as shown in Fig. 6-10. In this way, the voltage across Rcc will increase when a fast
signal is received and decrease when a slow signal is received. Rcc was set to 100 Ω, which
produces a 75 mV change for each input pulse. The emitter follower tied to Rcc only
introduces a DC offset to interface properly with the summing junction. Designed similarly
to the integrator, the proportional circuit inputs are all able to operate independently.
Figure 6-9 MOSFET charge pump integrator
The FET transistors in this circuit act as current switches removing and adding charge to a capacitor. This action integrates the slow and fast inputs.
(Diagram notes: four MOSFET pairs, S1 to S4 and F1 to F4, drive Vint across Cf. S: a slow signal places a charge packet on the capacitor. F: a fast signal removes a charge packet from the capacitor. An additional MOSFET is designed to balance the current drawn from the base.)
Figure 6-10 Proportional control and summing junction
This circuit provides the proportional gain for the loop filter and sums the result with the signal from the charge pump integrator. This ultimately drives the aVref control voltage for the VCO.
(Diagram labels: differential switches F1/S1, repeated 4 times for each S/F pair; pull-up resistor Rcc; summing junction; Vint; aVref (VCO).)
For each 300 mV input pulse, the output of the proportional control circuit changes
by 75 mV. This corresponds to a proportional gain, Kp, of 0.25. The summing junction
combines the outputs of the integrator and the proportional gain stage. It introduces an
additional gain of 0.286 into the total gain of the loop. Given the gains derived above, the
loop filter has a zero, ω2, at 32 MHz and a high frequency gain, Kh, of 71.5 × 10^-3. Collecting
all the gains from this circuit and multiplying by the pulse period shows a ±0.7° phase
change of the VCO for every slow/fast pulse.
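The loop filter numbers for Serdes I follow from the component values quoted above together with (6-2). The sketch below reproduces them; it assumes that the charge packet is delivered over one 200 ps clock period (a 5 GHz clock), and the variable names are hypothetical.

```python
import math

I_FET = 60e-6    # average FET current over one clock period, A
V_PD = 0.300     # PD pulse amplitude, V
C_F = 4e-12      # integrating capacitor, F
T_CLK = 200e-12  # clock period, s (assumes a 5 GHz clock)
DV_PROP = 0.075  # proportional output change per input pulse, V
K_SUM = 0.286    # gain of the summing junction

g_fet = I_FET / V_PD            # FET transconductance, 1/ohm
k_i = g_fet / C_F               # integrator gain, rad/s
dv_cap = I_FET * T_CLK / C_F    # capacitor voltage step per pulse, V
k_p = DV_PROP / V_PD            # proportional gain
k_h = k_p * K_SUM               # high frequency gain of the loop filter
omega_2 = k_i / k_p             # loop zero from (6-2), rad/s
f_2 = omega_2 / (2 * math.pi)   # loop zero, Hz

print(g_fet, k_i, dv_cap)
print(k_p, k_h, f_2)
```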
6.3.2.2. Negative Impedance Charge Pump (Serdes II)
The goal for the receiver in the Serdes II implementation was to replace the FET
charge pump and proportional control with a much simpler negative impedance charge
pump, while keeping all the PLL parameters the same. There were problems associated
with the FET pump including: poor high frequency response, difficulty in matching pull-up
and pull-down components, high capacitance discharge, and significant complexity. The
negative impedance pump solved all of these problems with a smaller and simpler circuit.
Using the circuit in Fig. 5-21, equations (5-7)-(5-10), and the loop natural frequency,
zero, and pole of 25 MHz, 6.4 MHz, and 102 MHz, respectively, yields C1 = 575 pF,
C2 = 38 pF, and R = 43 Ω. A high frequency pole was added to reduce spurious modulation
and reduce the clock jitter, and it had little effect on the overall loop response.
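As a rough cross-check of these component values, the zero and pole of a common second-order filter network can be computed. This sketch assumes a series R and C1 branch shunted by C2, which is only a guess at the topology; Fig. 5-21 and equations (5-7)-(5-10) are outside this section and may define the filter differently. The function names are hypothetical.

```python
import math


def filter_zero_hz(r_ohm: float, c1_f: float) -> float:
    """Zero frequency of a series R-C1 branch: 1 / (2*pi*R*C1)."""
    return 1.0 / (2.0 * math.pi * r_ohm * c1_f)


def filter_pole_hz(r_ohm: float, c1_f: float, c2_f: float) -> float:
    """High frequency pole with C2 in shunt: 1 / (2*pi*R*(C1*C2/(C1+C2)))."""
    c_series = c1_f * c2_f / (c1_f + c2_f)
    return 1.0 / (2.0 * math.pi * r_ohm * c_series)


R, C1, C2 = 43.0, 575e-12, 38e-12  # values quoted in the text
print(filter_zero_hz(R, C1) / 1e6)      # zero in MHz
print(filter_pole_hz(R, C1, C2) / 1e6)  # pole in MHz
```

Under this assumed topology the results land close to the 6.4 MHz zero and the roughly 100 MHz pole quoted above.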
6.3.2.3. Mixed Loop (Serdes III)
The primary design goal of the third Serdes implementation was to improve the poor
pull-in range of the transition detector that was due to its non-linear nature. This resulted in
the serial data frequency being required to be very close to the nominal frequency of the
VCO for pull-in to occur. Given a specific bit-rate this can be very difficult to design across
all thermal, process, and implementation deviations.
An initial approach utilized a down-counted data signal fed into a separate Hogge-style
NRZ PD (Section 6.3.1.2. on page 129). The idea was to utilize a second PD that had
a larger pull-in range and could be coupled with the TD PD loop for a better overall pull-in
range. This NRZ PD proved to be difficult to design due to very strict delay requirements
and it did not significantly improve the pull-in range.
A second approach used an additional loop which accepts a reference at the (bit
rate)/8 and was designed to respond identically to the loop in the transmitter (Section 5.4.
on page 82). The loop filter output is summed with the transition detector of the original
loop to create the VCO’s control voltage as shown in Fig. 6-2 on page 123. The purpose of
the new loop is to acquire frequency lock, which pulls the first PLL into lock because of
the common integrator. The second loop is able to acquire solid phase lock once within its
lock-in range and then begin to extract data.
The parameters for the new loop are identical to those previously used. The only
remaining design choices are the gain of the TD PD, and its filter. Choosing an appropriate
gain for the transition detector involves a trade-off in bit error rate and the lock-in range.
At one extreme, a large gain will give the PLL a large lock-in range that is approximately
equal to the bandwidth of the loop. For instance, a doubling of the PD gain will result in a
doubling of the lock-in range. This higher gain however, results in a higher bit error rate
(BER) because of the large phase correction. On the other extreme, a small gain will limit
the bandwidth and the lock-in range, but reduce the error rate.
The effect of a large gain on BER results from consecutive transitions that are jittered
in one direction causing an accumulation of phase change. The mean frequency of the data
and of the clock are assumed to be constant, an assumption that is reasonable over the few
transitions needed in this analysis.
The BER of single bit errors is given by Q(jitter > 25 ps), which is equal to 3x10^-15
for an rms data jitter of 4 ps and a bit width of 50 ps. Q(x) is the integral from x to infinity
of the normalized Gaussian probability density function (pdf). If the BER introduced by the
TD is less than this value, then its effects can, in general, be ignored.
The TD introduces a ∆t ps phase change per transition. The worst case scenario for
an error is when enough phase changes bring the clock phase to 12.5 ps from consecutive
data jitter followed by a jitter of -12.5 ps in the other direction. In such a case the phase
difference between the clock and the data will be 25 ps. Solving for this is best done by an
example. Assume ∆t equals 5 ps.
Q( jitter > 0 ps )   = 5x10^-1  -- make 5 ps phase adjust
Q( jitter > 5 ps )   = 6x10^-2  -- jitter must be greater than 5 ps
Q( jitter > 10 ps )  = 9x10^-4  -- ... and so on
Q( jitter > 15 ps )  = 1x10^-6
Q( jitter < -10 ps ) = 9x10^-4  -- bit error!
---------------------------
total probability    = 3x10^-14
For this example, there were four consecutive “jitters” in the positive direction,
causing a clock phase change of 25 ps. They were followed by a jitter of 10 ps in the
opposite direction. The probabilities of these individual events are multiplied together to find
the total probability for an error from this chain of events. For the same analysis, but with
∆t equal to 4 ps, the result is 7x10^-19. In conclusion, as long as ∆t is kept below about 4 ps
then the effect of accumulated jitter on phase will be smaller than the chance of a single bit
error, and can be ignored.
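The chain-of-events calculation above is a product of independent tail probabilities. The sketch below multiplies the tabulated entries for the ∆t = 5 ps example and also defines the Q function as described in the text. The table values are taken as given rather than recomputed, since the jitter normalization behind them is not restated in this section; the names are hypothetical.

```python
import math


def q_function(x: float) -> float:
    """Integral from x to infinity of the normalized Gaussian pdf."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


# Tabulated probabilities for the delta-t = 5 ps example, in order of occurrence.
chain = [5e-1, 6e-2, 9e-4, 1e-6, 9e-4]
total_probability = math.prod(chain)

print(q_function(0.0))    # half of all transitions jitter in one direction
print(total_probability)  # same order of magnitude as the quoted total
```

The product of the rounded table entries comes out on the order of 10^-14, in line with the quoted total probability.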
Without an integrator in the loop, the VCO control voltage cannot exceed the
maximum swing of the TD. Given a 1010 sequence at 20 Gb/s (tpb = 1), there would be four
overlapping pulses of magnitude ∆v, which, when multiplied by the VCO gain, yield the
frequency deviation. This defines the lock-in range of the TD loop and is equal to

$$\omega_L = \Delta v \, K_o \, (4\,\mathrm{tpb}) \tag{6-3}$$

where ∆v is the magnitude of the voltage pulse from the TD. The factor of 4tpb takes into
account the fact that the TD has no effect on the frequency if there are no transitions. The
more transitions, the larger the potential frequency deviation. Relating a voltage change to
an associated time change yields

$$\Delta v = \frac{\Delta t \, f_c^{2}}{K_o}. \tag{6-4}$$

Combining the previous two equations to find the lock-in range as a function of ∆t results in

$$\omega_L = 2\,\Delta t \, \omega_c^{2} \, (4\,\mathrm{tpb}). \tag{6-5}$$

where ωc is the clock frequency.
Typical specifications for a receiver of this type provide for a reference signal which
is within 100 ppm of the frequency of the data. Using a more conservative value of 1000
ppm gives a maximum reference deviation of 20 MHz. Using this value in (6-5) gives a
minimum ∆t of 0.4 ps.
For the final implementation, a value of 0.6 ps was chosen for the phase correction
for every transition. The lock-in range is therefore 30 MHz at 0.25 transitions per bit. This
relates to a 4 mV pulse which is generated within the TD by combining the eight slow and
fast signals through a common set of pull-up resistors. The resistors were set at 5 Ω with a
0.8 mA current source in each tree.
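The lock-in numbers quoted with (6-5) can be reproduced with a few lines of arithmetic. The sketch below assumes the clock frequency in (6-5) is entered in hertz at 5 GHz with tpb = 0.25, since those are the values that reproduce the quoted 30 MHz range and 0.4 ps minimum; the function names are hypothetical.

```python
def lock_in_range_hz(delta_t_s: float, f_clock_hz: float, tpb: float) -> float:
    """Lock-in range from (6-5), with the clock frequency taken in Hz (assumption)."""
    return 2.0 * delta_t_s * f_clock_hz ** 2 * (4.0 * tpb)


def min_delta_t_s(range_hz: float, f_clock_hz: float, tpb: float) -> float:
    """Smallest phase correction per transition that covers a given lock-in range."""
    return range_hz / (2.0 * f_clock_hz ** 2 * (4.0 * tpb))


F_CLOCK = 5e9               # VCO frequency, Hz
TPB = 0.25                  # transitions per bit
reference_offset_hz = 1000e-6 * 20e9  # 1000 ppm of the 20 Gb/s data rate
pulse_mv = 5.0 * 0.8        # 5 ohm pull-up with 0.8 mA gives the pulse in mV

print(reference_offset_hz / 1e6)                            # MHz
print(min_delta_t_s(reference_offset_hz, F_CLOCK, TPB) * 1e12)  # ps
print(lock_in_range_hz(0.6e-12, F_CLOCK, TPB) / 1e6)        # MHz
print(pulse_mv)
```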
6.3.3. PLL Loop Response
6.3.3.1. Serdes I (FET charge pump)
The total loop gain or bandwidth is found through a product of the VCO gain, Ko =
3.14 Grad/s/V; the PD gain, Kd = 811 mV/rad; and the loop filter gain, Kh = 71.5 × 10^-3, and is
equal to 29 MHz. With the loop zero at 32 MHz this yields a damping factor

$$\zeta = 0.5 \sqrt{\frac{K}{\omega_2}} \tag{6-6}$$

equal to 0.5, which is underdamped with an overshoot of 30%. For all higher transition rates
the PD gain will increase and improve the damping factor.
Fig. 6-11 depicts the Serdes I PLL locking into a 6.1 Gb/s (tpb = 0.25) data stream.
Using an AHDL program, the data was given an rms jitter of 4 ps, which is approximately
the amount produced by the associated transmitter. Up until 5 ns the PLL is pulling in, and
after 10 ns lock-in has occurred. The large deviations around 6.1 GHz are due to the
proportional control mechanism pulsing the frequency to cause a phase correction. During
the phase correction the integrator is forcing the average frequency to equal that of the data.
The non-linear “digital” nature of the PD results in a very limited pull-in range.
Simulation over various initial frequency offsets yields a range of about 2%. The hold-in
range, on the other hand, is quite large due to the integrator.
6.3.3.2. Serdes II (negative impedance charge pump)
Fundamentally, the Serdes II implementation was very similar to the Serdes I
version. The key parameters, including loop bandwidth, were kept the same, though a
slightly different PD, an improved loop filter, and an improved VCO were used. Because
of this, the response is nearly identical to the Serdes I design shown in Fig. 6-11.
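The Serdes I loop numbers quoted in Sec. 6.3.3.1, which Serdes II was designed to match, can be checked from the three gains. The sketch below forms the loop gain as the product Ko·Kd·Kh, converts it to hertz, and evaluates the damping factor as 0.5·sqrt(K/ω2); the square-root form is the usual second-order PLL expression and is assumed here. Variable names are hypothetical.

```python
import math

K_O = 3.14e9     # VCO gain, rad/s/V
K_D = 0.811      # PD gain, V/rad
K_H = 0.0715     # loop filter high frequency gain
F_ZERO = 32e6    # loop zero, Hz

loop_gain_rad = K_O * K_D * K_H                # loop gain, rad/s
loop_gain_hz = loop_gain_rad / (2 * math.pi)   # loop gain (bandwidth), Hz
zeta = 0.5 * math.sqrt(loop_gain_hz / F_ZERO)  # damping factor (assumed form)

print(loop_gain_hz / 1e6)  # bandwidth in MHz
print(zeta)
```

A damping factor slightly below 0.5 is consistent with the underdamped response described in the text.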
Figure 6-11 Serdes I loop locking in
This plot shows the Serdes I receiver VCO locking into 6.1 Gb/s, 4 ps jitter data. Once frequency lock is established the proportional pulses oscillate around the target frequency.
6.3.3.3. Serdes III (dual-loop / referenced loop)
The Serdes III implementation has two loops: one independent loop that dictates the
frequency, and a second dependent loop that phase locks to the incoming data. Fig. 6-12
shows the frequency loop locking in to a reference signal at 750 MHz, which corresponds
to a 6 GHz clock. Because the same PLL was used in the transmitter of the Serdes III
implementation, the acquisition plots shown in Section 5.4.6.3. on page 101 show behavior
identical to the operation of this frequency loop.
Also shown in Fig. 6-12 is the phase plot for the phase loop locking in to data with
tpb = 0.25. Lock-in occurs when the clock frequency is about 6.02 GHz, which is within 20
MHz of the 6 GHz target. It was expected that lock-in would occur when the clock was
within half of 30 MHz, or 15 MHz.
The noise seen on the locked-in phase plot is from 4 ps rms jitter added to the data
through an HDL model (Appendix E.5. on page 183). This enabled a more accurate and
faster simulation. The choice of jitter is directly related to the jitter produced by the
transmitter, with the assumption that the channel introduces little noise.
(Figure 6-11 plot axes: frequency (GHz) versus time (ns).)
Figure 6-12 Frequency and phase lock-in of Serdes III Rx PLL. The dual loop nature of the Serdes III Rx PLL allows an independent referenced loop to frequency lock close to the data frequency. The second loop phase locks when the data and reference frequencies are within 0.3% of each other.
6.4. 4-16 Demultiplexing
The transition detector naturally performs 4-16
demultiplexing. It has eight sampling circuits, four of which
sample the actual data. The data bits become available sequentially and,
as such, all four are valid together for only one bit time: 50 ps at 20 Gb/s.
This can make timing very difficult.
Serdes I was not capable of performing the 4-16
demultiplexing. It could only output the four sampled bits directly off the detector.
The demultiplexer added to Serdes II is shown in Fig. 6-13. It uses four 4-bit MS-
latches each separately clocked by four phase offset clocks. The clocks are generated with
a counter driven by a phase from the PLL. The latches simultaneously sample the 4-bit data
from the transition detector. The transition from the fourth bit, followed by the transition
from the first bit, dictates the window that the clock has to sample the data. Delays on the
clock lines had to be carefully balanced and tightly controlled to ensure that the bits were
sampled at the correct time.
Figure 6-13 4-16 demultiplexer architecture. The demultiplexer accepts the set of four bits from the transition detector and samples each set into four separate registers. Once 16 bits are captured those registers are resampled by a 16 bit register to produce the final output.
After all four latches contain a total of 16 bits, another bank of latches resamples all
the bits at once. This register uses the fourth clock, Φ4, plus a small delay. This delay should
be longer than the delay through the first register to capture the 4th bank correctly. The
delay must also be shorter than the time when the 1st bank is sampled. For a 20 Gb/s system,
the clock has a 200 ps window and was placed as close to the center as possible.
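The timing window quoted above follows directly from the bit rate. The short sketch below reproduces that arithmetic; the 30 ps latch delay used in the usage check is a hypothetical value chosen only for illustration, and the helper name resample_delay_ok is not from the original design.

```python
# Timing budget for the 16-bit resampling clock (Phi4 plus a small delay).
BIT_RATE = 20e9          # serial data rate in bits per second (from the text)
BITS_PER_SAMPLE = 4      # the transition detector delivers four bits at a time

bit_time = 1.0 / BIT_RATE                # 50 ps at 20 Gb/s
window = BITS_PER_SAMPLE * bit_time      # spacing between successive bank samples


def resample_delay_ok(delay, latch_delay, window=window):
    """True if the delay added to Phi4 lies strictly inside the allowed window.

    The delay must exceed the delay through the first register (latch_delay)
    and must end before the 1st bank is sampled again (window).
    """
    return latch_delay < delay < window


centered_delay = window / 2.0            # "as close to the center as possible"

print(f"bit time = {bit_time * 1e12:.0f} ps")
print(f"window   = {window * 1e12:.0f} ps")
print(f"centered = {centered_delay * 1e12:.0f} ps")
print(resample_delay_ok(centered_delay, latch_delay=30e-12))  # hypothetical 30 ps latch delay
```

Placing the delay at the center of the window leaves equal margin on both constraints, which is the choice described in the text.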
6.5. Registers and Decoding
Often a First In First Out (FIFO) system is added to the
output of the demultiplexer. This reduces the timing constraint
on the circuit that reads the 16 bits of parallel data off the chip,
through the use of a separate load clock. A FIFO was not
included in either Serdes I or Serdes II in which the output data is only latched in the 4-16
demultiplexer.
Data decoding is a general term for such techniques as decryption, decompression,
error detection, channel alignment, byte alignment [38], DC voltage balance, simplified
clock recovery, frame detection [33], and so on. No encoding was performed in either
Serdes I or Serdes II. See Section 5.11.1. on page 118 for a quick study and
recommendation of the 8B/10B encoding scheme.
6.6. Line Receiver
The line receiver accepts serial data at up to 20 Gb/s. Its
bandwidth must be wide enough, usually 50% higher than the 10
GHz fundamental, to ensure that the data is reproduced
accurately [14], [48], [36], [37], [49].
The Serdes I line receiver consists of a simple single-
ended pad receiver, and is not optimized for bandwidth. The
Serdes II circuit is fully differential and consists of a 6 µm buffer with emitter followers
and 50 Ω termination resistors.
6.7. Test Circuitry
6.7.1. On-chip test pattern generation
Testing the receiver, by itself, at speed is impossible
without a 10 GHz differential signal generator to drive the data
inputs. In order to eliminate reliance on external testing
hardware, the necessary generator was added internally. This
was done in both fabricated Serdes chips by using a 5 GHz VCO
in three different configurations. The first signal was generated by multiplying separate
phases of the VCO to create a 10 GHz bit stream. The second was simply one phase of the
VCO for 5 GHz and the third signal was a phase divided by two for 2.5 GHz. A 4-to-1
multiplexer was added to select between these three generated signals and the fourth external
data signal.
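The three internally generated rates can be illustrated with ideal square waves. The sketch below assumes that "multiplying separate phases" can be modeled as an XOR of two quadrature phases of the 5 GHz VCO, which is one common way to double a frequency; the thesis does not state the exact circuit at this point, so the XOR model and all function names here are assumptions.

```python
def square(t, freq, phase_deg=0.0):
    """Ideal square wave: 1 during the first half of each period, else 0."""
    cycles = t * freq + phase_deg / 360.0
    return 1 if (cycles % 1.0) < 0.5 else 0


def rising_edge_frequency(times, samples):
    """Estimate frequency from the spacing of the first and last rising edges."""
    edges = [t for t, a, b in zip(times[1:], samples, samples[1:]) if a == 0 and b == 1]
    return (len(edges) - 1) / (edges[-1] - edges[0])


VCO_FREQ = 5e9     # on-chip test VCO frequency (from the text)
SPAN = 2e-9        # observe ten VCO periods
STEPS = 4000       # 0.5 ps time resolution
times = [(i + 0.5) * SPAN / STEPS for i in range(STEPS)]

phase_0 = [square(t, VCO_FREQ, 0.0) for t in times]
phase_90 = [square(t, VCO_FREQ, 90.0) for t in times]

doubled = [a ^ b for a, b in zip(phase_0, phase_90)]     # assumed XOR multiplier
direct = phase_0                                         # one VCO phase
halved = [square(t, VCO_FREQ / 2.0) for t in times]      # phase divided by two

f_doubled = rising_edge_frequency(times, doubled)
f_direct = rising_edge_frequency(times, direct)
f_halved = rising_edge_frequency(times, halved)
print(f_doubled / 1e9, f_direct / 1e9, f_halved / 1e9)   # values in GHz
```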
6.7.2. True error rate detector (TERD)
The true error rate detection circuit operates between the transmitter and receiver. It
determines bit error rate through an LFSR matched to the transmitter LFSR. Its operation
was discussed in detail in Section 5.8.2. on page 107.
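The principle of comparing received data against a matched LFSR can be sketched in a few lines. The polynomial used here (x^7 + x^6 + 1), the seed, and the function names are assumptions made for illustration only; the LFSR actually used in the TERD is the one described in Section 5.8.2.

```python
def lfsr_bits(seed, count, taps=(7, 6), width=7):
    """Return `count` bits from a Fibonacci LFSR.

    taps are 1-indexed bit positions; the default is the assumed
    polynomial x^7 + x^6 + 1, not necessarily the one on the chip.
    """
    mask = (1 << width) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("LFSR seed must be non-zero")
    bits = []
    for _ in range(count):
        feedback = 0
        for tap in taps:
            feedback ^= (state >> (tap - 1)) & 1
        bits.append(feedback)
        state = ((state << 1) | feedback) & mask
    return bits


def count_bit_errors(received, seed):
    """Compare received bits with a locally generated, matched LFSR sequence."""
    expected = lfsr_bits(seed, len(received))
    return sum(1 for r, e in zip(received, expected) if r != e)


SEED = 0x5A                                # hypothetical shared seed
sent = lfsr_bits(SEED, 1000)
received = list(sent)
for position in (10, 500, 999):            # inject three channel errors
    received[position] ^= 1

errors = count_bit_errors(received, SEED)
ber = errors / len(received)
print(errors, ber)
```

In this sketch the receiver-side generator is simply started from the same seed as the transmitter; a hardware detector would also need a way to synchronize the two registers.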
6.8. Implementation and Fabrication
6.8.1. Serdes I
As stated previously, the power supply for the Serdes I
chips was chosen to be -4.5 V. This left plenty of room for the
three levels of logic and the active current sources. Power
minimization was not a design goal so this voltage was not
optimized. Also a -2.0 V supply was required for the bottom of
the CMOS charge pump. Table 6-1 shows the pin-outs of the receiver chip and Fig. 6-14
shows the final layout artwork and the microphotograph of the fabricated part.
The receiver in the Serdes I implementation was limited to testing pads only, so it did
not support the full 4-to-16 demultiplexer. Instead the sampled data from the transition
detector was fed directly to output pads. No additional circuitry was added to retime the
output data, so the four bits were not presented to the output at the same time.
In order to test the high speed operation of the receiver an on-chip data test source
was created. This circuit generated periodic signals at 10 GHz, 5 GHz, and 2.5 GHz. Two
DC pads, R0 and R1, were used to select between the three data source inputs and an
externally supplied input, and R2 was used as a control voltage for the VCO. The receiver
clock was connected to pad R5, and the output data was connected to pads R8 through R11.
To aid in testing, the capacitor from the charge pump was passed to pad R4 through a high
resistance path. This pad could confirm the proper operation of the charge pump while the
circuit was operating.
Table 6-1 Pin-out of Serdes I receiver
Pin    I/O      Description
R0     DC in    test source (SELECT A)
R1     DC in    test source (SELECT B)
R2     RF out   test source output
R3     DC in    control voltage for test source
R4     RF out   integrator voltage (capacitor)
R5     RF out   receiver clock
R6     Power    -2 V (FET charge pump)
R7     RF in    receiver input
R8     RF out   data 3
R9     RF out   data 2
R10    RF out   data 1
R11    RF out   data 0
Figure 6-14 Serdes I receiver layout artwork and photograph. On the left is the final artwork for the first receiver design. On the right is a microphotograph of the fabricated part.
6.8.2. Serdes II
The full chip layout and pin-outs are shown and described in Section 5.9.2. on
page 109.
6.9. Testing Results
6.9.1. Serdes I (receiver test results)
The receiver circuit has a pull-in range of 18.7 to 18.9 Gb/s. This represents the range
of frequencies for which the PLL can acquire lock with the onset of new data. Once lock-
in has occurred, the circuit can maintain lock for its hold-in range of 16.4 to 19.6 Gb/s. This
is an undesirable situation for two important reasons. First, the lock-in range dictates the
allowable range of data frequencies because the communication system can not be expected
to initialize with a lower bit rate and then ramp up to the nominal bit rate. Second, the hold-
in range did not meet the specification of 20 Gb/s.
The cause of the poor pull-in range is the non-linear nature of the transition detector.
It has a very high gain and saturates above a small phase deviation, limiting the ability to
adjust for phase differences. The low hold-in range is due to the lower than expected
frequency range of the current starving VCO, shown in Fig. 3-5 on page 27.
Fig. 6-15 shows the receiver locked to data at 19.4 Gb/s. (The oscilloscope is
triggered on the input signal.) Fig. 6-15(a) shows a locked condition with data arriving with
20 bits per transition (0.05 tpb) and (b) shows a locked condition with 10 bits per transition (0.1
tpb).
When the receiver is locked with data at 0.05 tpb (10 ones, 10 zeros), an rms phase
jitter of 2.64 ps is measured and shown in Fig. 6-16. When the number of transitions is
decreased to 0.016 tpb (32 ones, 32 zeros) a jitter value of 8 ps is measured. Results indicate
that a locked condition can be maintained for a data stream with an edge every 300 bits
before the clock jitter becomes too large and lock is lost.
Figure 6-15 Serdes I receiver locked to data. The above plots show the recovered clock and the sampled data for a data rate of 19.4 Gb/s. (a) is fed with data with 20 bits per transition and (b) is fed with 10 bits per transition.
Figure 6-16 Serdes I recovered clock showing jitter. This plot shows a receiver locked to data with a 30% duty cycle. The recovered clock has an rms jitter of 2.6 ps.
6.9.2. Serdes II (receiver test results)
The results from the second receiver iteration were very similar to the first, as
expected. The big difference was that the receiver integrator had a circuit glitch that
prevented it from operating as an integrator. Instead it operated like a low-pass filter. This
limited the hold-in range to that of the pull-in range, which was from 4.20 to 4.63 GHz or
16.8 to 18.5 Gb/s. Although this small hold-in range is a problem, a more serious concern is
the small pull-in range. The only way to solve this problem is to provide the receiver with
a reference signal very close to the frequency of the data. This solution was evaluated and
simulated in Serdes III.
Fig. 6-17 shows the receiver in lock with the data and the clock at 4.5 GHz. This was
achieved by using an external source running at the same frequency as the clock. The
internal source operated correctly with various combinations of frequencies. One included
the internal source VCO running at 3.7 GHz with the divide-by-2 enabled and a clock at
4.63 GHz. This corresponds to data with 5 ones and 5 zeros which also indicates that the
receiver is able to lock on both rising and falling data transitions.
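The correspondence between the quoted frequencies and the "5 ones and 5 zeros" pattern can be checked with a short calculation. The sketch assumes four serial bits per receiver clock cycle, consistent with the 4.5 GHz clock and 18 Gb/s data rate quoted for Fig. 6-17; the function name is illustrative only.

```python
def bits_per_pattern_period(clock_hz, source_hz, bits_per_clock=4):
    """Number of serial bits spanned by one period of a periodic test source."""
    return clock_hz * bits_per_clock / source_hz


source_hz = 3.7e9 / 2          # internal source VCO with the divide-by-2 enabled
clock_hz = 4.63e9              # receiver clock

period_bits = bits_per_pattern_period(clock_hz, source_hz)
ones = round(period_bits) // 2
zeros = round(period_bits) - ones
print(f"{period_bits:.2f} bits per source period -> {ones} ones, {zeros} zeros")
```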
Figure 6-17 Serdes II Rx locked to data. The plot captured from the oscilloscope shows input data and the receiver clock locked to it. Both are at 4.5 GHz, and the data represents a bit pattern of 1100 at 18 Gb/s.
One way to measure the performance of the receiver is to look at the phase noise of
the recovered clock relative to the transition density [14], [31]. Fig. 6-18 shows four
different phase noise measurements for varying lengths of periodic data streams. The data
was generated with the HP 8563 low phase noise signal source.
The curve for 100 bits represents a series of 50 one’s followed by 50 zero’s. As can
be seen in the plot, the fewer the transitions the higher the phase noise. At 1 MHz, a
transition density of 0.052 yields a phase noise value of -112 dBc/Hz and a density of
0.0064 yields a value of -88 dBc/Hz. As the clock phase noise increases so does the jitter,
which relates to a larger BER. In the minimum, and likely, worst case of 19 bits, integrating
from 1 MHz to 1 GHz to find the phase noise gives an rms jitter of approximately 2.0 ps.
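The two calculations used in this paragraph can be sketched as follows. The transition densities quoted above (0.052 and 0.0064) match 1/19 and 1/156, so the sketch assumes the density is taken as the reciprocal of the pattern period in bits. The jitter function uses the standard relation between integrated single-sideband phase noise and rms jitter, with straight-line segments on a log-log plot. The flat -112 dBc/Hz profile in the usage lines is purely hypothetical input, not the measured curve of Fig. 6-18, so its result is not expected to match the 2.0 ps quoted above.

```python
import math


def transition_density(period_bits):
    """Transitions per bit, assumed here to be 1 / (pattern period in bits)."""
    return 1.0 / period_bits


def integrate_phase_noise(offsets_hz, noise_dbc_hz):
    """Integrate L(f) over frequency, assuming power-law segments between points."""
    total = 0.0
    points = list(zip(offsets_hz, noise_dbc_hz))
    for (f1, l1_db), (f2, l2_db) in zip(points, points[1:]):
        l1 = 10.0 ** (l1_db / 10.0)                       # linear power ratio per Hz
        slope = (l2_db - l1_db) / (10.0 * math.log10(f2 / f1))
        if math.isclose(slope, -1.0):
            total += l1 * f1 * math.log(f2 / f1)
        else:
            total += l1 * f1 / (slope + 1.0) * ((f2 / f1) ** (slope + 1.0) - 1.0)
    return total


def rms_jitter(offsets_hz, noise_dbc_hz, carrier_hz):
    """rms jitter in seconds from single-sideband phase noise."""
    phase_variance = 2.0 * integrate_phase_noise(offsets_hz, noise_dbc_hz)
    return math.sqrt(phase_variance) / (2.0 * math.pi * carrier_hz)


print(transition_density(19), transition_density(156))

# Hypothetical flat profile, 1 MHz to 1 GHz, on a 4.5 GHz clock.
jitter = rms_jitter([1e6, 1e9], [-112.0, -112.0], 4.5e9)
print(f"{jitter * 1e12:.2f} ps")
```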
Figure 6-18 Serdes II receiver clock phase noise. This plot shows the phase noise for various length bit sequences. The sequence consists of a string of ones followed by a string of zeros with a period indicated in the plot. As expected, the fewer transitions the larger the phase noise.
The final test of the receiver involved connecting the output of the transmitter back
into the receiver. This utilized the full potential of the built-in testing circuitry. The first
problem encountered was the inability to feed back a differential signal. This was because
two matched lines from the output of the Tx to the input of the Rx could not be guaranteed.
The probes, connectors, and cables introduce too much variation in length to work properly.
Even a few millimeters could offset the differential signals by a considerable amount. It
was concluded that for differential testing, the part would have to be packaged and placed
on a board.
Because differential testing was out of the question, the system was set up for single-
ended testing. This was done by tying one end of the receiver input to a DC reference
voltage half-way between the high and low transmitter signal levels. This technique
destroyed the benefits of a differential signal and would not operate at either 20 or 10 Gb/s.
[Figure 6-18 plot: Phase Noise (dBc/Hz) versus Frequency (MHz); curves for 156, 100, 76, and 19 bits]
The feed-through pad showed a highly corrupted signal. The single-ended technique and/or
a bandwidth problem in the differential pad receiver prevented a full test of the feedback
testing scheme.
6.10. Future Work
6.10.1. Sampling offset correction
One attribute of data arriving in a receiver, typically seen in optical systems, is bits
that are skewed toward one transition. This is usually an effect of the non-linear nature of
the light sensitive diode, but can be a result of the transmitter or of the channel itself. The
ramification is an increase in BER if samples are taken at the exact center of the bit. The
solution is to allow the data sampling points to be offset relative to the data transitions.
6.10.2. 40 Gb/s?
The first step in moving to a 40 Gb/s solution is to utilize a 10 GHz ring oscillator.
Given this possibility, the next problem is in the design of the receiver amplifier. This
amplifier will require at least a 20 GHz bandwidth and must be able to drive a significant
number of loads. It may be necessary to sacrifice phase detection of every transition and
just utilize every fourth edge to reduce the MS-latch loading effects. This solution still
requires four data latches, plus one transition latch which may still be too high. Another
solution would be to use a bang-bang phase detector that requires a clock and its quadrature
at half the baud rate [26], [32]. This solution requires only four MS-latches.
6.10.3. Demultiplexer improvements
A problem found during the testing of the Serdes II chip was in the 4-to-16
demultiplexer described in Section 6.4. on page 138. Due to stringent timing constraints
and excessive loading, the set of four 4-bit latches was failing to latch the data. Fig. 6-19
depicts an improved demultiplexer that operates in stages. The first stage latches the four
data bits from one of the PLL clock phases. The clock is then divided by two and used to
clock the next stage of eight latches. The clock is then divided again and the data is latched
into 16 latches. The final stage realigns all the data edges by latching the 16 bits
simultaneously.
Figure 6-19 Revised 4-to-16 demultiplexer. In order to reduce the timing requirements on the demultiplexer the data is demultiplexed in stages. Each stage is successively clocked by a clock of half the frequency from the previous stage.
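The staged approach can be illustrated behaviorally. The sketch below models only the data movement (4 bits, then 8, then 16) and ignores all circuit timing; the function names and the use of Python lists in place of latches are illustrative assumptions, not the circuit of Fig. 6-19.

```python
def pair_up(words):
    """Merge consecutive words pairwise, as a latch stage clocked at half rate would."""
    return [words[i] + words[i + 1] for i in range(0, len(words) - 1, 2)]


def staged_demux(serial_bits):
    """Behavioral 4-to-16 demultiplexer built from successive divide-by-two stages."""
    usable = len(serial_bits) - len(serial_bits) % 4
    stage_4 = [serial_bits[i:i + 4] for i in range(0, usable, 4)]   # transition detector output
    stage_8 = pair_up(stage_4)     # clocked at half the first-stage rate
    stage_16 = pair_up(stage_8)    # clocked at a quarter of the first-stage rate
    return stage_16                # final register presents all 16 bits together


serial = list(range(32))           # bit indices stand in for data bits
words = staged_demux(serial)
print(words)
```

Each stage only has to accept data at half the rate of the one before it, which is the relaxation of the timing constraint that the revised architecture aims for.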
[Figure 6-19 diagram labels: transition detector; da, db, dc, dd; Φ1; toggle F/F; toggle F/F x2; demultiplexed data; 200 ps]
Discussion & Conclusion
In conclusion, three 20 Gb/s communication systems were designed and two were
fabricated in IBM’s SiGe 5 HP process. Each design built on test results from the previous
implementations, and the third and final design was intended for future research and
development.
The second iteration was a unified transceiver chip possessing a transmitter and a
receiver. It had wirebond pads for wafer probe testing as well as C4 pads for flip-chip
packaging. Through the C4 pads, 16 bits of parallel data could be supplied to and extracted
from the chip. An internal testing circuit enabled complete testing of the chip without the
need for packaging.
The Feed Forward Interpolated VCO, a four stage ring oscillator that uses novel feed
forwarding techniques, was developed. Its very high frequency nature required the use of
capacitance to slow its frequency down to 5 GHz. Its flexibility makes it an excellent choice
for short-haul communication systems. Phase noise at 1 MHz was measured as -90.5
dBc/Hz which is one of the best numbers quoted for a ring oscillator at this speed. The
associated jitter is quite small and is an interesting function of the control voltage.
The transmitter in the second prototype had a very wide operating range of 14.27 to
21.58 Gb/s. A time domain sampling oscilloscope measured an rms clock jitter value of 4.3
ps or 0.086 UI. Using a spectrum analyzer, however, rms clock jitter from 100 kHz to 100
MHz was measured at 1.4 ps. The eye diagram was very symmetric, indicating that the
symmetric multiplexer and data interleaving scheme operated as expected.
The second receiver did not have an external reference and, therefore, had only the
high speed data stream to lock to. This limited the pull-in range to 16.8 to 18.5 Gb/s. Clock
jitter measured from the oscilloscope had an rms value of 2.0 ps. At very low transition
rates of 78 bits per transition, the receiver was still able to maintain lock. This is credited
to the phase detector which is able to use every transition for phase information.
A third prototype was developed, but not fabricated, using the data acquired from the
first two designs. The transmitter PLL bandwidth was further optimized and a negative
impedance amplifier loop filter was added. A frequency locked loop was added to the
receiver PLL to greatly enhance the pull-in range. The demultiplexer scheme was also
improved to minimize the timing constraints.
References
[1] R. C. Walker, K. Hsieh, T. A. Knotts, and C. Yen, “A 10 Gb/s Si-Bipolar TX/RX Chipset for Computer Data Transmission,” IEEE International Solid-State Circuits Conference, pp. 302-303, 1998.
[2] S. A. Steidl, “A 32-Word by 32-Bit Three-Port Bipolar Register File Implemented Using a SiGe HBT BiCMOS Technology,” Candidacy document, Rensselaer Polytechnic Institute, Department of Electrical Engineering, May 1999.
[3] P. M. Cambell, H. J. Greub, A. Garg, S. A. Steidl, S. Carlough, M. Ernest, R. Philhower, C. Maier, R. P. Kraft, and J. F. McDonald, “A Very-Wide-Bandwidth Digital VCO Using Quadrature Frequency Multiplication and Division Implemented in AlGaAs/GaAs HBTs,” Proc. GaAs IC Symp., pp. 311-314, 1995.
[4] A. W. Buchwald, and K. W. Martin, “High-speed voltage-controlled oscillator with quadrature outputs,” Electronics Letters, vol. 27, no. 4, pp. 309-310, February 1991.
[5] R. Walker, C. Stout, C-S. Yen, “A 2.488 Gb/s Si-Bipolar Clock and Data Recovery IC with Robust Loss of Signal Detection,” IEEE International Solid-State Circuits Conference, pp. 246-247, 1997.
[6] M. Ernest, T. W. Krawczyk, and J. F. McDonald, “Symmetric Multiplexer,” Invention Disclosure Record, Rensselaer Polytechnic Institute, February 2000.
[7] T. W. Krawczyk, and J. F. McDonald, “The Feed Forward Voltage Controlled Ring Oscillator,” Invention Disclosure Record, Rensselaer Polytechnic Institute, May 2000.
[8] D. C. Ahlgren, G. Freeman, S. Subbanna, R. Groves, D. Greenberg, J. Malinowski, D. Nguyen-Ngoc, S. J. Jeng, K. Stein, K. Schonenberg, D. Kiesling, B. Martin, S. Wu, D. L. Harame, and B. Meyerson, “A SiGe HBT BiCMOS technology for mixed signal RF applications,” Proceedings of the IEEE Bipolar/BiCMOS Circuits and Technology Meeting, Minneapolis, MN, pp. 195-197, September 1997.
[9] K. Washio, E. Ohue, K. Oda, M. Tanabe, H. Shimamoto, and T. Onai, “95 GHz fT Self-Aligned Selective Epitaxial SiGe HBT with SMI Electrodes,” IEEE International Solid-State Circuits Conference, pp. 312-313, 1998.
[10] L. Larson, M. Case, S. Rosenbaum, D. Rensch, P. MacDonald, M. Matloubian, M. Chen, D. Harame, J. Malinowski, B. Meyerson, M. Gilbert, and S. Mass, “Si/SiGe HBT Technology for Low-Cost Monolithic Microwave Integrated Circuits,” IEEE International Solid-State Circuits Conference, pp. 80-81, 1996.
[11] J. R. Long, M. A. Copealand, S. J. Kovacic, D. S. Malhi, and D. L. Harame, “RF Analog and Digital Circuits in SiGe Technology,” IEEE International Solid-State Circuits Conference, pp. 82-83, 1996.
[12] K. Ismail, “Si/SiGe CMOS: Can it extend the lifetime of Si,” IEEE International Solid-State Circuits Conference, pp. 116-117, 1997.
[13] L. Sun, T. Kwasniewski, and K. Iniewski, “A Quadrature Output Controlled Ring Oscillator Based on Three-Stage sub-feedback Loops,” IEEE International Symposium on Circuits and Systems, vol. 2, pp. 176-179, 1999.
[14] R. Walker, C. Stout, and C-S. Yen, “A 2.488 Gb/s Si-Bipolar Clock and Data Recovery IC with Robust Loss of Signal Detection,” IEEE International Solid-State Circuits Conference, pp. 246-247, 1997.
[15] L. Dai, and R. Harjani, “Comparisons and Analysis of Phase Noise in Ring Oscillators,” IEEE International Symposium on Circuits and Systems, pp. 77-80, May 2000.
[16] A. Hajimiri, and Thomas H. Lee, “A General Theory of Phase Noise in Electrical Oscillators,” IEEE Journal of Solid-State Circuits, vol. 33, no. 2, pp. 179-194, February 1998.
[17] J. A. McNeil, “Jitter in Ring Oscillators,” IEEE Journal of Solid-State Circuits, vol. 32, pp. 870-879, June 1997.
[18] A. Hajimiri, S. Limotyrakis, and T. H. Lee, “Jitter and Phase Noise in Ring Oscillators,” IEEE Journal of Solid-State Circuits, vol. 34, no. 6, pp. 790-804, June 1999.
[19] T. H. Lee, and A. Hajimiri, “Oscillator Phase Noise: A Tutorial,” IEEE Journal of Solid-State Circuits, vol. 35, no. 3, pp. 326-335, March 2000.
[20] H. Matsuoka, and T. Tsukahara, “A 5-GHz Frequency-Doubling Quadrature Modulator with a Ring-Type Local Oscillator,” IEEE Journal of Solid-State Circuits, vol. 34, pp. 1345-1348, September 1999.
[21] J. Plouchart, H. Ainspan, M. Soyuer, and A. Ruehli, “A Fully-Monolithic SiGe Differential Voltage-Controlled Oscillator for 5 GHz Wireless Applications,” IEEE Radio Frequency Integrated Circuits Symposium, pp. 57-60, 2000.
[22] M. Soyuer, J. N. Joachim, N. Burghartz, H. A. Ainspan, K. A. Jenkins, P. Xiao, A. R. Shahani, M. S. Dolan, and D. L. Harame, “An 11-GHz 3-V SiGe Voltage Controlled Oscillator with Integrated Resonator,” IEEE Journal of Solid-State Circuits, vol. 32, no. 9, pp. 1451-1454, September 1997.
[23] S. K. Enam and A. A. Abidi, “A 300-MHz Voltage-Controlled Ring Oscillator,” IEEE Journal of Solid-State Circuits, vol. 25, no. 1, pp. 312-315, February 1990.
[24] S. Lee, B. Kim, and K. Lee, “A Novel High-Speed Ring Oscillator for Multiphase Clock Generation Using Negative Skewed Delay Scheme,” IEEE Journal of Solid-State Circuits, vol. 32, no. 2, pp. 1451-1454, February 1997.
[25] D. C. Ahlgren, M. Gilbert, D. Greenberg, S. J. Jeng, J. Malinowski, D. Nguyen-Ngoc, K. Schonenberg, K. Stein, R. Groves, K. Walter, G. Hueckel, D. Colavito, G. Freeman, D. Suderland, D. L. Harame, and B. Meyerson, “Manufacturability demonstration of an integrated SiGe HBT technology for the analog and wireless marketplace,” IEEE International Electron Devices Meeting Technical Digest, San Francisco, CA, December 1996, pp. 859-862.
[26] J. F. Ewen, A. X. Widmer, M. Soyuer, K. R. Wrenner, B. Parker, and H. A. Ainspan, “Single-Chip 1062 Mbaud CMOS Transceiver for Serial Data Communications,” IEEE International Solid-State Circuits Conference, pp. 32-33, 1995.
[27] D. Friedman, M. Meghelli, B. Parker, H. Ainspan, and M. Soyuer, “Sub-picosecond SiGe BiCMOS Transmit and Receive PLLs for 12.5 Gbaud Serial Data Communication,” Symposium on VLSI Circuits, pp. 132-135, 2000.
[28] R. Farjad-Rad, C. Yang, M. Horowitz, and T. Lee, “A 0.3-µm CMOS 8-Gb/s 4-PAM Serial Link Transceiver,” IEEE Journal of Solid-State Circuits, vol. 35, no. 5, pp. 757-764, May 2000.
[29] H. Knapp, T. F. Mefster, M. Wurzer, D. Zoschg, K. Aufinger, and L. Treitinger, “A 79 GHz Dynamic Frequency Divider in SiGe Bipolar Technology,” IEEE International Solid-State Circuits Conference, pp. 208-209, 2000.
[30] M. Meghelli, B. Parker, H. Ainspan, and M. Soyuer, “SiGe BiCMOS 3.3V Clock and Data Recovery Circuits for 10Gb/s Serial Transmission Systems,” IEEE International Solid-State Circuits Conference, pp. 56-57, 2000.
[31] Y. M. Greshishchev, and P. Schvan, “SiGe Clock and Data Recovery IC with Linear-Type PLL for 10-Gb/s SONET Application,” IEEE Journal of Solid-State Circuits, vol. 35, no. 9, pp. 1353-1359, September 2000.
[32] A. Pottbacker, U. Langmann, and H. Schreiber, “A Si Bipolar Phase and Frequency Detector IC for Clock Extraction up to 8 Gb/s,” IEEE Journal of Solid-State Circuits, vol. 27, no. 12, pp. 1747-1751, December 1992.
[33] S. Shioiri, M. Soda, T. Monikawa, T. Hashimoto, F. Sato, and K. Emura, “A 10 Gb/s SiGe Framer/Demultiplexer for SDH Systems,” IEEE International Solid-State Circuits Conference, pp. 202-203, 1998.
[34] Albert X. Widmer, “Method of Coding to Minimize Delay at a Communication Node,” U.S. Patent 4665517, assigned to International Business Machines, 1987.
[35] M. Fukaishi, S. Nakamura, A. Tajima, Y. Kinoshita, Y. Suemura, H. Suzuki, T. Itani, H. Miyamoto, N. Henmi, T. Yamazaki, and M. Yotsuyanagi, “A 2.125-Gb/s BiCMOS Fiber Channel Transmitter for Serial Data Communications,” IEEE Journal of Solid-State Circuits, vol. 34, no. 9, pp. 1325-1330, September 1999.
[36] Y. M. Greshishchev, and P. Schvan, “A 60-dB Gain, 55-dB Dynamic Range, 10-Gb/s Broad-Band SiGe HBT Limiting Amplifier,” IEEE Journal of Solid-State Circuits, vol. 34, no. 12, pp. 1914-1920, December 1999.
[37] W. Pöhlmann, “A Silicon-Bipolar Amplifier for 10 Gbit/s with 45 dB Gain,” IEEE Journal of Solid-State Circuits, vol. 29, no. 5, pp. 551-556, May 1994.
[38] K. Kawai, and H. Ichino, “A 0.6 W 10 Gb/s SONET/SDH Bit-Error-Monitoring LSI,” IEEE International Solid-State Circuits Conference, pp. 54-55, 2000.
[39] S. Finocchiaro, G. Palmisano, R. Salerno, and C. Sclafani, “Design of Bipolar Ring Oscillators,” IEEE International Symposium on Circuits and Systems, vol. 1, pp. 5-8, 1999.
[40] Y. Chen, S. Koneru, E. Lee, and R. Geiger, “Simulation of Random Jitter in Ring Oscillators with SPICE,” IEEE International Symposium on Circuits and Systems, vol. 2, pp. 1154-1157, 1997.
[41] Dan H. Wolaver, Phase-Locked Loop Circuit Design, Englewood Cliffs, NJ: Prentice Hall, 1991.
[42] T. Kuroda, T. Fujita, Y. Itabashi, S. Kabumoto, M. Noda, and A. Kanuma, “1.65 Gb/s 60 mW 4:1 Multiplexer and 1.8 Gb/s 80 mW 1:4 Demultiplexer ICs Using 2 V 3-Level Series-Gated ECL Circuits,” IEEE International Solid-State Circuits Conference, pp. 36-37, 1995.
[43] D. Chen, R. Waldron, “A Single-Chip 266 Mb/s CMOS Transmitter/Receiver for Serial Data Communications,” IEEE International Solid-State Circuits Conference, pp. 100-101, 1993.
[44] M. Soyuer, K. A. Jenkins, J. N. Burghartz, H. A. Ainspan, F. J. Canora, S. Ponnapalli, J. F. Ewen, and W. E. Pence, “A 2.4 GHz Silicon Bipolar Oscillator with Integrated Resonator,” IEEE Journal of Solid-State Circuits, vol. 31, no. 2, pp. 268-270, February 1996.
[45] F. Svelto, S. Deantoni, and R. Castello, “A 1.3 GHz Low-Phase Noise Fully Tunable CMOS LC VCO,” IEEE Journal of Solid-State Circuits, vol. 35, no. 3, pp. 356-361, March 2000.
[46] J. J. Kim, and B. Kim, “A Low-Phase-Noise CMOS LC Oscillator with a Ring Structure,” IEEE International Solid-State Circuits Conference, pp. 430-431, 2000.
[47] C. Wu, and H. Kao, “A 1.8 GHz CMOS Quadrature Voltage-Controlled Oscillator (VCO) Using the Constant-Current LC Ring Oscillator Structure,” IEEE International Symposium on Circuits and Systems, vol. 4, pp. 378-381, 1998.
[48] J. Akagi, Y. Kuriyama, M. Asaka, T. Sugiyama, N. Lizuka, K. Tsuda, and M. Obara, “Five AlGaAs/GaAs HBT ICs for a 20 Gb/s Optical Receiver,” IEEE International Solid-State Circuits Conference, pp. 168-169, 1994.
[49] M. Soda, H. Tezuka, F. Sato, T. Hashimoto, S. Nakamura, T. Tatsumi, T. Suzaki, and T. Tashiro, “Si-Analog ICs for 20 Gb/s Optical Receiver,” IEEE International Solid-State Circuits Conference, pp. 170-171, 1994.
[50] A. Rofougaran, J. Rael, M. Rofougaran, and A. Abidi, “A 900 MHz CMOS LC-Oscillator with Quadrature Outputs,” IEEE International Solid-State Circuits Conference, pp. 392-393, 1996.
[51] B. L. Thompson, and H. Lee, “A BiCMOS Receiver/Transmit PLL Pair for Serial Data Communications,” IEEE Custom Integrated Circuits Conference, pp. 29.6.1-29.6.5, May 1992.
[52] C. R. Hogge, “A Self Correcting Clock Recovery Circuit,” IEEE Journal of Lightwave Technology, vol. LT-3, no. 6, pp. 1312-1314, December 1985.
[53] D. Y. Wu, A. C. Yen, D. Meeker, S. Beccue, K. Pedrotti, J. Penney, A. Price, and K. C. Wang, “Two Phase Detectors for 2.5-10 Gb/s NRZ Data Operation: a Hogge and a Balanced Mixer,” GaAs IC Symp., pp. 266-269, 1996.
Appendix A. IBM SiGe 5 HP
A.1. NPN Vbe characteristics
The SiGe npn transistor Vbe characteristics are important for various reasons. First, they
indicate the turn-on voltage of the transistor: the voltage below which the transistor is
considered off. Second, at a given operating collector current they can be used to find the
base-emitter voltage. Third, and perhaps most importantly, the derivative of the
collector current, Ic, with respect to the transistor’s Vbe is the transconductance. This
parameter is found in Fig. A-2 by taking the slope at half the peak fT current. This current
flows through an optimized differential pair when both inputs are biased identically.
Figure A-1 Ic-Vbe characteristics for npn transistor. The above plot shows the collector current at a fixed Vce of 2 V versus Vbe. The analytical approximation is accurate up to the operating point of 0.7 mA/µm.
[Figure A-1 plot: Normalized Ic (ln(mA/µm)) versus Vbe (V); curves: Simulated, Analytical]
Figure A-2 NPN transconductance. The transconductance is taken at the point where the collector current is half the maximum fT current.
[Figure A-2 plot: Normalized Ic (mA/µm) versus Vbe (V); slope annotation: 120 Ω/µm, 8.33 m/Ω/µm]
Comparing the simulated transconductance to that found in
\[ g_m = \frac{1}{\gamma}\,\frac{i_e}{v_T}, \qquad r_e = \frac{\gamma\, v_T}{i_e} \qquad \text{(A-1)} \]
yields a fudge factor, γ, of 1.65.
The simulated plot in Fig. A-1 is found from
\[ I_c = I_s\, e^{V_{be}/V_T} \qquad \text{(A-2)} \]
where Is is graphically determined to be 30 fA.
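Equations (A-1) and (A-2) can be evaluated numerically as a quick sanity check. The sketch below assumes a thermal voltage of 25.85 mV (roughly room temperature) and an emitter current of 0.35 mA/µm, taken as half of the 0.7 mA/µm operating point mentioned in the caption of Fig. A-1; both values, and the function names, are assumptions for illustration.

```python
import math

GAMMA = 1.65       # fudge factor from the text
I_S = 30e-15       # saturation current from the text, in amperes
V_T = 0.02585      # thermal voltage in volts (assumed, about 300 K)


def transconductance(i_e, gamma=GAMMA, v_t=V_T):
    """g_m from Eq. (A-1), in siemens for i_e in amperes."""
    return i_e / (gamma * v_t)


def emitter_resistance(i_e, gamma=GAMMA, v_t=V_T):
    """r_e from Eq. (A-1), in ohms for i_e in amperes."""
    return gamma * v_t / i_e


def collector_current(v_be, i_s=I_S, v_t=V_T):
    """I_c from Eq. (A-2), in amperes."""
    return i_s * math.exp(v_be / v_t)


i_e = 0.35e-3      # assumed bias: half of 0.7 mA/um, per micron of emitter
print(f"g_m = {transconductance(i_e) * 1e3:.2f} mS")
print(f"r_e = {emitter_resistance(i_e):.1f} ohm")
```

Under these assumptions the result lands close to the 120 Ω value annotated on Fig. A-2.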
A.2. NPN Ic versus Vce characteristics
Figure A-3 Ic-Vce characteristics for npn transistor. The above plot shows the collector current response versus collector-emitter voltage for different base currents. Breakdown occurs at a Vce of 3 V.
The Ic versus Vce characteristics of the npn transistor reveal important design
parameters. The first is a breakdown voltage of 3 V which is the maximum voltage that can
be applied across the collector-emitter junction. Above this voltage the base current loses
control over the collector current and large amounts of current begin to flow. The Early
voltage, the voltage at which all backwards linear extrapolations of the curves meet, is
about 45 V. This parameter is related to the output resistance looking into the collector by
\[ r_o = \frac{V_A}{I_c} \qquad \text{(A-3)} \]
where Ic is the collector current near the active region. The normalized value of ro is 80
kΩ-µm.
[Figure A-3 plot: Collector Current (mA/µm) versus Collector-Emitter Voltage (V); curves for base currents of 0, 50, 100, 150, 200, and 250 µA/µm]
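As a worked check of Eq. (A-3), the quoted Early voltage and normalized output resistance together imply a particular normalized collector current. The sketch below computes it; the function name is illustrative only.

```python
V_A = 45.0                 # Early voltage in volts (from the text)
R_O_NORMALIZED = 80e3      # normalized output resistance in ohm-um (from the text)


def output_resistance(i_c, v_a=V_A):
    """r_o from Eq. (A-3); with i_c in A/um the result is in ohm-um."""
    return v_a / i_c


# Collector current density implied by the two quoted values.
implied_ic = V_A / R_O_NORMALIZED
print(f"implied Ic = {implied_ic * 1e3:.4f} mA/um")
print(f"r_o check  = {output_resistance(implied_ic) / 1e3:.1f} kohm-um")
```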
A.3. NPN fT Curves
Figure A-4 fT vs Ic characteristics for npn transistor. The maximum transition frequency for the SiGe npn transistors occurs at approximately 0.8 mA/µm. Above that current the fT drops off rapidly and that range should be avoided during design.
The most important design parameter found in the fT curves in Fig. A-4 is the DC
collector current bias point for maximum operating frequency. Although this normalized
current increases slightly as larger transistors are used, a value of 0.8 mA/µm is reasonable
for all sizes. Also worth noting in this plot, is the fact that as larger transistors are used, and
thus more power is supplied, the faster the transistors operate. The smallest transistor has a
peak fT of approximately 50 GHz and the largest transistor peaks at 62 GHz.
0
10
20
30
40
50
60
70
0.001 0.01 0.1 1 10
Normalized Collector Current (mA/µµµµ m)
Fre
quen
cy (
GH
z)
1 um
2.5 um
5 um
10 um
20 um
160
Appendix B. CML Logic Gates
B.1. CML Voltage Swing (non-linearized, digital)
The CML voltage swing is found by analyzing the collector current flow through
each of the two transistors in a differential pair with a DC differential voltage on the inputs.
The voltage swing must be large enough to ensure that the majority of current flows
through only one transistor. Fig. B-1 depicts how the current flow shifts from one transistor
to the other as the differential voltage changes. At about ±200 mV, at least 99% of the
current is flowing through one leg of the CML buffer. This is the assigned minimum
operating voltage swing and a more conservative 250 mV or greater was used throughout
this project.
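The current split described above can be sketched with the ideal exponential law of (A-2). The block below is only an idealized illustration: it assumes an ideal thermal voltage of about 25.85 mV and no emitter resistance, so it switches more sharply than the simulated curve in Fig. B-1. The function name and the `n_vt` parameter are hypothetical.

```python
import math


def on_branch_fraction(v_diff: float, n_vt: float = 0.02585) -> float:
    """Fraction of the tail current carried by the conducting branch.

    v_diff: magnitude of the differential input voltage in volts.
    n_vt:   effective thermal voltage in volts (assumption: ideal kT/q near
            300 K; a larger value models a softer, more realistic switch).
    """
    return 1.0 / (1.0 + math.exp(-abs(v_diff) / n_vt))


for millivolts in (0, 50, 100, 200, 250):
    fraction = on_branch_fraction(millivolts / 1000.0)
    print(f"{millivolts:4d} mV -> {100.0 * fraction:.3f}% in one branch")
```

Raising `n_vt` is a simple way to see why a real pair, with its parasitic emitter resistance, needs a larger swing than the ideal model suggests.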
Figure B-1 Current switching versus differential input voltage
The input to a differential pair controls the switching of current through two branches. A
critical current level must be reached to assure that the digital gate has completely
switched. For a 99.7% current level through one branch, a minimum of 250 mV must be applied.
[Plot: percentage of total current (logarithmic axis, 0.001% to 100%) versus differential voltage (mV), -300 to 0.]
B.2. CML Signals
CML circuits possess important attributes called signal levels, which are necessary to
connect multiple gates together. The need to merge multiple differential pairs arises from
the small, but desirable, voltage swing (Appendix B.1.), the large base to emitter voltage
(Appendix A.1.), and the technique used. Merging pairs together involves stacking them so
that current through one is a function of the state of another. In this way, different current
paths can be connected to the pull-up resistors, the output. Other techniques exist for
combining differential pairs, see Section 5.3.1. on page 75, but they are not by themselves
considered CML.
Figure B-2 Simple AND CML Gate
This gate shows how multiple differential pairs can be merged to produce a two level gate.
In Fig. B-2, the differential input a must be of higher potential, specifically one Vbe
higher, than input b, to ensure that transistor Q1 will not become saturated. Input a is said
to be on level 1 (0 mV, -250 mV) and b is said to be on level 2 (-900 mV, -1150 mV). A
supply voltage as low as -3.2 V allows up to three levels of inputs.
Level 1 outputs, x, are found at the bottom of the pull-up or collector resistors at the
top of the tree. Level 2, y, and 3, z, outputs are generated from emitter followers and a
diode.
The size of pull-up resistors r1 and r2 is based upon the current source, to produce a
nominal voltage swing of at least 250 mV. For 1 µm sized transistors biased at a current of
0.8 mA, the resistors are set to 400 Ω. In general the normalized resistor value is 400 Ω-µm.
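The resistor sizing above amounts to a product of the bias current and the pull-up resistance. The sketch below scales both with emitter length using the normalized values quoted in the text (0.8 mA/µm and 400 Ω-µm); the function names are hypothetical.

```python
I_NORM = 0.8e-3    # bias current density in A/um, from the text
R_NORM = 400.0     # normalized pull-up resistance in ohm-um, from the text
MIN_SWING = 0.250  # minimum nominal swing in volts, from Appendix B.1


def pull_up_resistor(emitter_length_um: float) -> float:
    """Pull-up resistance in ohms for a gate built from devices of this size."""
    return R_NORM / emitter_length_um


def voltage_swing(emitter_length_um: float) -> float:
    """Nominal swing in volts: bias current times pull-up resistance."""
    bias_current = I_NORM * emitter_length_um
    return bias_current * pull_up_resistor(emitter_length_um)


for size in (1.0, 2.0, 4.0):
    print(f"{size:.0f} um: R = {pull_up_resistor(size):.0f} ohm, "
          f"swing = {voltage_swing(size) * 1e3:.0f} mV")
```

Because current and resistance scale in opposite directions, the swing stays at about 320 mV for every size, which leaves margin over the 250 mV minimum.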
B.3. Voltage Reference
All CML gates require a current source to fix the current flow through the differential
pair switch. The simplest approach, a passive source, places a resistor at the bottom of the
tree which has a nearly constant voltage across it and is dependent only on the lowest
transistor pair. This technique has high common mode gain on the lowest differential pair
and often requires a large resistor.
Figure B-3 Reference Voltage Generator
Active current sources configured in a current mirror require a reference voltage to
control the amount of current.
A more common approach is to use an active current source implemented as a current
mirror. Fig. B-3 shows the generating circuit producing a mirror current of 0.75 mA/µm.
This current was chosen based upon the current necessary to achieve the maximum
operating frequency of the transistors. See Appendix A.3.
The emitter degeneration resistor typically has 0.4 V across it and is used to control
currents which are smaller or larger than the mirror current. For instance, if a 4 µm
transistor circuit requires 3.0 mA, then a 100 Ω emitter resistor will be used.
Transistor Q2 is used for base current compensation and supplies the base current to
all connected circuits. It allows a larger number of sources to be used and prevents current
degradation when adding sources.
The value of R1 is dependent on the supply voltage of the circuits. Designs with
different supplies need only change this resistor to ensure a fixed current throughout all
designs.
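The resistor choices in this section can be sketched as two simple ratios. The block below rests on assumptions rather than on stated design equations: it uses a 0.3 V drop across the emitter resistor, because that value reproduces both the 100 Ω example and the 200 Ω and 400 Ω labels in Fig. B-3 (the 0.4 V quoted in the text would give proportionally larger resistors), and it fits a hypothetical 1.9 V of fixed headroom to the two R1 values listed in the figure.

```python
V_DROP = 0.3      # volts across the emitter resistor (assumption, see lead-in)
I_REF = 1.5e-3    # reference branch current in amperes, label in Fig. B-3
V_HEADROOM = 1.9  # volts below R1 (hypothetical value fitted to the R1 table)


def emitter_resistor(current_a: float) -> float:
    """Emitter resistor in ohms that keeps V_DROP across it at this current."""
    return V_DROP / current_a


def r1_for_supply(vee: float) -> float:
    """R1 in ohms that keeps I_REF flowing for a given negative supply Vee."""
    return (abs(vee) - V_HEADROOM) / I_REF


print(f"3.0 mA source: {emitter_resistor(3.0e-3):.0f} ohm")
for vee in (-4.5, -3.2):
    print(f"Vee = {vee} V: R1 = {r1_for_supply(vee) / 1e3:.2f} kohm")
```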
B.4. Buffer with emitter follower outputs
A buffer accepts a single input and duplicates it on its output. Its many uses include:
impedance conversion (high input impedance and low output impedance), fixed delay
introduction, and level shifting. Buffers also form the foundation for more complicated
circuits.
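[Figure B-3 schematic labels: Vcc, Vee, Vref, Q1 (1x), Q2, two 2x devices, 200 Ω and 400 Ω emitter resistors, branch currents of 1.5 mA and 0.75 mA. R1 values: 1.73 kΩ for Vee = -4.5 V, 0.87 kΩ for Vee = -3.2 V.]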
The circuit in Fig. B-4 can accept input, a, on levels 1, 2, or 3, since it has only one
differential pair. Level 1 output, x, is taken from the bottom of the pull-up resistors, and
level 2 output, y, is taken from the output of the emitter follower.
Figure B-4 CML Buffer with emitter followers
A basic buffer with level 1 and level 2 outputs. It can accept input at any level.
[Schematic labels: inputs a0, a1; level 1 outputs x0, x1; level 2 outputs y0, y1; follower transistors Q1, Q2; supplies Vcc, Vee.]
The emitter follower output provides a much higher driving ability than the level 1
output. This is because the driving current from the level 1 output is passively pulled up
through the resistors, and actively pulled down through the differential pair. As more loads
are added, the base current from each must be supplied through the passive resistors, which
causes a voltage drop and limits the voltage swing. The passive pull-up through the
resistors also limits the speed of the gate. The emitter followers, on the other hand, provide
a low impedance output through β amplification of current through transistors Q1 and Q2.
In this case, the output is actively pulled up through the follower transistor and actively
pulled down through the current source.
Appendix C. CML Circuit Details
C.1. Linearizing the differential amplifier
The differential amplifier is very effective in digital circuits because of its high
voltage gain. For analog circuits, where a linear response is needed, this gain must be
reduced to meet specifications. The preferred method for doing so is to include emitter
resistors to augment the emitter resistance, re, already present in the transistor.
Figure C-1 Linearizing the differential amplifier with emitter resistors
The addition of emitter resistors augments the emitter resistance of the differential pair
transistors and decreases the total gain of the circuit.
[Schematic labels: inputs a0, a1; branch currents i0, i1; emitter resistors Re; collector resistors Rc.]
The emitter resistance is defined as the resistance from the base to the emitter looking
into the emitter, and it is the inverse of the transconductance, gm. The normalized value
found through simulation in Appendix A.1. is about 120 Ω-µm. The inverse of the sum of
this value and the emitter resistor Re yields the gain

$$ A_d \approx \frac{1}{r_e + R_e}, \qquad r_e = \frac{V_T}{I_e} = \frac{1}{g_m} \tag{C-1} $$

of the circuit with output current and input voltage. In order to find the total voltage gain,
Ad must be multiplied by the collector resistance Rc.
A plot of currents i0 and i1 versus the differential input voltage between a0 and a1 is
shown in Fig. C-2. The plot with 0 Ω-µm represents the nominal transfer function for a
digital gate. The gain is high and an input voltage of 100 mV ensures a nearly complete
switch of current. For digital circuits, this allows for a high noise margin and fast
switching characteristics. For analog circuits, on the other hand, the active, linear region
of the curve is very small: ±50 mV. It is clear that the addition of the emitter resistors is
crucial in reducing the gain and spreading out the linear region. The choice of resistor will
be determined by the output range needed and the gain at an input of 0 V.
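To illustrate how (C-1) trades gain for linearity, the sketch below tabulates the gain for the emitter resistor values plotted in Fig. C-2. It uses the 120 Ω-µm intrinsic emitter resistance from the text; the 400 Ω-µm collector resistance is an assumption borrowed from Appendix B.2, and the function names are hypothetical.

```python
RE_INTRINSIC = 120.0  # intrinsic emitter resistance r_e in ohm-um, from the text
RC_NORM = 400.0       # collector resistance in ohm-um (assumption, Appendix B.2)


def transconductance_gain(re_ext: float) -> float:
    """Gain A_d of (C-1) in A/V per um for an added emitter resistor in ohm-um."""
    return 1.0 / (RE_INTRINSIC + re_ext)


def voltage_gain(re_ext: float, rc: float = RC_NORM) -> float:
    """Total voltage gain: A_d multiplied by the collector resistance."""
    return rc * transconductance_gain(re_ext)


for re_ext in (0.0, 200.0, 400.0, 600.0, 800.0):
    print(f"Re = {re_ext:5.0f} ohm-um: "
          f"Ad = {transconductance_gain(re_ext) * 1e3:.2f} mA/V/um, "
          f"Av = {voltage_gain(re_ext):.2f}")
```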
Figure C-2 Branch current response for various emitter resistors
This plot shows the transfer of current from one branch to the other when the differential
inputs are changed. Each pair of curves has a fixed emitter resistor.
[Plot: branch currents i0, i1 (mA/µm), 0 to 0.8, versus differential voltage (V), -0.4 to 0.4, for emitter resistors of 0, 200, 400, 600, and 800 Ω-µm.]
A comparison between (C-1) and the simulated results is plotted in Fig. C-3 and
shows a very good match.
Figure C-3 Simulated / Analytical Gain
(C-1) follows the simulated results for the transconductance of a CML buffer with emitter
resistors shown here.
[Plot: inverse gain (V/mA-µm), 0 to 1.0, versus normalized Re (Ω-µm), 0 to 800; secondary axis: gain (mA/V/µm), marked at 5, 2.5, 1.66, 1.25, and 1.]
C.2. Current bypassing
In some situations it may be necessary to limit the extent of current switching in a
differential amplifier. For example, the FFI VCO requires a minimum current flow through
both branches, no matter the input. The solution is to include a bypass resistor which
ensures that some constant current flows in addition to the current defined by the
differential transistor pair.
Figure C-4 Limiting full current switching with bypass resistors
The addition of bypass resistors allows some current to always flow around the differential
pair. This prevents a complete switching of current.
[Schematic labels: inputs a0, a1; branch currents i0, i1; emitter resistors Re; bypass resistors Rb.]
Two behaviors result from the addition of the bypass resistor. First, a full switch of
current through the tree is prevented, which is a desired result. Second, there is a relative
decrease in the gain of the circuit, because of the decrease in collector current which
negatively affects the transconductance. Each of these effects is modeled in this section and
compared to simulation results. In addition, two equations are derived which can be used as
design tools when specifications on gain and current range are provided.
The maximum current in a branch is a function of the total current, the bypass and
emitter resistors, and the input voltage. The analysis starts with the assumption that branch
1 has zero emitter current, i.e. a0 is much higher than a1. The currents through each bypass
resistor are the same. It is assumed that there is a differential pair above this one with
emitter voltages at the same potential. We define equations

$$ I_o = i_{e1} + 2 i_b \tag{C-2} $$

$$ i_{e1} = \frac{v_o + \frac{v_d}{2} - v_{be}}{R_e} \tag{C-3} $$

$$ i_b = \frac{v_o}{R_b} \tag{C-4} $$

where Io is the total current through the tree, vd is the differential input voltage and vo is the
voltage across the bypass resistor. The value for vbe is found in Fig. A-2 on page 157.
Solving for the current through branch 0 yields

$$ I_{max} = I_o - i_b = \frac{I_o (R_e + R_b) - \left( v_{be} - \frac{v_d}{2} \right)}{R_b + 2 R_e} = i_{d,max} \tag{C-5} $$

$$ I_{max}\big|_{R_e = 0} = I_o - \frac{v_{be} - \frac{v_d}{2}}{R_b}. \tag{C-6} $$

Fig. C-5 shows the analytical and simulated results for the maximum current as a fraction
of the total current for emitter resistors of value 0 Ω-µm and 400 Ω-µm, and a differential
input of 400 mV.
With large bypass resistor values, the circuit allows almost a full current switch
because less current is bypassed around the differential pair. Values below about 10 kΩ-µm
produce a much larger reduction, down to about 3 kΩ-µm, where Rb is too small and no
current switching takes place.
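The maximum current fraction of (C-5) is easy to evaluate numerically. The sketch below does so under assumed operating values that are not stated in this section: a total current Io of 0.8 mA/µm and a vbe of 0.9 V, with the 400 mV differential input used for Fig. C-5. The function names are hypothetical, and the result is only meaningful where the fraction lies between 0.5 and 1.

```python
def max_current(rb: float, re: float, io: float = 0.8e-3,
                vbe: float = 0.9, vd: float = 0.4) -> float:
    """Evaluate (C-5): current in the conducting branch, in A/um.

    rb, re: bypass and emitter resistors in ohm-um.
    io:     total tree current in A/um (assumption).
    vbe:    base-emitter voltage in volts (assumption).
    vd:     differential input voltage in volts.
    """
    return (io * (re + rb) - (vbe - vd / 2.0)) / (rb + 2.0 * re)


def max_current_fraction(rb: float, re: float, io: float = 0.8e-3,
                         vbe: float = 0.9, vd: float = 0.4) -> float:
    """Maximum branch current divided by the total current."""
    return max_current(rb, re, io, vbe, vd) / io


for rb in (5e3, 10e3, 20e3, 40e3):
    print(f"Rb = {rb / 1e3:4.0f} kohm-um: "
          f"Re=0 -> {max_current_fraction(rb, 0.0):.3f}, "
          f"Re=400 -> {max_current_fraction(rb, 400.0):.3f}")
```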
Figure C-5 Current limiting effects of bypass resistor
The bypass resistor prevents current from being completely shut off in a differential
branch. The maximum current allowed to flow divided by the total current is called the
maximum current fraction.
[Plot: maximum current fraction, 0.5 to 1.0, versus bypass resistor (kΩ-µm), 0 to 40; simulated and analytical curves for Re = 0 Ω-µm and Re = 400 Ω-µm at Vd = 400 mV.]
The next step is to examine how the gain is affected by the addition of the bypass
resistor. The primary cause of the decrease in the transconductance is the decrease in
collector current in the differential pair. Gain is directly related to
transconductance and emitter resistance. A second order effect results from an increase of
voltage, and current, across the bypass resistor when collector current increases through the
emitter resistor.
Solving for the gain can be broken up into separate pieces: how the emitter current
changes relative to the input voltage, and how the total current changes relative to the
emitter current.

$$ \frac{di}{dv} = \frac{di_e}{dv} \cdot \frac{di}{di_e} \tag{C-7} $$

shows this relationship. The next step is to solve for the bypass current relative to the
emitter current

$$ \frac{di_b}{di_e} = \frac{d}{di_e} \left( \frac{v_{be} + i_{e1} R_e}{R_b} \right) = \frac{R_e}{R_b}. \tag{C-8} $$
Since the sum of the bypass current and the emitter current is the total current i,
it is possible to find the total current relative to the emitter current

$$ \frac{di}{di_e} = \frac{di_b}{di_e} + \frac{di_e}{di_e} = \frac{R_e}{R_b} + 1 \tag{C-9} $$

Next, the emitter current relative to the other parameters is determined

$$ i_e = \frac{R_b I_o - 2 v_{be}}{2 R_e + 2 R_b}. \tag{C-10} $$

From (C-1) on page 164 the derivative of emitter current to input voltage is the inverse of
the sum of the emitter resistances, and (A-1) on page 157 yields the transconductance. Using
(C-7), (C-9), and

$$ \frac{di_e}{dv} = \frac{1}{r_e + R_e} = \frac{1}{\gamma v_T \dfrac{2 R_e + 2 R_b}{R_b I_o - 2 v_{be}} + R_e} \tag{C-11} $$

and simplifying the equation yields the desired result

$$ \frac{di}{dv} = \frac{di_d}{dv_d} = \frac{1}{\dfrac{2 \gamma v_T R_b}{R_b I_o - 2 v_{be}} + R_e \parallel R_b} \tag{C-12} $$

where id and vd are the differential current and differential voltage, respectively. Results
from this analysis compared to simulated results are shown in Fig. C-6.
The top plot in Fig. C-6 slopes upward because increasing Rb increases the
transconductance. The lower plot shows a very flat response because the gain, in this case,
is fixed by the emitter resistor and is not affected by the collector current (see Appendix
C.1. on page 164).
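The simplification from (C-7), (C-9), and (C-11) to (C-12) can be checked numerically. The sketch below evaluates both forms and compares them. The operating values are assumptions, not numbers from this section: Io = 0.8 mA/µm, vbe = 0.9 V, and γ·vT = 48 mV, the last chosen so that re is about 120 Ω-µm at 0.4 mA/µm per branch. The function names are hypothetical.

```python
def gain_closed_form(rb: float, re: float, io: float = 0.8e-3,
                     vbe: float = 0.9, gamma_vt: float = 0.048) -> float:
    """Evaluate (C-12): differential gain in A/V per um."""
    parallel = re * rb / (re + rb)  # Re || Rb
    return 1.0 / (2.0 * gamma_vt * rb / (rb * io - 2.0 * vbe) + parallel)


def gain_step_by_step(rb: float, re: float, io: float = 0.8e-3,
                      vbe: float = 0.9, gamma_vt: float = 0.048) -> float:
    """Evaluate the product (C-7) using (C-9), (C-10), and (C-11)."""
    ie = (rb * io - 2.0 * vbe) / (2.0 * re + 2.0 * rb)  # (C-10)
    r_e = gamma_vt / ie                                 # (A-1)
    die_dv = 1.0 / (r_e + re)                           # (C-11)
    di_die = re / rb + 1.0                              # (C-9)
    return die_dv * di_die                              # (C-7)


for rb in (5e3, 10e3, 20e3, 40e3):
    for re in (0.0, 400.0):
        print(f"Rb = {rb / 1e3:4.0f} kohm-um, Re = {re:3.0f} ohm-um: "
              f"{gain_closed_form(rb, re) * 1e3:.3f} mA/V/um")
```

With these assumed values the Re = 0 curve rises with Rb while the Re = 400 Ω-µm curve stays nearly flat, which is the behavior described for Fig. C-6.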
Figure C-6 Current gain effects of bypass resistor
The bypass resistor lowers the current through the differential pair, which in turn
decreases the transconductance, subsequently decreasing the gain.
[Plot: gain (mA/V/µm), 0 to 9, versus bypass resistor (kΩ-µm), 0 to 40; simulated and analytical curves for Re = 0 Ω-µm and Re = 400 Ω-µm.]
Fig. C-7 is a surface plot showing the relationship between current gain and emitter
and bypass resistors. This can be useful when designing a linearized differential amplifier
with bypass resistors.
Figure C-7 Designing for gain with emitter and bypass resistors
This plot is useful for designing with bypass resistors when gain is specified.
[Surface plot: gain (mA/V/µm) in bands from 0-1 to 8-9, versus bypass resistor (kΩ-µm), 0 to 40, and emitter resistor (Ω-µm), 0 to 800.]
C.3. Increasing CML delay
It is sometimes necessary to increase the delay of a CML gate to meet certain timing
requirements. Such a need is found in a ring oscillator that must be centered at a frequency
that is lower than the free running frequency. The addition of a capacitor across the level 1
outputs degrades the rise time, and thus, increases the gate delay. This solution is easy to
implement and simple to model.
Figure C-8 Collector Capacitor
A collector capacitor can be used to increase the delay through a CML gate by increasing
the rise time.
[Schematic labels: collector resistors Rc with capacitor Cc across the outputs, and the equivalent circuit with 2Cc on each output.]
Modeling the new gate delay first involves determining the gate delay without the
capacitor. This nominal delay is represented by To, and is approximately equal to 12.5 ps.
The extra delay is modeled as an RC charging circuit with a time constant of 2RcCc. The
factor of 2 arises from the equivalent circuit shown in Fig. C-8, where two series capacitors
have a value of twice the original. An additional factor of ln(2) multiplies the time constant
to account for the point at which the output is considered switched. This point is

$$ v_o = T_o + I_o R_c e^{-t / (R_c C_c)} \tag{C-13} $$

which is approximately when the differential voltage is 0 V. The total delay is equal to

$$ T = T_o + \ln(2) \, (2 R_c C_c). \tag{C-14} $$
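The delay model of (C-14) can be evaluated directly. The sketch below uses the nominal delay of 12.5 ps from the text and assumes a normalized collector resistance of 400 Ω-µm taken from Appendix B.2, which is not stated in this section; the function name is hypothetical.

```python
import math

T_NOMINAL = 12.5e-12  # gate delay without the capacitor in seconds, from the text
RC_NORM = 400.0       # collector resistance in ohm-um (assumption, Appendix B.2)


def gate_delay(cc_farads_per_um: float, rc: float = RC_NORM,
               t_nominal: float = T_NOMINAL) -> float:
    """Evaluate (C-14): T = To + ln(2) * (2 * Rc * Cc), in seconds."""
    return t_nominal + math.log(2.0) * (2.0 * rc * cc_farads_per_um)


for cc_ff in (0, 100, 200, 400, 600):
    delay_ps = gate_delay(cc_ff * 1e-15) * 1e12
    print(f"Cc = {cc_ff:3d} fF/um -> delay = {delay_ps:6.1f} ps")
```

With these assumed values the delay at 600 fF/µm comes out near 345 ps, which is in line with the axis range of Fig. C-9.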
Figure C-9 Delay Model with Collector Capacitor
The delay of a CML gate versus level 1 capacitance is derived in this section and is
consistent with simulated results.
[Plot: gate delay (ps), 0 to 350, versus capacitance (fF/µm), 0 to 600; analytical and simulated curves.]
Appendix D. Sizing Transistors to Minimize VCO Delay
The design of digital logic gates in SiGe technology always includes a consideration
of transistor size. Sizes range from an emitter length of 1 µm to a length of 20 µm and, if
multiple fingered emitters are used, effective lengths up to 40 µm. Usually, the larger
transistors have smaller delay, but consume proportionally higher current. A trade-off
decision among power, layout space, and delay specifications needs to be made.
Logic gates can be extremely varied and may include such functions as multiplexed
XOR, and five input AND/OR cells. Delays through each of these will depend on the
number of inputs and outputs, the input and output levels and various other factors. An
in-depth analysis of all these factors would be very complicated, and the results difficult to
utilize. A more general solution, and the one followed in this appendix, is to consider
simple buffers with emitter followers driving other buffers. Although not a completely
accurate representation of most logic gates, the analysis conclusions are very useful in the
design of all gates. If a buffer is driving multiple receivers, this condition is reduced to a
case with only one receiver whose size is equal to the sum of the receivers. For instance, if
a driver has four 1 µm loads, they can be treated as one 4 µm load.
Also worth noting is that the following analysis is extremely useful in the
optimization of ring oscillators. These circuits incorporate a ring of two or more buffers that
oscillate because of an odd number of inversions, and are very sensitive to gate delays. If a
buffer has a delay of 25 ps, then a 1-2 ps difference in delay can have a 4% or greater impact
on the final oscillation frequency. Consideration of the type of loads that will be driven by
the VCO is also important when choosing device sizes. For instance, if the VCO has buffers
with 1 µm devices, then a 1 µm load on each stage will introduce a proportionally huge
loading effect on the system.
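The sensitivity quoted above follows from the usual ring oscillator relation between frequency and stage delay. The sketch below assumes the textbook form f = 1/(2·N·td), which is not stated in this appendix, and a hypothetical stage count of four; the relative frequency change does not depend on the stage count.

```python
def ring_frequency(stage_delay_s: float, stages: int = 4) -> float:
    """Oscillation frequency in Hz, assuming f = 1 / (2 * N * td)."""
    return 1.0 / (2.0 * stages * stage_delay_s)


def relative_frequency_change(stage_delay_s: float, extra_delay_s: float) -> float:
    """Fractional change in frequency when every stage slows by extra_delay_s."""
    nominal = ring_frequency(stage_delay_s)
    slowed = ring_frequency(stage_delay_s + extra_delay_s)
    return slowed / nominal - 1.0


for extra_ps in (1.0, 2.0):
    change = relative_frequency_change(25e-12, extra_ps * 1e-12)
    print(f"+{extra_ps:.0f} ps on 25 ps -> {100.0 * change:.1f}% frequency change")
```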
The assumption in this analysis is that the receiver circuit is fixed and design work
will be done on the driver. The data presented here, however, can be useful for the design
of the receiver as well.
Figure D-1 Delay from emitter follower to differential amplifier
In general, the larger the emitter follower, the more capable it is at driving larger
differential amplifiers. A rule of thumb in designing an emitter follower to minimize delay
and not use considerable power is to use 2 µm devices plus 1 µm per 5 µm of load.
[Surface plot: delay (ps) in bands from 10-12 to 28-30, versus emitter follower size (µm), 1 to 10, and receiver size (µm), 1 to 9, with design points marked. Inset: emitter follower driving an amplifier, with the measured delay indicated.]
Fig. D-1 shows the effect on the delay of using different sized emitter followers to
drive various receiver loads. The larger the emitter follower, the smaller the delay, since the
higher powered follower has a lower output resistance. This, coupled with the receiver
input base capacitance, produces a smaller delay. The figure also shows the acceleration in
delay as the receiver size remains fixed and the emitter follower shrinks. The acceleration
occurs because the delay is proportional to the output resistance, which rises as the
follower shrinks.
Also shown on this plot are design points which establish a good rule of thumb for
designing emitter followers based on receiver loading for less critical gates. Obviously, the
largest emitter followers used will yield the smallest delay, but there is a point where larger
devices do not yield substantial improvement. The design rule is to use followers of at least
2 µm and add an additional 1 µm per 5 µm of load. Following this rule yields very small
delays without huge power consumption.
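The rule of thumb above, together with the load summation described at the start of this appendix, can be written as a short helper. This is only a sketch of the stated rule for less critical gates; the function name is hypothetical.

```python
def follower_size_um(load_sizes_um: list[float]) -> float:
    """Emitter follower size in um: 2 um plus 1 um per 5 um of total load.

    Multiple receivers are treated as a single load equal to their sum.
    """
    total_load = sum(load_sizes_um)
    return 2.0 + total_load / 5.0


# Four 1 um receivers are treated as one 4 um load.
print(f"four 1 um loads -> {follower_size_um([1.0, 1.0, 1.0, 1.0]):.1f} um follower")
print(f"one 10 um load  -> {follower_size_um([10.0]):.1f} um follower")
```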
Figure D-2 Delay from differential amp to emitter follower
Designing CML logic gates often requires designing an emitter follower stage. The choice of
follower is based on many factors, including the specific differential amplifier driving the
followers. In general, the larger the follower, compared to the amplifier, the larger the
delay through the gate.
[Surface plot: delay (ps) in bands from 7-8 to 15-16, versus emitter follower size (µm), 1 to 10, and amplifier size (µm), 1 to 10, with design points marked. Inset: amplifier driving an emitter follower, with the measured delay indicated.]
After choosing an emitter follower, the next step is to design the differential amplifier
that represents the core of the driver. Fig. D-2 shows the delay from the amplifier to the
emitter follower, given different sizes of each. Here the effect is opposite from the effect
demonstrated in the previous section; a larger follower size now increases the delay. This
is because the followers are now acting as loads on the amplifier and the larger transistors
add base capacitance. The ideal situation would be to have the smallest emitter followers
possible, but this is not an option after considering loading effects. A good rule is to use an
amplifier that is at least half the size of the emitter followers. This yields good delay and
driving properties.
From Fig. D-1 and Fig. D-2, it is clear that a trade-off exists when designing an
emitter follower to be placed between two differential amplifiers. An increase in follower
size allows for a better ability to drive loads; however, this increase inhibits the ability of
the first amplifier to drive the follower. A closer look at this situation yields Fig. D-3, which
shows the optimum follower size to use, given a driver and receiver amplifier size. For
instance, in a ring oscillator with 2 µm buffers each driving a 1 µm load, the optimal
follower to use is about 6 µm in size. From Fig. D-4 we find that the delay through the gate
will be about 23 ps.
Figure D-3 Size of emitter follower between driver and receiver
When a gate needs to drive another gate on level 2 or lower, or when the receiver is a large
load, emitter followers are used. The optimal transistor size to minimize delay through the
driver and receiver gates is a function of the transistor sizes in the driver and the receiver.
[Contour plot: optimal emitter follower size (µm) in bands from 2-4 to 18-20, with contours labeled 4 to 12, versus receiver size (µm), 1 to 10, and driver size (µm), 1 to 10; ring VCO and feed forward VCO design points marked.]
Ring oscillators typically have a buffer of size x driving the next buffer, and a load.
Minimizing and balancing the external loading on each buffer forces each stage to have 1
µm buffers hanging on it. For standard ring VCOs, an emitter follower design line exists.
This is shown on Fig. D-3 and Fig. D-4. For the feed forward VCO, each stage of size x
must drive two inputs of size x, yielding a different design curve.
The final step is to justify the use of the emitter follower. Since it adds delay to the
buffer-follower-buffer system, it may be better (less delay) to remove the follower
completely. Fig. D-5 shows the difference in delay between a system with and without an
emitter follower. In almost all instances it is beneficial to include the follower unless the
receiver is much smaller than the driver.
Figure D-4 Delay when using optimized emitter follower
The plot above shows the minimum delay achievable between two differential amplifiers when
using an optimized emitter follower.
[Contour plot: delay (ps) in bands from 20-21 to 34-35, with contours labeled 20 to 25, versus receiver size (µm), 1 to 10, and driver size (µm), 1 to 10; ring VCO and feed forward VCO design points marked.]
Figure D-5 Delay difference between circuit with follower and circuit without
An emitter follower between differential amplifiers introduces additional delay, but in most
cases reduces the overall delay of the system. Only in cases with large drivers and smaller
receivers does the emitter follower increase the delay.
[Contour plot: delay difference (ps) in bands from -2.0-0.0 to 8.0-10.0, versus receiver size (µm), 1 to 10, and driver size (µm), 1 to 10.]
Appendix E. SpectreHDL models
E.1. FFI VCO
// Spectre AHDL for FFI VCO 4u, ahdl
//
// This cell emulates the functioning of the FFI VCO.
// It has 4 sine wave outputs each offset from each other
// by 45 degrees. Additional outputs give the instantaneous
// frequency and the phase relative to a fixed frequency
// source
//
// Thomas Krawczyk 7/00
//
#define PI 3.1415926535
module b_ffi5 ( w20, w21, x20, x21, y20, y21, z20, z21, Vref, s30, s31) (fc,offset,divider,mfreq)
    node [V,I] w20; node [V,I] w21;
    node [V,I] x20; node [V,I] x21;
    node [V,I] y20; node [V,I] y21;
    node [V,I] z20; node [V,I] z21;
    node [V,I] s30; node [V,I] s31;
    node [V,I] phase; node [V,I] freq;
    node [V,I] Vref;

    // Center frequency with 0 control voltage
    parameter real fc = 5.96G ;
    // DC voltage offset on terminal outputs
    parameter real offset = -1.1 ;
    // In PLL incorporate 1/8, 1/16 divider into model
    parameter real divider = 1 from (0.25:64);
    // Frequency with which to compare and determine phase offset
    parameter real mfreq = 5 GHz;

    table VCOdata;
    real control_voltage, f;
    real s[11], factor[11];

    initial
        // Mapping data between input control voltage and output frequency collected
        // from simulation. Must be positive so a 450m offset is introduced.
        s[0] = 0.500; factor[0] = 0.733;
        s[1] = 0.600; factor[1] = 0.733;
        s[2] = 0.700; factor[2] = 0.747;
        s[3] = 0.800; factor[3] = 0.805;
        s[4] = 0.850; factor[4] = 0.849;
        s[5] = 0.900; factor[5] = 0.896;
        s[6] = 0.950; factor[6] = 0.950;
        s[7] = 1.000; factor[7] = 1.000;
        s[8] = 1.050; factor[8] = 1.046;
        s[9] = 1.100; factor[9] = 1.091;
        s[10]= 1.150; factor[10]= 1.134;
        s[11]= 1.200; factor[11]= 1.168;
        s[12]= 1.300; factor[12]= 1.218;
        s[13]= 1.400; factor[13]= 1.230;
        s[14]= 1.500; factor[14]= 1.230;
        VCOdata = $build_table(2, factor, s, 11);

    analog
        control_voltage = V(s31,s30) + 450m;
        // Find the frequency multiplier from the control voltage
        f = $interpolate(VCOdata, control_voltage);
        // Find the phase of the w20 phase
        ph = 2*PI*integ(fc*f/divider,0);
        // Find the phase of the signal whose frequency is being used for phase difference
        mph= 2*PI*integ(mfreq,0);
        // Generate the signals for each phase output
        V(w20) <- offset + sin(2*PI* integ(fc*f/divider,0) );
        V(w21) <- offset - sin(2*PI* integ(fc*f/divider,0) );
        V(x20) <- offset + sin(2*PI* integ(fc*f/divider,0) +1*PI/4 );
        V(x21) <- offset - sin(2*PI* integ(fc*f/divider,0) +1*PI/4 );
        V(y20) <- offset + sin(2*PI* integ(fc*f/divider,0) +2*PI/4 );
        V(y21) <- offset - sin(2*PI* integ(fc*f/divider,0) +2*PI/4 );
        V(z20) <- offset + sin(2*PI* integ(fc*f/divider,0) +3*PI/4 );
        V(z21) <- offset - sin(2*PI* integ(fc*f/divider,0) +3*PI/4 );
        // Return the phase difference in degrees
        V(phase) <- (ph-mph)/PI*180;
        // Return the exact frequency in GHz
        V(freq) <- fc*f/divider/1G;
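The control voltage to frequency mapping in the model above can be mirrored outside the simulator. The Python sketch below reuses the fifteen table entries from the listing and assumes plain linear interpolation with clamping at the table ends; how `$build_table` and `$interpolate` actually interpolate or extrapolate is not stated in the listing, so that choice is an assumption, as are the function names.

```python
from bisect import bisect_right

# Table entries copied from the model: shifted control voltage and multiplier.
S = [0.500, 0.600, 0.700, 0.800, 0.850, 0.900, 0.950, 1.000,
     1.050, 1.100, 1.150, 1.200, 1.300, 1.400, 1.500]
FACTOR = [0.733, 0.733, 0.747, 0.805, 0.849, 0.896, 0.950, 1.000,
          1.046, 1.091, 1.134, 1.168, 1.218, 1.230, 1.230]


def frequency_factor(shifted_v: float) -> float:
    """Linearly interpolate the multiplier; clamp outside the table (assumption)."""
    if shifted_v <= S[0]:
        return FACTOR[0]
    if shifted_v >= S[-1]:
        return FACTOR[-1]
    hi = bisect_right(S, shifted_v)
    lo = hi - 1
    frac = (shifted_v - S[lo]) / (S[hi] - S[lo])
    return FACTOR[lo] + frac * (FACTOR[hi] - FACTOR[lo])


def vco_frequency(control_v: float, fc: float = 5.96e9,
                  divider: float = 1.0) -> float:
    """Output frequency in Hz; the 450 mV offset matches the model."""
    return fc * frequency_factor(control_v + 0.450) / divider


for control in (0.05, 0.35, 0.55, 0.75, 1.05):
    print(f"Vctl = {control:.2f} V -> {vco_frequency(control) / 1e9:.3f} GHz")
```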
E.2. 3-State PD
// Spectre AHDL for SERDES3, PD_3state, ahdl
//
// This module emulates the 3-state Phase Detector.
// It looks for rising transitions of the vi and vo inputs
// and forces the output to a +1 or -1 state depending on
// which input went high. When both eventually go high the
// output is reset. The slip outputs, although not implemented,
// give a pulse when the detector exceeds its max value.
//
// Thomas Krawczyk 9/27/00
//
module PD_3state ( vd0, vd1, vi_slip10, vi_slip11, vo_slip10, vo_slip11, Vref1, Vref2, vi20, vi21, vo20, vo21) ()
    node [V,I] vd0; node [V,I] vd1;
    node [V,I] vi_slip10; node [V,I] vi_slip11;
    node [V,I] vo_slip10; node [V,I] vo_slip11;
    node [V,I] Vref1; // Can ignore
    node [V,I] Vref2; // Can ignore
    node [V,I] vi20; node [V,I] vi21;
    node [V,I] vo20; node [V,I] vo21;
    real vo_center = -1.07; // Center output voltage
    real vo_swing = 144m;   // Swing either high or low
    real i_rise = -1;       // 0 = low 1 = transition 2 = high
    real o_rise = -1;
    real out0, out1;

    analog
        // Make sure we get a time point at the input crossings.
        $threshold( V(vi20)-V(vi21), 1 );
        $threshold( V(vo20)-V(vo21), 1 );
        if( V(vi20) > V(vi21))
            if( i_rise < 2 ) i_rise++;
        else
            i_rise = 0;
        if( V(vo20) > V(vo21))
            if( o_rise < 2 ) o_rise++;
        else
            o_rise = 0;
        // input vi positive transition?
        if( i_rise == 1 && o_rise == 0 )
            out0 = vo_center + vo_swing;
            out1 = vo_center - vo_swing;
        // input vo positive transition?
        if( i_rise == 0 && o_rise == 1 )
            out0 = vo_center - vo_swing;
            out1 = vo_center + vo_swing;
        // Both transitions detected
        // reset output back to nominal values
        if( i_rise >= 1 && o_rise >= 1 )
            out0 = out1 = vo_center;
        if( i_rise == -1 && o_rise == -1 )
            out0 = out1 = vo_center;
        // Give the output signals a rise time and 3 gate delays
        V(vd0) <- $transition( out0, 60p, 20p, 20p );
        V(vd1) <- $transition( out1, 60p, 20p, 20p );
        // Frequency slip detectors are not implemented
        V(vi_slip10) <- -1.5; V(vi_slip11) <- -1.5;
        V(vo_slip10) <- -1.5; V(vo_slip11) <- -1.5;
E.3. Transition Detector PD
// Spectre AHDL for SERDES3, RxEdgeExtraction, ahdl// This is a model for the Transistion Phase Detector circuit. // Clock inputs are w2 - z2.// Data inputs are dw1 - dz1.// Sampled outputs are da2 - dd2.// Fast and slow commands to the VCO are f20 and s21.// // Each region is 25 ps wide.// // \2|1/ // 3 \|/ 0// ---+---// 4 /|\ 7// /5|6\
module RxEdgeExtraction ( da20, da21, db20, db21, dc20, dc21, dd20, dd21, f20, s21, dw10, dw11, dx10, dx11, dy10, dy11, dz10, dz11, w20, w21, x20, x21, y20, y21, z20, z21, region) ()
node [V,I] da20; node [V,I] da21;
node [V,I] db20; node [V,I] db21;
node [V,I] dc20; node [V,I] dc21;
node [V,I] dd20; node [V,I] dd21;
node [V,I] f20;  node [V,I] s21;
node [V,I] dw10; node [V,I] dw11;
node [V,I] dx10; node [V,I] dx11;
node [V,I] dy10; node [V,I] dy11;
node [V,I] dz10; node [V,I] dz11;
node [V,I] w20;  node [V,I] w21;
node [V,I] x20;  node [V,I] x21;
node [V,I] y20;  node [V,I] y21;
node [V,I] z20;  node [V,I] z21;
node [V,I] region;              // AHDL output of the current sampling region
{
    integer reg = 0;            // 1-8 (0-45 = 0)
    integer out[8];             // output array of detected transitions
                                // per region to be summed at end
    integer sum;                // sum of output array
    integer i;                  // index for summing loop
    integer da, db, dc, dd;     // Sampled outputs (0,1) map to (-1, 1)
    integer data_val;           // Last data value
    real out_center = -1;       // center of fast/slow output
    real out_diff = 4m;         // fast/slow differential output / edge
    real data_center = -1.1;    // Center of sampled data output
    real data_amp = 150m;       // Amplitude of sampled data output

    analog {
        if( V(w20) > V(w21) && reg == 7 ) {
            if( V(dw10) > V(dw11) ) da = 1;
            else da = -1;
            reg = 0;
            out[reg] = 0;
        }
        if( V(x20) > V(x21) && reg == 0 ) {
            reg = 1;
            out[reg] = 0;
        }
        if( V(y20) > V(y21) && reg == 1 ) {
            if( V(dw10) > V(dw11) ) db = 1;
            else db = -1;
            reg = 2;
            out[reg] = 0;
        }
        if( V(z20) > V(z21) && reg == 2 ) {
            reg = 3;
            out[reg] = 0;
        }
        if( V(w20) < V(w21) && reg == 3 ) {
            if( V(dw10) > V(dw11) ) dc = 1;
            else dc = -1;
            reg = 4;
            out[reg] = 0;
        }
        if( V(x20) < V(x21) && reg == 4 ) {
            reg = 5;
            out[reg] = 0;
        }
        if( V(y20) < V(y21) && reg == 5 ) {
            if( V(dw10) > V(dw11) ) dd = 1;
            else dd = -1;
            reg = 6;
            out[reg] = 0;
        }
        if( V(z20) < V(z21) && reg == 6 ) {
            reg = 7;
            out[reg] = 0;
        }

        // Look for transitions and insert
        // 1 into output array of current region
        if( (V(dw10) > V(dw11)) && data_val == 0 ) {
            out[reg] = 1;
            data_val = 1;
        }
        if( (V(dw10) < V(dw11)) && data_val == 1 ) {
            out[reg] = 1;
            data_val = 0;
        }

        // Sum the fast/slow regions
        sum = -out[0]+out[1]-out[2]+out[3]-out[4]+out[5]-out[6]+out[7];

        V(da20) <- data_center + da*data_amp;
        V(da21) <- data_center - da*data_amp;
        V(db20) <- data_center + db*data_amp;
        V(db21) <- data_center - db*data_amp;
        V(dc20) <- data_center + dc*data_amp;
        V(dc21) <- data_center - dc*data_amp;
        V(dd20) <- data_center + dd*data_amp;
        V(dd21) <- data_center - dd*data_amp;
        V(f20) <- $transition(out_center + out_diff/2*sum, 50p, 20p, 20p);
        V(s21) <- $transition(out_center - out_diff/2*sum, 50p, 20p, 20p);
        V(region) <- reg;
    }
}
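The fast/slow decision in the phase detector listing above reduces to an alternating-sign sum over the eight region flags. The following is a minimal Python sketch of that arithmetic only, not the AHDL itself. The function names fast_slow_sum and fast_slow_outputs are hypothetical, the default values mirror out_center and out_diff from the listing, and the $transition smoothing is omitted.

```python
# Python sketch (not AHDL) of the fast/slow summation in the listing above.
# Function names are hypothetical; defaults mirror out_center and out_diff.

# Sign applied to out[0]..out[7]: even regions count negative, odd positive.
SIGNS = [-1, +1, -1, +1, -1, +1, -1, +1]


def fast_slow_sum(out):
    """Alternating-sign sum of the per-region transition flags (0 or 1)."""
    return sum(sign * flag for sign, flag in zip(SIGNS, out))


def fast_slow_outputs(out, out_center=-1.0, out_diff=4e-3):
    """Target voltages for f20 and s21, before any $transition filtering."""
    total = fast_slow_sum(out)
    f20 = out_center + out_diff / 2 * total
    s21 = out_center - out_diff / 2 * total
    return f20, s21
```

With the default values, a single transition flagged in region 1 gives a sum of +1, which moves f20 up and s21 down by out_diff/2 (2 mV) about out_center.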
E.4. Histogram generator
// Spectre AHDL for SERDES3, histogram, ahdl
// This cell allows the plotting of a histogram of voltages.
// It samples the "vin" signal and places it in one of "bins" bins.
// The "sweep" output signal sweeps across all bins while the "plot"
// output shows the current value of that bin.
// To create the histogram simply set "sweep" as the x axis and
// "plot" as the y axis.
//
// Thomas Krawczyk 9/26/00
//
module histogram ( plot, sweep, vin, mean, rms) ( bins, low_v, high_v, begin )
node [V,I] plot;
node [V,I] sweep;
node [V,I] vin;
node [V,I] mean;
node [V,I] rms;
parameter real bins = 16 from (1:1025);
parameter real low_v = 0;
parameter real high_v = 1;
parameter real begin = 1n from (0:inf);
{
    integer bin[1024];
    integer index;
    integer s=0;            // Current sweep index
    integer count=0;        // Total samples
    real range;             // Difference between low_v and high_v
    real mu, sigma;         // Mean and standard deviation
    real sum, sq_sum;       // The sum and the sum of square samples

    initial {
        range = high_v-low_v;
    }

    analog {
        if( $time() > begin ) {
            count++;
            sum += V(vin);
            sq_sum += V(vin)*V(vin);
            mu = sum/(1.0*count);
            sigma = sqrt(( sq_sum - 2*mu*sum + count*mu*mu )/(1.0*count));
            index = (V(vin)-low_v)/range * bins;
            if( index >= 0 && index < bins ) bin[index]++;
        }
        s++;
        if (s == bins) s=0;
        V(mean) <- mu;
        V(rms) <- sigma;
        V(sweep) <- low_v + s/(1.0*bins)*range;
        V(plot) <- bin[s];
    }
}
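The running statistics and binning in the histogram cell above can be checked outside the simulator. The following is a minimal Python sketch under stated assumptions: the class name Histogram and its method names are hypothetical, the real-to-integer conversion of the bin index is assumed to round down (math.floor), and a max(0, ...) guard is added against floating-point round-off, which the AHDL listing does not have.

```python
import math


class Histogram:
    """Python sketch of the binning and running statistics in the AHDL cell."""

    def __init__(self, bins=16, low_v=0.0, high_v=1.0):
        self.bins = bins
        self.low_v = low_v
        self.range = high_v - low_v   # difference between low_v and high_v
        self.bin = [0] * bins
        self.count = 0                # total samples
        self.sum = 0.0                # sum of samples
        self.sq_sum = 0.0             # sum of squared samples

    def sample(self, v):
        """Accumulate one sample; out-of-range values are counted but not binned."""
        self.count += 1
        self.sum += v
        self.sq_sum += v * v
        # Assumption: the integer conversion rounds down.
        index = math.floor((v - self.low_v) / self.range * self.bins)
        if 0 <= index < self.bins:
            self.bin[index] += 1

    def mean(self):
        return self.sum / self.count

    def rms(self):
        """Standard deviation about the mean, using the listing's expression."""
        mu = self.mean()
        var = (self.sq_sum - 2 * mu * self.sum + self.count * mu * mu) / self.count
        return math.sqrt(max(0.0, var))  # guard added in this sketch only
```

The expression under the square root expands the sum of (v - mu)^2 over all samples, so the "rms" output is the population standard deviation rather than the root-mean-square of the raw voltage.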
E.5. Jittered data source
// Spectre AHDL for PeteExp, datasource, ahdl
#define PI 3.1415926535
#define getbitnum(t) floor(t*Bps)

module datasource ( d0, d1, sweep, Jout ) (Offset, Vmag, Bps, Sigma)
node [V,I] d0;
node [V,I] d1;
node [V,I] sweep;
node [V,I] Jout;
parameter real Offset=-1.50e-1;
parameter real Vmag=-1.50e-1;
parameter real Bps=2.0e10 from (0:inf);
parameter real Sigma=1.0e-11;
{
    // Local Variables
    integer bitnumber,newbitnumber,cbnum,cbval;
    real jitter;
    real c_0,c_1,c_2,d_1,d_2,d_3,T,X,p;

    initial {
        bitnumber=0;
        newbitnumber=0;
        jitter=0.0;
        c_0 = 2.515517 ;
        c_1 = 0.802853 ;
        c_2 = 0.010328 ;
        d_1 = 1.432788 ;
        d_2 = 0.189269 ;
        d_3 = 0.001308 ;
    }

    analog {
        newbitnumber=getbitnum($time());
        // time*bps, but want the fractions
        V(sweep) <- ($time()*Bps - newbitnumber);
        if (newbitnumber!=bitnumber) {
            bitnumber=newbitnumber;
            // Create jitterval for this new bit
            jitter=$random();
            if (jitter<=0.5) p=jitter;
            else p=1.0-jitter;
            T = sqrt( ln(1.0/(p*p)) );
            X = T-(c_0 + c_1*T + c_2*(T*T))/(1 + d_1*T + d_2*(T*T) + d_3*(T*T*T));
            if (jitter>0.5) jitter=-1.0*X*Sigma;
            else jitter=X*Sigma;
            $break_point((1.0+newbitnumber)/Bps+jitter);
        }
        V(Jout) <- jitter;
        // Get the current bit number, which the jitter may have changed
        cbnum=floor(($time()+jitter)*Bps);
        // Convert bit number to bit value
        cbval=cbnum % 2;
        V(d0) <- $slew(Offset-Vmag*(2*cbval-1),3.0e10,-3.0e10);
        V(d1) <- $slew(Offset+Vmag*(2*cbval-1),3.0e10,-3.0e10);
    }
}
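The constants c_0 through d_3 in the listing above match the rational approximation to the inverse normal distribution tabulated in Abramowitz and Stegun (eq. 26.2.23), which maps a uniform random number to an approximately Gaussian one with an absolute error below about 4.5e-4 standard deviations. The following is a minimal Python sketch of that mapping, not the AHDL itself. The function names gaussian_from_uniform and jitter_samples are hypothetical, and the redraw on a zero uniform value is an addition of this sketch.

```python
import math
import random

# Coefficients copied from the datasource listing (c_0..c_2, d_1..d_3).
C = (2.515517, 0.802853, 0.010328)
D = (1.432788, 0.189269, 0.001308)


def gaussian_from_uniform(u, sigma=1.0e-11):
    """Map u in (0, 1) to an approximately Gaussian value scaled by sigma."""
    p = u if u <= 0.5 else 1.0 - u          # fold onto the lower half
    t = math.sqrt(math.log(1.0 / (p * p)))
    x = t - (C[0] + C[1] * t + C[2] * t * t) / (
        1.0 + D[0] * t + D[1] * t * t + D[2] * t * t * t)
    # Same sign convention as the listing: u > 0.5 gives negative jitter.
    return -x * sigma if u > 0.5 else x * sigma


def jitter_samples(n, sigma=1.0e-11, seed=0):
    """Draw n jitter values; a zero uniform draw is redrawn (sketch-only guard)."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        u = rng.random()
        if u > 0.0:
            samples.append(gaussian_from_uniform(u, sigma))
    return samples
```

With sigma set to 1, a uniform value of 0.025 maps to roughly 1.96, the familiar 97.5th-percentile point of the standard normal distribution, and 0.975 maps to its negative.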
Appendix F. Toplevel Chip Schematics
F.1. Serdes I Transmitter
F.2. Serdes I Receiver
F.3. Serdes II Transceiver