Circuits for the Design of a Serial Communication System

Utilizing SiGe HBT Technology

by

Thomas W. Krawczyk Jr.

A THESIS SUBMITTED TO THE EXAMINING

COMMITTEE OF RENSSELAER POLYTECHNIC INSTITUTE

IN PARTIAL FULFILLMENT OF THE

REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

MAJOR SUBJECT: ELECTRICAL ENGINEERING

John F. McDonald, Chair

Gary Saulnier, Prof. ECSE

Kenneth A. Connor, Prof. ECSE

Lester Rubenfeld, Prof. Math

Donald Millard, Prof. ECSE

Rensselaer Polytechnic Institute

Troy, New York

November 2000

© Copyright 2000

by

Thomas W. Krawczyk Jr.

All Rights Reserved

Table of Contents

List of Figures
List of Tables
Acknowledgements
Abstract

1. Introduction & Historical Review
1.1. Motivation and Goals
1.2. The three chips
1.3. Project time line
1.4. State of the Art
1.5. Contribution to the Field
1.5.1. Feed Forward Interpolated VCO
1.5.2. Transmitter Interleaving Architecture
1.5.3. Symmetric Multiplexer
1.5.4. Receiver PLL
1.6. SiGe 5 HP Overview
1.7. Testing Equipment
1.8. Document Logistics

2. Serial Communication
2.1. Serial Communication Block Diagram
2.2. Transmitter / Multiplexer / Clock Multiplier
2.3. Transport Channel
2.4. Receiver / Demultiplexer / Clock & Data Recovery
2.5. Internal Testing
2.6. Support Circuits

3. Current Starving VCO
3.1. Project History
3.2. The need for a VCO
3.3. Simple Current Starving VCO
3.4. Basic Operation
3.4.1. Adjustable Voltage Reference
3.4.2. Final Implementation
3.4.3. Testing Results
3.4.4. Optimization of Simple CS VCO (post-fabrication)
3.5. Current Starving with Feed Forwarding
3.5.1. Final Implementation
3.5.2. Testing results
3.6. Conclusions and Future Work

4. Feed Forward Interpolated VCO
4.1. Project History
4.2. The Evolution
4.3. Basic Operation
4.4. Stage Decoupling
4.5. Circuit Implementation and Analysis
4.5.1. Cascode amplifiers
4.5.2. Emitter Resistor for linearity and gain adjustment
4.5.3. Center capacitor to control frequency range center
4.5.4. Bypass resistor to prevent stage decoupling
4.6. System Analysis
4.6.1. Branch current to frequency
4.6.2. Center frequency and intrinsic stage delay
4.6.3. Frequency gain at the center frequency
4.6.4. Frequency Range
4.7. Phase Noise
4.7.1. The Impulse Sensitivity Function
4.7.2. Solving for phase noise
4.7.3. Phase noise comparison between the FFI and CS VCOs
4.8. Jitter
4.9. Interconnect Parasitic Simulations
4.10. HDL Model
4.11. Final Implementation
4.11.1. Circuit Parameters
4.11.2. Layout Considerations
4.12. Experimental Results

4.12.1. Frequency Response
4.12.2. Common Mode Gain (5 GHz VCO)
4.12.3. Response versus supply voltage (5 GHz VCO)
4.12.4. Phase noise measurements
4.12.5. Jitter measurements

5. Design of the Transmitter
5.1. Project History
5.2. Top Level Architecture Overview
5.3. 16-1 Multiplexer
5.3.1. The Case for the Symmetric Multiplexer
5.3.2. Final Implementation and Simulation
5.4. Phase Locked Loop (Frequency Synthesizer)
5.4.1. Input Filter
5.4.2. Phase Detector
5.4.2.1. Phase detector (Serdes I)
5.4.2.2. Phase detector (Serdes II)
5.4.2.3. Phase detector (Serdes III)
5.4.3. The VCO
5.4.4. Loop Filter
5.4.4.1. Serdes I Loop Filter
5.4.4.2. Serdes II Loop Filter
5.4.4.3. Serdes III Loop Filter
5.4.5. PLL Loop Response
5.4.6. Lock Acquisition
5.4.6.1. Serdes I Simulated Acquisition
5.4.6.2. Serdes II Simulated Acquisition
5.4.6.3. Serdes III Simulated Acquisition
5.4.7. 20 / 40 Gb/s Implementation
5.5. Clock Distribution
5.6. Data Encoding
5.7. Line Driver
5.8. Internal Testing Circuitry
5.8.1. Serdes I
5.8.2. Serdes II
5.9. Implementation and Fabrication
5.9.1. Serdes I
5.9.2. Serdes II
5.10. Testing Results
5.10.1. Serdes I (transmitter test results)
5.10.2. Serdes II (transmitter test results)
5.11. Future Design

5.11.1. 8B/10B Encoding
5.11.2. Transmitter data retiming
5.11.3. LC Oscillator

6. Design of the Receiver
6.1. Project History
6.2. Receiver Architecture
6.3. Receiver PLL
6.3.1. Phase Detector
6.3.1.1. Transition Detector (PD)
6.3.1.2. NRZ Phase / Frequency Detector (PD/FD)
6.3.2. The Loop Filter
6.3.2.1. FET Charge Pump / Proportional Control (Serdes I)
6.3.2.2. Negative Impedance Charge Pump (Serdes II)
6.3.2.3. Mixed Loop (Serdes III)
6.3.3. PLL Loop Response
6.3.3.1. Serdes I (FET charge pump)
6.3.3.2. Serdes II (negative impedance charge pump)
6.3.3.3. Serdes III (dual-loop / referenced loop)
6.4. 4-16 Demultiplexing
6.5. Registers and Decoding
6.6. Line Receiver
6.7. Test Circuitry
6.7.1. On-chip test pattern generation
6.7.2. True error rate detector (TERD)
6.8. Implementation and Fabrication
6.8.1. Serdes I
6.8.2. Serdes II
6.9. Testing Results
6.9.1. Serdes I (receiver test results)
6.9.2. Serdes II (receiver test results)
6.10. Future Work
6.10.1. Sampling offset correction
6.10.2. 40 Gb/s?
6.10.3. Demultiplexer improvements

Discussion & Conclusion

References

A. IBM SiGe 5 HP
A.1. NPN Vbe characteristics
A.2. NPN Ic versus Vce characteristics
A.3. NPN fT Curves

B. CML Logic Gates
B.1. CML Voltage Swing (non-linearized, digital)
B.2. CML Signals
B.3. Voltage Reference
B.4. Buffer with emitter follower outputs

C. CML Circuit Details
C.1. Linearizing the differential amplifier
C.2. Current bypassing
C.3. CML delay increasing

D. Transistor Sizing to Minimize VCO Delay

E. SpectreHDL models
E.1. FFI VCO
E.2. 3-State PD
E.3. Transition Detector PD
E.4. Histogram generator
E.5. Jittered data source

F. Toplevel Chip Schematics
F.1. Serdes I Transmitter
F.2. Serdes I Receiver
F.3. Serdes II Transceiver

List of Figures

Figure 1-1. Past and proposed future research
Figure 2-1. Toplevel System Block Diagram
Figure 3-1. Four stage VCO diagram
Figure 3-2. Current Starving VCO frequency and gain response
Figure 3-3. Adjustable Voltage Reference
Figure 3-4. Layout of Simple CS VCO
Figure 3-5. Test data from Simple CS VCO
Figure 3-6. Frequency Response versus emitter length in delay elements
Figure 3-7. Feed-forward CS VCO block diagram
Figure 3-8. Feed forward CS VCO frequency response and gain
Figure 3-9. Feed-forward CS Delay Element
Figure 3-10. Testing Data from feed-forward CS VCO
Figure 4-1. Schematic for Delay Interpolated VCO element
Figure 4-2. Feed Forward VCO block diagram
Figure 4-3. FFI VCO under boundary conditions
Figure 4-4. Feed-forward interpolated simulated response
Figure 4-5. Delay versus weighting factor with single stage imbalance
Figure 4-6. Decoupling versus delay injection
Figure 4-7. Schematic for FFI VCO element
Figure 4-8. FFI VCO frequency versus emitter resistance
Figure 4-9. FFI VCO frequency versus centering capacitor
Figure 4-10. FFI VCO frequency versus bypass resistance
Figure 4-11. FFI VCO Frequency Range
Figure 4-12. FFI VCO System from control voltage to frequency
Figure 4-13. Simulated versus analytical response of the FFI Architecture
Figure 4-14. Center frequency simulation and model
Figure 4-15. Current pulse effect on phase
Figure 4-16. Simulated ISF for FFI VCO and output waveform
Figure 4-17. ISF rms values for various ring oscillators
Figure 4-18. FFI with capacitive interconnect parasitics
Figure 4-19. FFI Layout
Figure 4-20. Reducing substrate coupling
Figure 4-21. FFI waveform at 5 GHz
Figure 4-22. FFI VCO measured results
Figure 4-23. FFI common mode response
Figure 4-24. FFI response versus supply voltage
Figure 4-25. Open loop phase noise of FFI VCO
Figure 4-26. FFI VCO analytical and measured jitter
Figure 5-1. Transmitter and multiplexer architecture

Figure 5-2. Data timing for the 4-1 multiplexer
Figure 5-3. CML Two Level Multiplexer
Figure 5-4. Simulation Testing of CML 2:1 Multiplexer
Figure 5-5. Simulation Results for CML 2:1 Multiplexer
Figure 5-6. CML Single Level Symmetric Multiplexer
Figure 5-7. Symmetric multiplexer transistor states
Figure 5-8. Multiplexer Eye Diagrams
Figure 5-9. Multiplexer Layout for Serdes I and II
Figure 5-10. Linear model of PLL
Figure 5-11. Frequency synthesizer evolution
Figure 5-12. Schematic for input filter
Figure 5-13. Input filter frequency response
Figure 5-14. Phase detector schematics
Figure 5-15. Simulated phase detector responses
Figure 5-16. PLL frequency detector
Figure 5-17. Passive Loop Filter
Figure 5-18. Tx PLL passive loop filter
Figure 5-19. Tx PLL active loop filter
Figure 5-20. Active loop filter transfer function
Figure 5-21. Receiver III integrator
Figure 5-22. Voltage spectral density for optimal loop bandwidth
Figure 5-23. PLL simulated step responses
Figure 5-24. PLL I simulated acquisition plots
Figure 5-25. PLL II simulated acquisition plots
Figure 5-26. 5/10 GHz PLL implementation
Figure 5-27. Clocking scheme for transmitter
Figure 5-28. Transmitter clock timing
Figure 5-29. Load counter
Figure 5-30. Serdes I LFSR
Figure 5-31. True error rate detector
Figure 5-32. Serdes II bit pattern generator
Figure 5-33. Serdes I transmitter layout and photograph
Figure 5-34. Serdes II chip layout and microphotograph
Figure 5-35. Transmitter waveform (Serdes I)
Figure 5-36. Serdes 2 transmitter eye diagram
Figure 5-37. Tx PLL measured phase noise spectra
Figure 5-38. Data and clock timing
Figure 6-1. Top level receiver architecture
Figure 6-2. Receiver PLL evolution
Figure 6-3. Receiver topology
Figure 6-4. Transition detector in prototype I
Figure 6-5. Transition detector in prototype II
Figure 6-6. Gain of transition detector with data jitter
Figure 6-7. Phase detector for NRZ data
Figure 6-8. Receiver loop filter
Figure 6-9. MOSFET charge pump integrator

Figure 6-10. Proportional control and summing junction
Figure 6-11. Serdes I loop locking in
Figure 6-12. Frequency and phase lock-in of serdes III Rx PLL
Figure 6-13. 4-16 demultiplexer architecture
Figure 6-14. Serdes I receiver layout artwork and photograph
Figure 6-15. Serdes I receiver locked to data
Figure 6-16. Serdes I recovered clock showing jitter
Figure 6-17. Serdes II Rx locked to data
Figure 6-18. Serdes II receiver clock phase noise
Figure 6-19. Revised 4-to-16 demultiplexer
Figure A-1. Ic-Vbe characteristics for npn transistor
Figure A-2. npn transconductance
Figure A-3. Ic-Vce characteristics for npn transistor
Figure A-4. fT vs Ic characteristics for npn transistor
Figure B-1. Current switching versus differential input voltage
Figure B-2. Simple CML Gate
Figure B-3. Reference Voltage Generator
Figure B-4. CML Buffer with emitter followers
Figure C-1. Linearizing differential amplifier with emitter resistors
Figure C-2. Branch current response for various emitter resistors
Figure C-3. Simulated / Analytical Gain
Figure C-4. Limiting full current switching with bypass resistors
Figure C-5. Current limiting effects of bypass resistor
Figure C-6. Current gain effects of bypass resistor
Figure C-7. Designing for gain with emitter and bypass resistors
Figure C-8. Collector Capacitor
Figure C-9. Delay Model with Collector Capacitor
Figure D-1. Delay from emitter follower to differential amplifier
Figure D-2. Delay from differential amp to emitter follower
Figure D-3. Emitter follower size between driver and receiver
Figure D-4. Delay when using optimized emitter follower
Figure D-5. Delay difference between circuit with follower and one without

List of Tables

Table 1-1. Equipment used for testing
Table 4-1. Circuit parameters for calculating jitter
Table 5-1. Pin-out of Serdes I transmitter
Table 5-2. Bondpad pin-out of Serdes II chip
Table 6-1. Pin-out of Serdes I transmitter

Acknowledgements

First and foremost, I want to thank my family. Although they have little knowledge of
the research I have done, they have helped more than they know. Without them this would
have been a much more difficult undertaking.

I want to thank my advisor, Jack McDonald, for his assistance and guidance during
the past few years, and for providing me with the opportunity to work with cutting-edge
SiGe technology. The members of my committee, Kenneth Connor, Gary Saulnier, Les
Rubenfeld, and Don Millard, also deserve thanks for providing insight and guidance in my
research. I would like to extend a special thank you to Dr. Millard for being a wonderful
mentor and friend since I began graduate school. He has always been there for me.

Also, without my fellow Frisc members and friends, Pete Curran, Samuel Steidl,
Matthew Ernest, Steven Carlough, and Bryan Goda, this certainly would have been a
boring voyage. Thanks for your help.

I am indebted to Hank Dardy and Basil Decina at NRL (contract #N00173-99-1-G013)
for their support in this work. I also wish to thank Sierra Monolithics Incorporated
and IBM for the fabrication of my chip designs and for providing additional insight into
this research, and Intel for providing a fellowship to support this work.

When I left high school I said, “I’ve just conquered a small hill in my life only to look

out and see a huge range of mountains before me.” Am I perhaps standing on the top of the

first mountain I saw?

Abstract

The current high-growth nature of digital communications demands higher speed

serial communication circuits. Present day technologies barely manage to keep up with this

demand, and new techniques are required to ensure that serial communication can continue

to expand and grow.

The goal of this work was to research, design, implement, test and evaluate high

speed serial communication circuits. Research involved an in-depth study of the state of the

art in high speed digital and analog circuits; SiGe technology; and serial communication

circuits. Two prototype 20 Gb/s transceiver chips were designed using current mode logic

(CML) bipolar logic families and using IBM’s SiGe 0.5 µm heterojunction bipolar

transistor (HBT) technology. Following fabrication of two designs, the completed chips

were extensively tested, and test results were compared to expected results from

simulation. After optimization and many improvements, a prototype communication

system was designed and prepared for fabrication.

The optimized second prototype operated at speeds in excess of 20 Gb/s. It utilized a

novel four stage feed-forward interpolated ring voltage controlled oscillator (VCO)

architecture, for which RPI is pursuing a patent. By feed-forwarding every stage’s output

by one stage, the architecture improved the core frequency by more than 33% with a phase

noise of -90.2 dBc/Hz at 1 MHz. The transmitter took advantage of the phase quadrature

nature of the VCO in a unique multiplexing technique that required the development of a

new 2-to-1 multiplexer. This multiplexer had full input to output symmetry on all three

inputs and was capable of performing output data retiming. The PLL had a wide bandwidth

of 30 MHz, to suppress VCO noise, and produced in-band jitter of 2.0 ps from 100 kHz to

100 MHz.

The receiver, similar in both prototypes, utilized all eight phases of the VCO to

oversample every data bit twice in the phase detector (PD). It was capable of extracting

timing information from every rising and falling transition. The loop filter incorporated a

negative impedance charge pump integrator which exhibited excellent performance. Four

bits of data were sampled through the PD and a 4-to-16 demultiplexer produced the 16 bits

of parallel data.

A third prototype was developed, but not fabricated, using the data acquired from the

first two designs. The transmit PLL bandwidth was optimized to account for the phase

noise measurements of the VCO. As a result, a frequency detector was required and added

to the PLL to increase the pull-in range. The loop filter was also modified to use the

negative impedance charge pump from the receiver PLL. The receiver demultiplexer

scheme was improved to decrease the timing constraints. In addition, the receiver PLL was

optimized to improve the bit error rate.

1. Introduction & Historical Review

1.1. Motivation and Goals

The research presented in this thesis deals with understanding and designing the

critical components that make up a serializing and deserializing, or Serdes, circuit. The

extremely complicated nature of such a system required a focused study that did not address

many of the issues that are present in a similar commercially designed product.

Funding for the project was acquired through Dr. Jack McDonald from the Naval

Research Lab, NRL. The requirements were to design a SiGe short-haul Serdes system

capable of 20 Gb/s that would assist in research that may eventually lead to 40 Gb/s.

Serdes circuits, discussed more thoroughly in the following chapter, consist of three

parts: a transmitter, a receiver, and a channel. The transmitter accepts streams of data in

parallel and multiplexes them together into a single serial stream. Distinguishing the bits at

the receiver input, after they travel through the channel, is a primary concern. The receiver

accepts the serial stream and demultiplexes it back to the parallel data. It must be sensitive

to changes in the data, in order to limit the error rate. The channel connects the transmitter

and receiver, and typically consists of amplifiers, repeaters, and optical wiring.

IBM’s SiGe HBT process technology was chosen because of the Frisc group’s

strength in high-speed bipolar design, and because of the state-of-the-art nature of the

process in the industry. The process provides integration with current CMOS technology

enabling a very wide variety of circuit topologies. This research used the 5 HP process

technology, with 5 levels of metal. It offered 50 GHz fT (transition frequency) HBT and

0.25 µm CMOS transistors.

One way of grouping Serdes circuits is by the distance over which the serialized data

is expected to travel. Systems, such as Synchronous Optical Network (SONET), are

implemented over distances greater than 100 km, and are considered long-haul. Short-haul

Serdes, on the other hand, is limited to short distances, such as a LAN, or between CPUs in

a multi-processor system. This distinction between short and long haul systems has

important implications on the critical specifications of the circuit. For long-haul systems,

phase noise is critical, as it dictates the total bit error rate (BER) through the long and noisy

channel. Short-haul is less sensitive to phase noise and is instead focused on bit throughput

and higher bandwidth.

Current industry level Serdes designs, as of the year 2000, run at 10 Gb/s and utilize

the same or similar 5 HP technology. Pushing the goal to 20 Gb/s and even 40 Gb/s was

intended to place this research on the cutting edge and evaluate the maximum potential of

the technology.

In addition to the goals of the NRL contract, various other factors motivated the

development of this project. First was the available test equipment. The lack of facilities to

test a packaged part necessitated a chip with wafer probing capabilities. This limited the

total testable signals to 12 RF and 12 DC at one time. Without packaging, a fully integrated

solution was necessary, rather than one that needed off-chip components, such as capacitors

and op-amps.

1.2. The three chips

The total design process consisted of three separate designs. The first design, Serdes

I, was a prototype that tested some of the key components of a complete design. It was

fabricated in February 1999. This chip was an excellent starting point for the development

of a fully functional chip.

Serdes II was investigated and studied after the results from Serdes I were analyzed.

It possessed improvements in important areas such as the PLL, the multiplexer, the receiver

topology, and the VCO. Unfortunately the tape-out date was earlier than expected and

allowed only one month for final design and layout. This proved to be a difficult time line

and some design issues were left unresolved.

Following the collection of data from Serdes II, a third iteration, Serdes III, was

investigated. The design goal was to solve most of the issues uncovered in Serdes I and

Serdes II. Although no new layout was done for Serdes III, a complete set of new simulated

schematics was created. With the addition of some minor support circuits, a fully

functional and optimized Serdes chip could be implemented.

1.3. Project time line

Figure 1-1. Past and proposed future research. This is a time line of the goals and accomplishments of this Serdes research.

[Time line figure, August 1998 through September 2000. Milestones labeled on the figure: start of research; paper search; simple VCO; leap VCO; transmitter designed; receiver designed; final checks; Serdes I design submitted; candidacy preparation; chips received; test VCOs; additional simulations; test transmitter; test receiver; submit to ISSCC; candidacy; start work on Serdes II; SymMux patent; SMI offer to fabricate; intense effort to design Serdes II; submit Serdes II; FFI VCO patent; Serdes II received; test FFI VCO, transmitter, and receiver; both patents pending; complete thesis; submit JSSC paper; defend thesis.]

A time line indicating completed goals is shown in Fig. 1-1. Research into high speed

communication circuits was initiated in August 1998. A paper that appeared in ISSCC

1998, titled “A 10 Gb/s Si-Bipolar TX/RX Chipset for Computer Data Transmission” [1],

was the basis for the majority of the research. The paper presented a novel idea for a voltage

controlled oscillator, VCO, and a description of a transmitter and receiver circuit.

VCOs are the most important circuit in the design of communication circuits, and as

such, were the starting point for this research. A simple four phase buffer oscillator was

designed and simulated. The method for frequency control for this oscillator originated

from a modified version of Samuel Steidl’s VCO implementation [2]. An advanced version

of this VCO, with a 66% speed improvement, was subsequently implemented. The desire

to further increase the frequency led to a study of phase multiplication techniques [3], [4].

Three separate VCO test chips were laid out to test various aspects of the above techniques.

Each chip contained several versions of a unique VCO design: with and without phase

multiplication, and under several different loading conditions.

In November 1998, the transmitter circuit started to take shape. One component of a

serializing circuit is the final multiplexer. To design this, a unique register “shuffling”

method was evaluated. As it provided better performance than other techniques and worked

with a slower rate multi-phase clock, it was chosen for the final design. In order to test the

transmitter, a linear feedback shift register, LFSR, was used to provide pseudo-random

data. An additional requirement of the transmitter was operation at a speed relative to a

fixed low frequency clock. This required the development of a phase locked loop, PLL,

capable of synching a low frequency external reference clock to the high rate internal clock.
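The pseudo-random data source mentioned above can be illustrated with a short behavioural model. The following Python sketch implements a Fibonacci LFSR; the 7-bit width and the taps at register bits 7 and 6 (as in the common PRBS7 generator) are assumptions made for illustration only, since the polynomial used on the chip is not stated here. The function names are likewise hypothetical.

```python
def lfsr_step(state, width=7, taps=(7, 6)):
    """Advance a Fibonacci LFSR by one clock; return (new_state, output_bit).

    width and taps are illustrative assumptions (PRBS7-style), not thesis values.
    """
    feedback = 0
    for tap in taps:
        # Tap n refers to bit n-1 of the register.
        feedback ^= (state >> (tap - 1)) & 1
    output_bit = (state >> (width - 1)) & 1        # oldest bit leaves the register
    mask = (1 << width) - 1
    new_state = ((state << 1) | feedback) & mask   # feedback enters at the LSB
    return new_state, output_bit


def lfsr_bits(seed, count, width=7, taps=(7, 6)):
    """Return `count` pseudo-random bits starting from a non-zero seed."""
    if seed == 0 or seed >> width:
        raise ValueError("seed must be non-zero and fit in the register")
    state, bits = seed, []
    for _ in range(count):
        state, bit = lfsr_step(state, width, taps)
        bits.append(bit)
    return bits


def lfsr_period(seed=1, width=7, taps=(7, 6)):
    """Count clocks until the register returns to its starting state."""
    state, steps = seed, 0
    while True:
        state, _ = lfsr_step(state, width, taps)
        steps += 1
        if state == seed:
            return steps
```

With these assumed taps the register cycles through all 2^7 - 1 = 127 non-zero states before repeating, which is what makes such a sequence useful as test data.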

Starting in December and during transmitter development, a receiver design was

examined. Many improvements were added to the fundamental architecture found in [1].

Instead of gathering timing data from every fourth transition, it was determined that better

performance could be achieved if every transition were used. Since no detailed mechanism

for feedback control was described, some ideas were gathered from a clock and data

recovery paper [5]. Starting with these ideas, a unique PLL was created for clock recovery.

Because of the difficulty of using external function generators, an internal testing source

was developed to provide different bit patterns to exercise the circuit completely.

All six chips, including an integrated transmitter/receiver chip, were designed and

laid out using Cadence software. Simulation was done using HSpice, Matlab, and a digital

simulator developed by Peter F. Curran. Final designs were shipped to IBM during the first

week of February 1999. After six months in fabrication, a finished wafer was returned to

RPI in the beginning of August of the same year.

Chip testing began with a detailed study of the three VCO chips and the test source

VCO in the receiver. It became apparent that most of the circuits underperformed,

when compared to simulation results. It appeared that under heavily loaded conditions the

circuits slowed down more than expected. The transmitter test chip was tested and found to

work with a 25% reduction in frequency. This testing was followed by a detailed inspection

of the receiver chip, which was found to work nearly at the design speed.

During this time, data was being collected for a conference paper to be submitted to

the International Solid State Circuits Conference, ISSCC. Although the chips performed

slightly slower than anticipated, the paper still showed significant advances in state of the

art research. Unfortunately the paper was not accepted, most likely because there was a

frequency mismatch between the transmitter and receiver.

During the remainder of September, a thorough simulation of the VCO, including

layout parasitics, was performed. The initial results showed a close match to the results

measured from the fabricated wafer. Some discrepancy remains regarding how loading

affects the speed of the devices. A continuation of this work will attempt to match

simulations accurately to measured results to ensure that future designs will respond as

expected.

It was necessary to produce a second Serdes chip, drawing on the success of

the first test chip, that would meet the goal of 20 Gb/s. Additional circuitry was needed

to round out the design: a 4-to-16 demultiplexer, an internal testing scheme, transmitter and

receiver integration onto one chip, packageability, and improved performance.

A comprehensive study was performed to determine exactly why and how the chips

underperformed. The design was modified to ensure that the parts would meet the required

specifications. This included complete redesign of the VCO into the Feed Forward

Interpolated VCO (FFI VCO). The new design was based upon the results of the previous

design and the development of a new multiplexer.

In February 2000, an invention disclosure record entitled “The Symmetric

Multiplexer,” was submitted to RPI [6]. The invention improved the standard CML

multiplexer and reduced phase noise and jitter at the transmitter output.

Serdes II was finished and submitted to Sierra Monolithics Incorporated, SMI, for

fabrication at the end of March 2000 (SMI volunteered silicon on an experimental run). It

contained many improvements on the previous design and was capable of being C4

packaged and wafer tested. After its completion, an additional invention disclosure record

that focused on the FFI VCO was submitted [7]. The

VCO is a novel approach to designing ring oscillators. It improves upon many key

parameters of the standard ring VCO.

The Serdes II chip was received three months after tapeout, in the middle of July

2000. Testing began immediately with a complete characterization of the FFI VCO

including its frequency response, CMRR, phase noise, supply response, and jitter. A high

quality spectrum analyzer was rented to aid in testing and data acquisition. Testing of the

transmitter was followed by a look at clock jitter and data eye diagrams. The transmitter

was a complete success, and operated at 20 Gb/s with rms jitter of 2.0 ps in the frequency

band of 100 kHz to 100 MHz. The symmetric multiplexer appeared to work exactly as

expected. Testing the receiver confirmed an anticipated problem with low lock-in range.

This was also seen in Serdes I and was not completely addressed in the second prototype.

Following the tape-out of Serdes II, intense work was done on Serdes III. Several last

minute problems were discovered in Serdes II that were corrected in the next iteration. Data

collected from Serdes II allowed the optimization of important PLL parameters in order to

reduce jitter and improve the pull-in time. A problem with a small pull-in range in both

receiver PLLs required a complete redesign of the loop and the addition of a reference

signal.

Using the data collected in Serdes II, a journal article was submitted to the Journal of

Solid-State Circuits, JSSC, in October. It was titled “A Transmitter Architecture for High

Speed Short-Haul Serial Communication,” and it detailed the FFI VCO, the symmetric

multiplexer and the transmitter architecture.

At the end of September, the RPI patent office reported that they were going to pursue

U.S. patents for both inventions. This would start with an immediate application for

provisional patents that would protect the work after disclosure.

1.4. State of the Art

In the quick-paced research area of high speed communications, industry is currently

cresting the 10 Gb/s barrier while research is beginning in the 40 Gb/s regime. New

microelectronic technologies such as AlInAs/InGaAs heterojunction bipolar transistors

(HBT), and SiGe HBTs [8], [9] are playing leading roles. In particular, SiGe HBT and

CMOS technology is proving itself to be a high-speed (60-90 GHz fT), high-yield, high-

integration, and low-cost solution [10], [11]. It possesses the strengths of silicon because of

similar fabrication techniques, but benefits from higher frequencies with the introduction

of germanium [12].

The current state of the art in high-speed serial communications can be broken down

in three basic design areas: VCOs; clock multiplier units (CMU), or transmitters; and clock

and data recovery (CDR) circuits, or receivers.

As the speed of serial communication circuits increases, so too must the speed of the

core building block of the circuit, the VCO. Multi-phase ring oscillators with top speeds

approximately equal to 1/10th of their technology’s fT are being improved [1], [13], [14].

It is common to see speeds around 5 GHz, with maximum quoted speeds up to

approximately 15 GHz through clock phase multiplication [3], [4]. Their Q of unity and

high noise characteristics are more suitable for short-haul systems or for systems that can

tolerate phase noise. In-depth analysis of the sources of phase noise is allowing tight

optimization of circuits [15]-[19]. CMOS differential ring oscillators running at speeds up

to 5 GHz exhibit -95 dBc/Hz of phase noise at 1 MHz [18], while bipolar rings are quoted

as having phase noise values of -86 dBc/Hz at 1 MHz [20]. Jitter, generally expressed by

the κ constant, has been documented for a silicon bipolar ring running at 625 MHz with a

0.6 mA tail current at 22 n√s [17].
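For context, the κ figure of merit is usually defined through the growth of rms jitter with the measurement interval. The relation below is the commonly used definition and is stated here as an assumption about the convention followed in [17], not as a result of this work:

```latex
\sigma_{\Delta t}(\Delta T) = \kappa \sqrt{\Delta T}
```

Here \sigma_{\Delta t} is the rms timing error accumulated over a delay \Delta T, so κ carries units of \sqrt{\mathrm{s}}.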

Ring oscillator architecture is straightforward and simple to understand. Through

interesting and creative interstage feedback techniques, the VCO frequency and phase

noise can be improved. A four stage ring VCO that increases its speed by 33% by leap-

frogging the output of one stage to the input of the stage ahead is documented in [1]. This

improves the speed by reducing the effective delay of every stage. A similar, more general

approach is presented in [13], which utilizes sub-feedback inverters that create fast and

slow loops which can be mixed together. An earlier approach, [23], has a five stage core

that potentiometrically mixes the output from the third and fifth stages. By doing this, the

ring is able to operate variably between a 3 stage and a 5 stage oscillator. Finally, by using

a negative skewed delay scheme, the core frequency of a CMOS ring oscillator is improved

by 50% [24]. This is accomplished by compensating for the slower PMOS transistors by

tying the PMOS input to the output of a stage two gates back. This turns the transistor on

sooner than the NMOS, thus improving its speed at the expense of additional power

requirements.
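The speed-up figures quoted above can be related to stage delay with the usual first-order ring oscillator relation f = 1 / (2 N t_d), where N is the number of stages and t_d is the delay per stage. The Python sketch below is illustrative only: the 25 ps stage delay is a hypothetical value, the function names are not from the thesis, and the relation ignores loading and interpolation effects.

```python
def ring_frequency_hz(stages, stage_delay_s):
    """First-order oscillation frequency of an N-stage ring: f = 1 / (2 * N * t_d)."""
    return 1.0 / (2.0 * stages * stage_delay_s)


def effective_delay_for_speedup(stage_delay_s, speedup):
    """Effective stage delay needed for a fractional frequency increase `speedup`."""
    return stage_delay_s / (1.0 + speedup)


base_delay = 25e-12  # hypothetical stage delay, not a measured value
f_base = ring_frequency_hz(4, base_delay)
fast_delay = effective_delay_for_speedup(base_delay, 0.33)
f_fast = ring_frequency_hz(4, fast_delay)

print(f"base: {f_base / 1e9:.2f} GHz, with 33% speed-up: {f_fast / 1e9:.2f} GHz")
print(f"effective delay ratio: {fast_delay / base_delay:.3f}")
```

Under this simple model, a 33% frequency increase corresponds to an effective stage delay of roughly 75% of the original value (1 / 1.33 ≈ 0.752).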

LC oscillators, on the other hand, which possess a high Q and extremely low noise and

jitter, are being rigorously researched as VCOs for long-haul serial communication. Unlike

multi-phase oscillators that can generate frequencies higher than their core frequencies, LC

oscillators are typically run at the baud rate of the communication channel. Thus, for a 10

Gb/s serdes implementation, a 10 GHz LC VCO is required. A 5 GHz VCO developed by

IBM [21] was quoted as having a phase noise of -98 dBc/Hz at 100 kHz, with a power of

15 mW. A second 11 GHz VCO with an integrated inductor is documented as having a -78

to -87 dBc/Hz phase noise at a 100 kHz offset from the carrier [22].

The state-of-the-art in transmitter, or CMU, research is measured primarily by the

maximum bit rate compared to the transistor technology, the clock jitter produced at that

rate, and the phase noise of the oscillator.

A 1.062 Gb/s transmitter implementation [26] utilizes a half-rate ring oscillator. The

ring oscillator incorporates two mixing elements between every pair of delay elements to

control the rate of oscillation. Its quadrature outputs are further broken up into four

quarter-rate signals that drive the 10-to-1 multiplexer. The PLL achieves an rms jitter performance

of 9.8 ps.

A low noise, 12.5 Gb/s CMU is described in [27]. It possesses a differential single

phase LC oscillator with a phase noise of -101 dBc/Hz at 1 MHz. The PLL has a very low

bandwidth of 300 kHz in order to reduce in-band noise. Its reference is at approximately

195.3 MHz and it utilizes a standard 3-state phase detector (PD). The loop filter consists of

a negative impedance amplifier and a single pole, single zero RC filter. The output jitter is

quoted as 0.4 ps.

An interesting non-optical transceiver described in [28] utilizes a 4-PAM (pulse

amplitude modulation) serial link for 8 Gb/s communications. It essentially transmits and

receives four level logic, which allows twice the bit rate for the same bandwidth. It

exhibits a transmitter output jitter of 2 ps and a receiver jitter of 4 ps.
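The four level signalling described above carries two bits per symbol, which can be shown with a small model. In the Python sketch below, the Gray-coded mapping of bit pairs to levels and all function names are assumptions for illustration; the mapping used in [28] is not given here.

```python
import math

# Gray-coded bit pairs mapped to amplitude levels 0..3 (assumed mapping).
BITS_TO_LEVEL = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}
LEVEL_TO_BITS = {level: pair for pair, level in BITS_TO_LEVEL.items()}


def pam4_encode(bits):
    """Pack consecutive bit pairs into four-level symbols."""
    if len(bits) % 2:
        raise ValueError("4-PAM needs an even number of bits")
    return [BITS_TO_LEVEL[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def pam4_decode(levels):
    """Unpack four-level symbols back into the original bit stream."""
    bits = []
    for level in levels:
        bits.extend(LEVEL_TO_BITS[level])
    return bits


def symbol_rate(bit_rate, levels=4):
    """Symbol rate needed to carry `bit_rate` with the given number of levels."""
    return bit_rate / math.log2(levels)
```

For the 8 Gb/s link mentioned above, `symbol_rate(8e9)` gives 4e9 symbols per second, half of what two level signalling would need.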

As bit rates are pushed higher relative to the transistor technology speed, certain

problems arise. In the transmitter PLL, a clock frequency divider is needed to drive the PD

along with the reference signal, and to drive multiplexer inputs. A feedback MS-latch often

does the trick, but for extremely high VCO speeds a new approach is required. A dynamic

frequency divider capable of speeds up to 79 GHz using transistors with an fT of 80 GHz

is described in [29]. It uses an XOR multiplier, a low pass filter inherent in the multiplier,

and it feeds the output back into the multiplier. The only stable condition is when the output

is at half the frequency of the input.
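The half-frequency condition can be seen from the mixing products. The short derivation below is a sketch that assumes an ideal multiplier and a low-pass filter that passes only the difference term; the symbols \omega_{\mathrm{in}} and \omega_{\mathrm{out}} are introduced here for illustration.

```latex
\begin{align}
\cos(\omega_{\mathrm{in}} t)\,\cos(\omega_{\mathrm{out}} t)
  &= \tfrac{1}{2}\cos\big((\omega_{\mathrm{in}}-\omega_{\mathrm{out}})\,t\big)
   + \tfrac{1}{2}\cos\big((\omega_{\mathrm{in}}+\omega_{\mathrm{out}})\,t\big) \\
\omega_{\mathrm{out}} &= \omega_{\mathrm{in}} - \omega_{\mathrm{out}}
  \;\Longrightarrow\; \omega_{\mathrm{out}} = \tfrac{1}{2}\,\omega_{\mathrm{in}}
\end{align}
```

The filter removes the sum term, and the loop is self-consistent only when the surviving difference term has the same frequency as the signal fed back, which gives the divide-by-two behaviour.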

The state-of-the-art in receiver, or CDR, design is measured by the ability to extract

data in the presence of both data and clock jitter, and the ability to tolerate pseudo-random

data.

The design described in [30] uses a full rate ring oscillator with a 12.5 GHz clock to

extract the 8B/10B encoded data at 10 Gb/s. The VCO exhibits a phase noise of

approximately -80 dBc/Hz at 1 MHz. The PLL has a bang-bang PD and is frequency locked

by a 195.3 MHz reference signal. The data PD has a pull-in range of 0.6% and a hold-in

range of 1.2%. This receiver is quoted as exceeding the SONET-192 specifications by 50%.
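The relationship between the 10 Gb/s data rate and the 12.5 GHz clock quoted above follows from the 8B/10B coding overhead, and can be checked with a few lines of arithmetic. In the Python sketch below the function names are hypothetical, and reading the 195.3 MHz reference as a divide-by-64 of the clock is an inference from the numbers, not something stated in the text.

```python
def line_rate_bps(payload_bps, data_bits=8, code_bits=10):
    """Serial line rate after block coding (8B/10B sends 10 line bits per 8 data bits)."""
    return payload_bps * code_bits / data_bits


def divide_ratio(clock_hz, reference_hz):
    """Ratio between the recovered clock and the PLL reference frequency."""
    return clock_hz / reference_hz


rate = line_rate_bps(10e9)             # line rate for 10 Gb/s of payload
ratio = divide_ratio(12.5e9, 195.3e6)  # close to 64, suggesting a power-of-two divider
print(f"line rate: {rate / 1e9:.1f} Gb/s, clock/reference ratio: {ratio:.2f}")
```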

A 50 GHz fT SiGe 10 Gb/s CDR for SONET is described in [31]. It utilizes an LC

tank VCO running at 10 GHz with a phase noise of -80 dBc/Hz at 100 kHz. The PD is a

Hogge type, and the charge pump uses an active MOSFET positive-feedback pull-up

amplifier. The recovered clock rms jitter was measured at less than 1 ps, with a bit error

rate of 10⁻⁹. SONET specifications for jitter tolerance, jitter transfer, and jitter generation

were all met.

A very high speed CDR discussed in [32] uses a silicon bipolar process with an fT of

12 GHz for 8 Gb/s operation. The loop filter and VCO are off-chip but the frequency

detector and PD are both on-chip. The clock jitter was measured at 1.5 ps rms.

1.5. Contribution to the Field

An important aspect of Ph.D. research is advancement of the state of the art, and

proving that such work builds upon the shoulders of others and is not merely a reinvention

of the wheel. Four key components of this research can be quickly singled out as original

and novel, and RPI is pursuing U.S. patents for two of them.

1.5.1. Feed Forward Interpolated VCO

The Feed Forward Interpolated VCO is an improvement over the standard ring

oscillator [1]. The ring VCO in [23] utilizes a similar feed-forward method to extend the

frequency range but the feed-forwarding remains fixed and is not used as the delay control

mechanism. The design presented in this thesis, however, uses feed-forwarding to increase

the frequency range and also as the primary method to control the stage delay. It is versatile

and allows adjustments to be made to the center frequency, tuning range, and gain through

simple parameter changes. The VCO is 33% faster than a simple four stage ring oscillator

utilizing the same power, when it is configured for maximum operating speed. This

increase in speed can be traded for additional phase noise and jitter suppression, making the

FFI VCO a viable alternative to LC tanks when used in a short-haul communication

channel.

An invention disclosure record for this circuit was submitted in May 2000 to the RPI

patent office. In September 2000, the patent office declared that they were going to pursue

a U.S. patent for this invention.

1.5.2. Transmitter Interleaving Architecture

As the bit rate is pushed higher, with respect to the technology speed, it becomes

increasingly difficult to design VCOs that can keep up. Fractional rate oscillators can solve

this difficulty, but impose tight timing constraints on the output multiplexer. The

transmitter design discussed in this thesis utilizes a relatively slow, well understood,

quarter frequency multi-phase VCO. The novel transmitter architecture allows in-

quadrature phases of the VCO to control a 4-to-1 multiplexer.

Although this approach is similar to the design given in [1], it differs in three ways.

First, the 4-to-1 multiplexer in [1] is implemented as a single gate, whereas the

transmitter interleaving architecture breaks the problem into multiple gates. Second, the

multiplexer in [1] requires multiple level clock inputs, which requires the clock phases to be

skewed. Third, the multiplexer in [1] requires three levels of logic while this new

architecture requires only two. This is important for power saving applications that require

only two levels.
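The interleaving idea described above can be sketched behaviourally. In the Python model below, the in-phase (I) and quadrature (Q) outputs of a quarter-rate clock steer a two-level tree of 2-to-1 multiplexers so that four parallel lanes are emitted in four consecutive time slots per clock period. The tree arrangement, the use of I XOR Q as the final select, and all names are assumptions made for illustration; they are not taken from the thesis circuit.

```python
def quadrature_clocks(slot):
    """Return (I, Q) clock levels for a time slot; one clock period spans four slots."""
    phase = slot % 4
    i_clk = 1 if phase in (0, 1) else 0   # 0 degree clock
    q_clk = 1 if phase in (1, 2) else 0   # 90 degree clock
    return i_clk, q_clk


def mux2(select, a, b):
    """Ideal 2-to-1 multiplexer: pass `a` when select is 1, otherwise `b`."""
    return a if select else b


def interleave(lanes):
    """Serialize four equal-length quarter-rate lanes into one full-rate stream."""
    if len(lanes) != 4 or len({len(lane) for lane in lanes}) != 1:
        raise ValueError("expected four lanes of equal length")
    out = []
    for slot in range(4 * len(lanes[0])):
        word = slot // 4
        i_clk, q_clk = quadrature_clocks(slot)
        first = mux2(i_clk, lanes[0][word], lanes[2][word])   # lanes 0 and 2
        second = mux2(q_clk, lanes[1][word], lanes[3][word])  # lanes 1 and 3
        out.append(mux2(i_clk ^ q_clk, first, second))        # alternate every slot
    return out
```

With a 5 GHz quarter-rate clock, four slots per period give the 20 Gb/s serial rate targeted in this work.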

1.5.3. Symmetric Multiplexer

During the development of the transmitter a problem developed that required the

basic 2-to-1 multiplexer to be rethought. The problem was that the 2-to-1 multiplexer had

become a critical timing path in the transmitter. In other words, any delay mismatches in

this circuit were propagated to the output. After analyzing the problem, a new multiplexer

was developed that had perfect timing symmetry and possessed none of the problems of the

original multiplexer. This discovery enabled the new architecture to operate smoothly. A

U.S. patent for the symmetric multiplexer, like the FFI VCO, is being pursued by the RPI

patent office.

1.5.4. Receiver PLL

The critical circuit in the design of the receiver PLL was the phase detector (PD).

Typically, a Hogge-type [31], [52] or a bang-bang type PD [30] is used in high speed serial

receivers. The 20 Gb/s goal of this work required a PD to operate twice as fast using the

same technology speed. A bang-bang or Hogge style PD with this speed capability would

be difficult to design and would require a clock at the same frequency as the data. As a

result, a new PD had to be developed.

The new design, called a transition detector (TD), incorporates eight MS-latches,

each clocked by a different phase of the VCO. This allows the data to be oversampled

by a factor of two, so that both timing information and data can be collected.
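A behavioural sketch may help show how twofold oversampling yields both data and timing information. The Python model below assumes that the eight samples in one VCO period alternate between mid-bit (data) samples and bit-boundary (edge) samples, and it uses the conventional early/late comparison found in oversampling receivers. That decision rule and all names are assumptions; the actual TD logic is not described in this section.

```python
def transition_detect(samples, next_first):
    """Extract four data bits and an early/late vote from one VCO period.

    samples    -- eight 0/1 values taken on eight evenly spaced clock phases;
                  even indices are assumed mid-bit, odd indices at bit boundaries
    next_first -- first data sample of the following period
    Returns (data_bits, vote); vote > 0 means the clock samples early,
    vote < 0 means it samples late, and 0 means no net indication.
    """
    if len(samples) != 8:
        raise ValueError("expected eight samples per VCO period")
    data_bits = list(samples[0::2])
    extended = list(samples) + [next_first]
    vote = 0
    for k in range(4):
        before, edge, after = extended[2 * k], extended[2 * k + 1], extended[2 * k + 2]
        if before == after:
            continue  # no transition, so no timing information from this bit
        # An edge sample that still equals the old bit was taken before the transition.
        vote += 1 if edge == before else -1
    return data_bits, vote
```

Because every transition, rising or falling, contributes a vote, this style of detector uses all available edges rather than a subset of them.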

1.6. SiGe 5 HP Overview

IBM’s 5 HP SiGe BiCMOS process incorporates 0.5 µm HBT transistors and 0.35

µm CMOS transistors. The epitaxially graded Ge base in the HBT allows fT speeds of up

to 60 GHz. Also included in the technology are: high breakdown NPN transistors, gated

lateral PNP transistors, polysilicon resistors, Metal-Insulator-Metal (MIM) capacitors,

substrate contacts, precision oxide/nitride decoupling capacitors, Schottky barrier diodes,

varactor diodes, PIN diodes, electrostatic discharge (ESD) devices, last metal (LM) spiral

inductors, resistors (NS, RN, and RI), and LM bondpads.

Between three and five layers of metal are provided at the back end of the line for

interconnect¹. The first level of metal is for local interconnect and has a minimum width of

0.8 µm and a fixed thickness of 0.63 µm. The last, or highest level, called LM has a

minimum width of 2.4 µm, and a thickness of 2.07 µm. LM is typically used for bond and

C4 pads, power and ground wiring, inductors, and MIM capacitors. An extension to the 5

HP process allows LM to be substituted with analog metal (AM) which is 4 µm thick and

separated by 3 µm from the next layer of metal. AM is primarily used for inductors which

require low resistance and low capacitance to the substrate. Except for AM, all layers of

metal are separated by 1.2 µm of silicon dioxide.

The Cadence design kit from IBM provides full Spectre and HSpice models for the

devices listed above. The kit allows the extraction of interconnect capacitance and

resistance to enable full parasitic simulation.

Appendix A, “IBM SiGe 5 HP,” describes important NPN HBT parameters in

more detail. Appendix A.1. describes the turn-on characteristics of the transistor,

specifically the collector current versus base-emitter voltage. The relationship between the

collector current and the collector to emitter voltage is discussed in Appendix A.2. fT is a

figure of merit for the transistor family and its relation to the collector current is useful

when biasing the transistor for maximum performance. A plot of the transistor fT versus

collector current can be found in Appendix A.3.

1. Serdes I was submitted in a DARPA multi-user wafer which only allowed three levels of metal. Serdes II was submitted through Sierra Monolithics and had the full five levels of metal.

1.7. Testing Equipment

Table 1-1. Equipment used for testing

Type | Model | Specs | Usage

time-domain oscilloscope | Tektronix 11801C | 50 GHz | transmitter eye diagrams; time-domain jitter measurements

spectrum analyzer | Rohde & Schwarz FSEM 30 | 30 Hz - 26.5 GHz | VCO frequency response; VCO common mode response; VCO frequency versus power supply; VCO phase noise

spectrum analyzer | HP8563E | 30 Hz - 26.5 GHz | transmitter PLL phase noise; receiver PLL phase noise

signal source | HP4430B | < 1 GHz | low phase noise jitter measurements

signal source | HP8350B | < 10 GHz | high frequency receiver measurements

power supply | Agilent E3631A | 3 ch. DC | LabVIEW controlled VCO frequency and supply response

10 channel RF probes | GGB | > 1 GHz | all high speed RF measurements were made using these probes

12 channel DC probes | GGB | < 1 GHz | these probes were used in Serdes II for simple control lines

LabVIEW & GPIB | | | LabVIEW and GPIB hardware simplified the collecting of most data, including VCO phase noise and responses

1.8. Document Logistics

This thesis is sectioned into an abstract, six chapters, a conclusion, and appendices.

This introduction is the first chapter; it describes the goals and motivations behind this

project and discusses the state-of-the-art, the novelty of this work, and the test equipment.

The second chapter goes through the basic block diagram of a serial communication system

and the function of each block. Chapters three and four detail the development and results

of the two VCOs researched in this work. Chapter five details the transmitter, including the

PLL, architecture, and test structures. The last chapter discusses the receiver, its operation,

and test results. Appendices include information on the SiGe process used in this work, and

circuit details of this technology. In addition the last appendix has the top level schematics

for the Serdes I and II chips.

Three different Serdes designs were researched in this work. The first two were

fabricated and the third represents research for the future. Each design is designated by the

names Serdes I, Serdes II, or Serdes III.

Certain conventions were followed throughout this document. First, node names in

schematics and within equations are in bold font, such as z20 and a11. Second, equation

variables are italicized, as in fo, and ω2. Third, in plots that contain both simulated and

measured data, the simulated data is usually expressed as a dotted line and the measured

data line is solid. Fourth, for equations solved for the general case the units are usually

expressed as a function of the transistor size. This shows how the constants and variables

change depending on the transistor size. In contrast, absolute units were used for specific

circuits and fabricated circuits.


2. Serial Communication

The exchange of high speed serial data involves three primary components:

transmitter, receiver, and transport channel. A transmitter (Tx) gathers low rate parallel

data and transforms it into high speed serial data. The signal is then transported through the channel, for example air or wire, to a receiver. The receiver (Rx) must then demodulate the signal, extract the clock, and demultiplex the data. The received information is fed out

of the receiver as parallel data.

2.1. Serial Communication Block Diagram

Figure 2-1 Top-level System Block Diagram. The transmitter accepts parallel data and serializes it to an NRZ signal. The receiver accepts the bit stream, extracts the clock, and demultiplexes the data.
[Diagram blocks. Transmitter: DATA IN, registers, encoding, multiplexer, retimer, line driver, clock tree, TxPLL, TxVCO, reference clock, support circuits, internal testing. Transport channel. Receiver: line receiver, demux, decode, registers, DATA OUT, clock tree, RxPLL, RxVCO, reference clock, support circuits, internal testing.]

Shown above in Fig. 2-1 is a basic block diagram of a serial communication system.

Although most systems do not look exactly like this, there is enough in common between

this system and others to say that these diagrams represent all such systems fairly

accurately.

2.2. Transmitter / Multiplexer / Clock Multiplier

The transmitter’s role is to accept a data word of a specified width, serialize it and

drive the data onto a channel. The width of the word depends on the application and is a

function of the input and output bandwidths. For example, an 8 Gb/s serializer would require 16 bits at 500 Mbit/s or 64 bits at 125 Mbit/s. Serializing involves multiplexing the data into an ordered bit stream, which is typically in a non-return-to-zero (NRZ) format. The process of driving a channel may consist of a simple 50 Ω amplifier, or it may consist of a

more sophisticated circuit that is capable of driving an optical driver.
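The relationship between word width and input rate in the 8 Gb/s example can be checked with a short calculation. This is only an illustrative sketch; the function name `input_rate_mbps` is an assumption introduced here and does not come from the original design.

```python
def input_rate_mbps(serial_rate_gbps: float, word_width: int) -> float:
    """Per-line parallel input rate (Mbit/s) needed to feed a serializer.

    Assumes the serial rate is shared equally by all parallel inputs and
    ignores any encoding overhead.
    """
    if word_width <= 0:
        raise ValueError("word_width must be positive")
    return serial_rate_gbps * 1000.0 / word_width


# The two word widths quoted in the text for an 8 Gb/s serializer.
for width in (16, 64):
    rate = input_rate_mbps(8.0, width)
    print(f"{width} bits wide -> {rate:.0f} Mbit/s per input")
```

A wider word relaxes the speed of each parallel input at the cost of more pins and a deeper multiplexer.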

It is possible, depending on the specifications, that the accepted data may be encoded.

The encoding process may include encryption, compression, bit stuffing, error checking, and framing [33]. Depending on the design of the receiver, it may be necessary to introduce additional transitions into the data to meet critical phase locked loop (PLL) specifications in the receiver. 8B/10B encoding is popular and guarantees at least one transition every 5 bits [34]. If channel alignment is required, meaning that bit 0 in the Tx comes out on bit 0 in the Rx, then encoding will be needed.
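The transition-density requirement can be made concrete with a small run-length check. The sketch below does not implement 8B/10B encoding; it only measures the longest run of identical bits in a stream, which is the property the receiver PLL cares about. The function name and the sample bit patterns are hypothetical.

```python
def max_run_length(bits: str) -> int:
    """Return the longest run of identical consecutive symbols in a bit string."""
    longest = 0
    current = 0
    previous = None
    for b in bits:
        # Extend the current run or start a new one.
        current = current + 1 if b == previous else 1
        longest = max(longest, current)
        previous = b
    return longest


# Hypothetical streams: one that satisfies a 5-bit run limit and one that does not.
for stream in ("0011111010", "0000000011"):
    ok = max_run_length(stream) <= 5
    print(stream, "meets a 5-bit run limit" if ok else "violates a 5-bit run limit")
```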

After possible encoding, the bits are stored in a register of appropriate size for the

incoming word and the multiplexer width. When the multiplexer is narrower than the width of a word, the bits may be fed into a shift register before being multiplexed [35]. This register and the subsequent multiplexer must be timed very carefully to ensure that bits are sampled correctly and that no race conditions or runt pulses exist. Sometimes a first-in first-out (FIFO)

system is added to lessen the timing constraints between the data load clock and the

reference clock.

The PLL clocks the multiplexer and the multiplexer performs the serialization

function. This operation may require multiple gates, such as a 32-4 multiplexer followed

by a 4-1 multiplexer, or simply a 16-1 multiplexer. Timing at this stage becomes more


critical as the output rate of the multiplexer is at the serial data rate. Often multiple clock

phases or clock frequencies are needed.

The retiming circuit before the line driver re-establishes the transition locations in order to remove any jitter or noise introduced by the registers and multiplexers [42]. This circuit is clocked directly by the PLL to be as noiseless as possible. When low output jitter is the limiting factor in the design, a retiming circuit is absolutely required.

The retiming circuit, or multiplexer, is often unable to drive the pad and external load

directly, so a line driver is needed [36], [37]. It matches the internal circuitry impedance to

the output impedance and amplifies the signal to a desirable voltage swing if necessary.

Perhaps the most important circuit in the transmitter is the PLL, otherwise known as

the frequency synthesizer or clock multiplier unit (CMU). It generates the internal clock

signals, which may be multi-phase or multi-frequency. It is required to have low phase noise, low jitter, and low frequency drift to generate a similarly low phase noise data stream. The transmitter PLL, as opposed to the receiver PLL, usually has a very low

bandwidth in conjunction with a low phase noise VCO to generate the cleanest clock signal.

The PLL locks the phase of an internal high speed clock to an externally supplied low

speed reference. In this way the reference is able to dictate the exact frequency at which data is transmitted. For instance, a 10 Gb/s system may have a 625 MHz reference clock and a 10 GHz internal clock. The PLL must then match the two frequencies after dividing the internal clock by 16.
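The divide ratio in this example follows directly from the two clock frequencies. The helper below is a minimal sketch, assuming an integer divider in the feedback path; the function name is introduced here for illustration only.

```python
def divide_ratio(internal_clock_hz: float, reference_hz: float) -> int:
    """Integer divide ratio that maps the internal clock onto the reference.

    Raises ValueError when the two frequencies are not related by an integer,
    since this sketch assumes a simple integer divider.
    """
    ratio = internal_clock_hz / reference_hz
    if abs(ratio - round(ratio)) > 1e-9:
        raise ValueError("frequencies are not related by an integer ratio")
    return round(ratio)


# 10 GHz internal clock locked to a 625 MHz reference, as in the text.
n = divide_ratio(10e9, 625e6)
print(f"divide ratio N = {n}")
```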

The PLL consists of three basic components: a phase detector (PD), a loop filter (LF),

and a voltage controlled oscillator (VCO). The PD generates a signal which is a function of the phase difference between the divided down internal clock and the external reference. In low speed applications such as this (625 MHz clock versus 10 GHz data rate) the PD can

generate an accurate, linear measure of phase difference. The LF typically consists of an

active filter with high DC gain which has a specific bandwidth and a high frequency pole.

With most of the other gains and parameters in the PLL fixed, the LF is the only circuit that

is adjustable to meet the specifications. The VCO accepts a voltage input and generates an

output signal which has a frequency that is a function of the input. Ideally this relationship

is linear which leads to closed-form linear solutions for the PLL.
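As a hedged illustration of the kind of closed-form linear solution referred to here, the standard textbook small-signal PLL model is sketched below. The symbols K_PD, F(s), K_VCO, and N are not defined in this work; they are assumptions introduced only for this sketch.

```latex
% Assumed symbols (not from the thesis):
%   K_{PD}   phase detector gain (V/rad)
%   F(s)     loop filter transfer function
%   K_{VCO}  VCO gain (rad/s/V)
%   N        feedback divide ratio
\begin{align}
  \omega_{out} &= \omega_0 + K_{VCO}\, V_{ctrl}
      && \text{(ideal linear VCO)} \\
  A(s) &= \frac{K_{PD}\, F(s)\, K_{VCO}}{s}
      && \text{(forward path gain)} \\
  \frac{\theta_{out}(s)}{\theta_{ref}(s)} &= \frac{A(s)}{1 + A(s)/N}
      && \text{(closed loop response)}
\end{align}
```

At low frequencies, where the forward gain is large, the closed loop response approaches N, so the output phase follows N times the reference phase.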


One of the most important figures of merit for the transmitter is the output data jitter.

Jitter is created inside the VCO and partially filtered out by the PLL. The retiming circuit

and all circuits thereafter add slight jitter to the signal. The transmitter data eye closes

horizontally as more jitter is introduced into the circuit.

2.3. Transport Channel

The channel carries the data from the transmitter to the receiver, and may be

electrical, optical, wireless, or any combination of the three. For long-haul communication

the channel is a significant and sometimes dominant source of phase noise and jitter. For

short-haul communications, however, the channel's contribution is assumed to be negligible.

2.4. Receiver / Demultiplexer / Clock & Data Recovery

The receiver must extract a clock from a very high frequency serial signal plagued with jitter and noise, and use that clock to sample the data. This process is called clock and

data recovery and is made more difficult because transition locations are not guaranteed.

A line amplifier with a specific input impedance amplifies the signal to internal levels while minimizing the distortion. The amplifier must have a large bandwidth, typically about 50% higher than the baud rate. Noise injection from this circuit must be minimized because the data signal is already saturated with jitter. When an optical channel is used, a photodiode drives the receiver input and a transimpedance amplifier is required.

The receiver has a PLL that is very different from the PLL in the transmitter. First,

the PD must operate at or near the data rate, which requires a simpler circuit and one that

may only provide a non-linear output. The PD must also be able to handle random data that

has random transition locations, if the data is of the NRZ variety. In addition, the key PLL

parameters must be tuned to a signal with high noise content as compared to the PLL in the

transmitter which has a low noise reference as its input. Additional circuitry will be needed

to sample the data using the recovered clock unless the PD does so naturally.

As in the case of the transmitter, a reference clock may be used to bring the receiver VCO

close to the data frequency before clock extraction occurs. This greatly enhances the

operating range of the receiver PLL. The drawback is that two separate PDs and a circuit that can switch between them are needed. This introduces two loops consisting of common

components which must be able to operate independently.

A common component in dual loop PLLs is a lock detect circuit, which determines whether phase lock is lost; if it is, the loop switches back to the external reference loop. This

circuit is useful in a high noise environment where data jitter can cause the PLL to become

unstable. It also allows notification to the software layer to resend the lost data.

Once a clock has been extracted from the serial signal, and the data captured, the data

can then be demultiplexed through a series of samplers at decreasing clock rates. For

instance, in a 10 Gb/s system the first resampled data would pass through a 1-to-2

demultiplexer driven by a 5 GHz clock. The second stage would consist of two 1-to-2

demultiplexers driven by a 2.5 GHz clock and so on. If a multiphase clock is used, then

multiple samples can be taken with separate samplers. This allows the use of a clock at a

fraction of the data bit rate.
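The staged demultiplexing described above can be tabulated with a short sketch. It assumes a binary tree of 1-to-2 demultiplexers in which every stage runs at half the clock rate of the stage before it, as in the 10 Gb/s example; the function name and tuple layout are illustrative choices, not part of the original design.

```python
def demux_tree(serial_rate_gbps: float, output_width: int):
    """List (stage, demux_count, clock_ghz) for a tree of 1-to-2 demultiplexers.

    Assumes output_width is a power of two and that the first stage is
    clocked at half the serial bit rate.
    """
    if output_width < 2 or output_width & (output_width - 1):
        raise ValueError("output_width must be a power of two, at least 2")
    stages = []
    count = 1                       # number of 1-to-2 demultiplexers in this stage
    clock = serial_rate_gbps / 2.0  # clock frequency driving this stage
    while count < output_width:
        stages.append((len(stages) + 1, count, clock))
        count *= 2
        clock /= 2.0
    return stages


# A 10 Gb/s stream demultiplexed down to 16 parallel outputs.
for stage, count, clock in demux_tree(10.0, 16):
    print(f"stage {stage}: {count} x 1-to-2 demux at {clock} GHz")
```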

One of the most important parameters in the design of the receiver PLL is its jitter

transfer function. This determines how sensitive the system is to data jitter. The PLL should be able to track low frequency jitter very well. In this case the jitter transfer function should be close to 0 dB. At high frequencies the transfer function should drop off in conjunction with the bandwidth of the loop. Another important parameter is called jitter peaking. This parameter describes high frequency jitter components such as those from spurious modulation. This is especially important in SONET repeaters that feed the receiver clock back into a separate transmitter. A sequence of many repeaters is very sensitive to this

form of jitter.

After the data is fully demultiplexed down to the desired parallel data width it can be

decoded based upon the encoding scheme used in the transmitter. In some cases this also

involves channel framing which lines up transmitter input channel n with receiver output

channel n. Once the data is decoded it may, as in the transmitter, be placed in a FIFO to

reduce the timing constraint on the data received clock.


2.5. Internal Testing

Internal testing involves performance verification of the transmitter and receiver

before and after being connected in a complete system. For a chip with both transmitter and

receiver components, this may involve a feedback path across the chip from the output of

the Tx to the input of the Rx. The parallel data from the Tx and Rx can then be compared

to determine the bit error rate (BER).
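A loopback comparison of this kind reduces to counting differing bits between the transmitted and received words. The sketch below is only an illustration of that bookkeeping; the function name, the word width, and the sample words are all hypothetical.

```python
def bit_error_rate(tx_words, rx_words, width: int) -> float:
    """Fraction of bits that differ between transmitted and received words.

    Each word is an integer holding `width` parallel data bits.
    """
    if len(tx_words) != len(rx_words):
        raise ValueError("word lists must have the same length")
    if not tx_words:
        raise ValueError("at least one word is required")
    mask = (1 << width) - 1
    # XOR exposes the differing bits; count the ones in each result.
    errors = sum(bin((t ^ r) & mask).count("1") for t, r in zip(tx_words, rx_words))
    return errors / (len(tx_words) * width)


# Hypothetical 4-bit words with a single flipped bit in the second word.
tx = [0b1010, 0b1111]
rx = [0b1010, 0b1101]
print(f"BER = {bit_error_rate(tx, rx, 4)}")
```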

Additional testing modes may involve additional outputs that show the health of the

system [38]. Outputs may also be duplicated and fed to testing equipment while actual data

is being transmitted.

2.6. Support Circuits

Other circuitry may be needed in the system depending on the application. For

example, if a transmitter and receiver are required to operate at different fixed frequencies,

selectors and special input pins are required. Also, circuits within the chip may not be

needed all the time and in some cases a power managing system can cut off power. This

option reduces overall power consumption but requires additional power-switching

circuits.


3. Current Starving VCO

3.1. Project History

The Current Starving VCO (CS VCO) was used exclusively in the first serdes design,

which was fabricated in February 1999, in the transmitter, the receiver, and in various

oscillator test structures. Its performance was sufficient but the design required some

revision to meet frequency specifications. Deficiencies and unpredictable behavior,

however, resulted in its elimination from all subsequent designs.

The feed forward version of the CS VCO was not intended for use in the transmitter

and receiver design. It was instead designed to push the upper frequency limit in the ring oscillator design. However, it had the potential for use in future transmitter and receiver

designs in order to double the speed to 40 Gb/s.

3.2. The need for a VCO

PLLs, frequency locked loops (FLL), clock extractors, and frequency synthesizers all

require a voltage controlled oscillator. These circuits create one or many signals whose frequency is a function of an external control voltage. In a PLL, or clock extractor, a

DC voltage is generated based upon the difference between the VCO signal and an external

signal. This voltage is then fed back into the VCO to create a stable phase feedback loop.

Frequency synthesizers incorporate frequency dividers to create signals of varying

frequencies based upon the VCO’s fixed frequency.

VCOs for Serdes circuits are usually either an LC (inductor, capacitor) oscillator or a ring oscillator, each having benefits and drawbacks. All VCOs discussed in this section are four stage ring oscillators which produce eight unique phases when used with differential logic. The architecture of the receiver and transmitter requires this crucial multiple-phase

characteristic.

3.3. Simple Current Starving VCO

The Simple CS ring oscillator has four stages [39], shown in Fig. 3-1, and is able to create eight unique phases. The frequency of oscillation is defined by

\[ f = \frac{1}{2 \cdot 4T} \tag{3-1} \]

where T is the delay through the gate. A factor of two is necessary because after a signal passes through four buffers it has only changed sign and requires another trip through all four to complete one period of oscillation. The frequency and gain response for this oscillator is shown in Fig. 3-2.

Figure 3-1 Four stage VCO diagram. Frequency control is accomplished through variable delay elements arranged in a ring with an odd number of inversions. The operating frequency range is a function of the delay element range and the number of stages in the ring.
[Diagram labels: stages A, B, C, D with outputs ΦA, ΦB, ΦC, ΦD; stage delay T.]

The schematic for the Simple CS stage is a buffer, described in Appendix B.4. on page 162, with level two emitter followers. The differential circuit current source is connected to the aVref circuit in order to control its current.

3.4. Basic Operation

Current starving VCOs control their frequency by varying the delay through each stage of the ring. Each stage has a differential amplifier with one or many adjustable current sources at the bottom of the tree. In this way, the stage is able to increase its delay with a decrease in current. This effect is primarily a result of less current causing a decrease in the fT of the transistor, as shown in Appendix A.2. on page 158. Even though the smaller current has less capacitor charging ability, the associated smaller voltage swing produces no net effect in delay.
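The dependence of oscillation frequency on stage delay in Eq. (3-1) can be explored numerically. The sketch below assumes the four stage ring of this chapter and treats every stage as having the same delay; the function names are introduced here for illustration.

```python
def ring_frequency_hz(stage_delay_s: float, stages: int = 4) -> float:
    """Oscillation frequency of a ring oscillator, f = 1 / (2 * stages * T)."""
    return 1.0 / (2 * stages * stage_delay_s)


def stage_delay_s(frequency_hz: float, stages: int = 4) -> float:
    """Per-stage delay T required for a target oscillation frequency."""
    return 1.0 / (2 * stages * frequency_hz)


# Delay per stage needed for the 5 GHz target of the Simple CS VCO.
t = stage_delay_s(5e9)
print(f"T = {t * 1e12:.1f} ps per stage")

# Starving the stage of current lengthens T, which lowers the frequency.
for delay_ps in (22.0, 25.0, 28.0):
    f = ring_frequency_hz(delay_ps * 1e-12)
    print(f"T = {delay_ps} ps -> f = {f / 1e9:.2f} GHz")
```

With four differential stages the eight phases are spaced one stage delay apart, so the per-stage delay also sets the spacing between successive rising edges.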

Figure 3-2 Current Starving VCO frequency and gain response. The CS VCO’s usable frequency range is between a control voltage of -1.5 V to -1.0 V or higher. The lower range is limited by the small voltage swing on the output. These simulation results were obtained with one minimally sized buffer on each stage’s output. Interconnect parasitics were not included.

Even though current starving is a simple technique for controlling delay, it has

numerous disadvantages. The first obvious problem is that at the limits of operation and

control voltage, undesirable conditions occur. At the minimum extreme, the current can be

decreased to the point that sustained oscillations can no longer occur, because the voltage

swing decreases and the gain drops below one. At the maximum, the transistor fT begins to

drop off the opposite side of the fT curve and the transistors begin to slow. This is

potentially disastrous when used in a phase lock loop because the VCO gain has gone

negative and the loop will become unstable.

[Figure 3-2 plot: frequency (GHz, 4.50 to 6.25) and gain (GHz/V, 0.0 to 3.5) versus control voltage (V, -1.8 to 0.0); curves labeled frequency response and gain.]

Another problem is that the delay as a function of current is non-linear in nature. Fig. 3-2 shows the basic frequency response for the Simple CS VCO excluding interconnect parasitic effects. The gain varies from 3.0 GHz/V to 0.5 GHz/V along the curve and is never constant. A non-linear gain makes phase locked loops difficult to design. The output voltage swing is also a concern because as the current increases, the voltage swing across the pull-up resistors also increases. This alters the load driving ability, and creates a situation which is difficult to model analytically.

Another problem is that the single-ended nature of the control voltage does not possess the common-mode noise immunity that is inherent in differential wiring. When phase noise is a dominant design factor this architecture can be quite limiting.

There are benefits of this style of ring VCO, including its simplicity and a large tuning

range. The layout footprint is also quite small which minimizes interconnect delays.

3.4.1. Adjustable Voltage Reference

Figure 3-3 Adjustable Voltage Reference. The input voltage controls the total current through this circuit. In turn this current is mirrored to all connected sources.
[Schematic labels: Vctrl, aVref, Ir, R1, Re, Vee.]

The active current sources in the CS stages are “mirrored” to a circuit that can vary its current as a function of a single-ended input voltage, as depicted in Fig. 3-3. The current through the reference circuit, and its derivative with respect to the control input, are defined by the following equations:

\[ I_r = \frac{V_{ee} + V_{ctrl} - 3V_{be}}{R_1 + R_e} \tag{3-2} \]

\[ \frac{dI_r}{dV_{ctrl}} = \frac{1}{R_1 + R_e} \tag{3-3} \]

The emitter resistor, Re, is matched to the current sources' emitter resistors so that the same voltage exists across both. R1 determines the current gain of the circuit and the value is selected based upon the input voltage swing and the required output current swing. An additional diode is added to decrease the voltage drop across R1, allowing a smaller resistor size.

A common approach to designing a current mirror is to include base-current compensation through a transistor located on the output (see Appendix B.3. on page 161). This allows the current reference to drive more loads and lessens the current degradation when more loads are added. The problem with this approach is that it limits the frequency response of the circuit. For this reason it was not included in the design. The current driving capability of the circuit without base-current compensation should be sufficient to drive a single VCO with an equivalent of 8 µm of loading.

3.4.2. Final Implementation

The development of the transmitter and receiver played a defining role in the design of this VCO. To meet a goal of 20 Gb/s with a quarter-rate architecture, a VCO centered at 5 GHz was needed. A control voltage range from -0.8 V to -1.6 V was chosen because of the solid transfer characteristics, and because those limits correspond to one and two Vbe drops. At the center of the control range a frequency of 5.75 GHz was achieved, corresponding to a 15% safety margin.¹

¹ This safety margin was built in because parasitic simulations were not done prior to fabrication. It was felt that a greater than 10% margin would adequately account for interconnect effects.

Symmetry was the leading motivation behind the layout of the Simple CS VCO shown in Fig. 3-4. The four stages were laid out in a square with the inputs and outputs facing the center. In this way the interconnect between stages could be limited to a small region in the center of the design. Power and ground rails, as well as the two reference rails (aVref, Vref), were placed in closed concentric LM rings around the top.
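As a numeric companion to the adjustable voltage reference of Section 3.4.1, the sketch below evaluates Eq. (3-2), read as Ir = (Vee + Vctrl - 3Vbe) / (R1 + Re), over the control range chosen above. Treating Vee as a positive supply magnitude is an assumption of this sketch, and every component value is hypothetical, since the actual values are not given in the text.

```python
def reference_current_a(v_ctrl, v_ee_mag, v_be, r_1, r_e):
    """Eq. (3-2): reference current in amperes.

    v_ee_mag is taken as a positive supply magnitude (an assumption).
    """
    return (v_ee_mag + v_ctrl - 3.0 * v_be) / (r_1 + r_e)


def current_gain_a_per_v(r_1, r_e):
    """Eq. (3-3): slope dIr/dVctrl in amperes per volt."""
    return 1.0 / (r_1 + r_e)


# Hypothetical values, chosen only for illustration.
V_EE_MAG = 4.5   # V, supply magnitude (assumed)
V_BE = 0.85      # V, base-emitter drop (assumed)
R_1 = 1000.0     # ohms (assumed)
R_E = 200.0      # ohms (assumed)

# Sweep the control range quoted in the text, -1.6 V to -0.8 V.
for v_ctrl in (-1.6, -1.2, -0.8):
    i_ref = reference_current_a(v_ctrl, V_EE_MAG, V_BE, R_1, R_E)
    print(f"Vctrl = {v_ctrl:+.1f} V -> Ir = {i_ref * 1e3:.3f} mA")
```

Because Eq. (3-2) is linear in Vctrl, the slope between any two points of the sweep equals the constant gain of Eq. (3-3).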

Figure 3-4 Layout of Simple CS VCO. Shown above is the layout for the Simple CS VCO. All inputs and outputs face inward to minimize the effects of interconnect parasitics. Symmetry was the most important design requirement.
[Layout image; marked dimension: 102 µm.]

In addition to CS VCOs in the transmitter and receiver, a separate test chip containing CS VCOs was also made. This allowed a more straightforward measurement of the VCO’s frequency and gain characteristics. This test chip also included an XOR phase multiplier tree [3],[4],[20] in order to achieve frequencies double and quadruple the nominal 5 GHz. The goal of the multipliers was only to see how high the technology could be pushed.

3.4.3. Testing Results

The plot in Fig. 3-5 shows the results from an ideal interconnect simulation, a simulation with capacitive¹ interconnect, and measured results from the fabricated circuits. The 20% decrease in speed between the ideal simulation and the measured results is immediately obvious. Unfortunately this was larger than the 15% safety margin and resulted in a frequency range that did not meet the 5 GHz center frequency specification.

¹ The IBM 1999B SiGe design kit does not include interconnect resistances correctly and typically simulates with a faster response than with capacitance only. Resistance values are also very small and can be ignored for these localized wires. For these reasons, only capacitance was included.

Between a control voltage of -1.6 V and -1.4 V the measured VCO tracked very closely to expectations, but above -1.4 V the VCO response becomes lethargic. This is likely due to too much current in the tree, which causes a reduction in fT faster than the model predicts.

Figure 3-5 Test data from Simple CS VCO. Simulation with and without interconnect parasitics, and measured results are shown in this plot. Measured results track closely with the parasitic simulation at low control voltages.
[Plot: frequency (GHz, 3.5 to 6.5) versus control voltage (V, -1.8 to 0.0); curves labeled Simulated, Parasitics, and Measured.]

3.4.4. Optimization of Simple CS VCO (post-fabrication)

From Fig. 3-5 it is clear that the oscillator underperformed and missed the 5 GHz target. This can be directly attributed to initial simulations that did not include resistive and capacitive interconnect parasitics. Although the layout footprint of the VCO is very small and designed to minimize wire lengths, parasitics still presented a significant influence on speed.

The receiver VCO has a frequency range of 4.25 GHz to almost 4.9 GHz. Because 20 Gb/s is the target data rate, we would like 5 GHz to fall in the middle of the operating range of both transmit and receive VCOs. Given that the initial design was slow, how can it be ensured that the next version will meet specifications? Can the measured and simulated results be used to maximize the likelihood of a successful design?

Each of the four VCO stages must be loaded by an identical buffer which then drives

subsequent circuitry. By using the smallest transistors, 1 µm, in the buffers, the loading on

the VCO will be minimized and its operating frequency will be maximized. Under such conditions the

easiest method for increasing frequency response is to increase the power of the delay

elements by using larger transistors. This has the immediate effect of reducing the effective

loading on each gate and increasing the frequency at a given control voltage. The devices

in the first design iteration had 2 µm emitter lengths and were slightly slow, so an increase

in emitter length should bring the VCO to within specifications. Fig. 3-6 shows the

relationship between frequency response and transistor size used in the delay stages of the

VCO. Because interconnect parasitic simulations require a complete layout, this simulation uses ideal interconnects. As suspected, there is an increase in performance when larger

devices are used.

Figure 3-6 Frequency Response versus emitter length in delay elements. By increasing the emitter lengths and keeping the loading the same, the effective loading is decreased and the performance improves. This simulation does not include interconnect parasitics.

It can be seen that a relatively small increase in transistor size from 2 µm to 2.5 µm

achieves a 12% increase in speed at a control voltage of -1.5 V. The 2 µm and 2.5 µm delay

elements have an effective loading of 0.5 µm/µm and 0.4 µm/µm respectively, representing

a 20% decrease. Assuming that the interconnect parasitic effects stay the same or decrease, the 2.5 µm delay elements should bring about a 12% increase in the VCO

response. From a range of 4.25 GHz to 4.9 GHz a 12% improvement yields a range of 4.76

GHz to 5.48 GHz, which is well within the specifications.
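The arithmetic behind this projection can be reproduced in a few lines. The sketch assumes a 1 µm buffer load on each stage, as stated earlier in this section, and applies the simulated 12% speed increase uniformly across the measured range; both function names are introduced here for illustration.

```python
def effective_loading(load_um: float, driver_um: float) -> float:
    """Load emitter length per unit of driving emitter length (um/um)."""
    return load_um / driver_um


def scale_range(low_ghz: float, high_ghz: float, increase: float):
    """Apply a fractional speed increase to both ends of a frequency range."""
    return low_ghz * (1.0 + increase), high_ghz * (1.0 + increase)


# A 1 um buffer loading 2 um and 2.5 um delay elements.
old_load = effective_loading(1.0, 2.0)
new_load = effective_loading(1.0, 2.5)
decrease = (old_load - new_load) / old_load
print(f"effective loading: {old_load} -> {new_load} ({decrease:.0%} decrease)")

# Measured receiver VCO range scaled by the simulated 12% improvement.
low, high = scale_range(4.25, 4.9, 0.12)
print(f"projected range: {low:.2f} GHz to {high:.2f} GHz")
print("5 GHz inside projected range:", low < 5.0 < high)
```

The projection relies on the assumption, stated in the text, that interconnect parasitics do not get worse with the larger devices.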

3.5. Current Starving with Feed Forwarding

Some advantages of the four phase simple VCO circuit include: symmetric phases

minimizing phase differences, generation of rising edges every 25 ps at 5 GHz, and a large

frequency range. The motivation for a new VCO design is to enhance the frequency beyond

the limits of this simple design.

[Figure 3-6 plot: frequency (GHz, 4 to 10) versus control signal (V, -1.8 to -1.0); curves for delay-element emitter lengths of 2 µm, 2.5 µm, 3 µm, 4 µm, 6 µm, and 10 µm.]

One method to do this is to use a delay cell that averages the signals from the last two

stages as shown in Fig. 3-7 [1],[13],[23], [24]. Stage C accepts inputs from stage B and

stage A, stage D accepts from C and B, and so on. The idea is that the average of the

previous two signals occurs earlier than just the previous signal.

Figure 3-7 Feed-forward CS VCO block diagram. Each stage in the VCO receives signals from the previous stage and the stage preceding that one. Stage A can realize an effective decrease in delay by utilizing the signal from stage C. The inversions to induce oscillations are left out for clarity.
[Diagram labels: stages A, B, C, D with outputs ΦA, ΦB, ΦC, ΦD; timing sketch of ΦA, ΦB, their average (ΦA+ΦB)/2, and ΦC, marking the stage delay and the delay savings.]

Mathematically, the nth element presents its output after the average of the (n-1)st and (n-2)nd element outputs plus the delay of the nth element:

\[ t_n = \frac{t_{n-1} + t_{n-2}}{2} + T_i \tag{3-4} \]

Solving for the difference between two consecutive stages yields

\[ t_n - t_{n-1} = \frac{2}{3} T_i \tag{3-5} \]

which shows that the effective gate delay is reduced to two thirds of the intrinsic stage delay, Ti. The intrinsic delay is defined as the delay of the stage if its inputs were tied together and treated as a normal buffer.
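The two-thirds result of Eq. (3-5) can be checked by iterating the recurrence of Eq. (3-4) directly. The sketch below unrolls the ring into a long chain of stages and uses arbitrary starting times, both of which are simplifying assumptions; the function names are introduced here for illustration.

```python
def feed_forward_times(intrinsic_delay: float, count: int, t0: float = 0.0, t1=None):
    """Output times t_n from t_n = (t_{n-1} + t_{n-2}) / 2 + Ti.

    t0 and t1 are arbitrary starting times; t1 defaults to one intrinsic delay.
    """
    if count < 2:
        raise ValueError("count must be at least 2")
    if t1 is None:
        t1 = intrinsic_delay
    times = [t0, t1]
    while len(times) < count:
        times.append((times[-1] + times[-2]) / 2.0 + intrinsic_delay)
    return times


def settled_stage_delay(intrinsic_delay: float, count: int = 40) -> float:
    """Spacing between consecutive outputs once the recurrence has settled."""
    times = feed_forward_times(intrinsic_delay, count)
    return times[-1] - times[-2]


# With Ti normalized to 1, the settled spacing approaches 2/3.
print(f"effective delay / Ti = {settled_stage_delay(1.0):.6f}")
```

The spacing error halves and changes sign at every step, so the iteration converges quickly regardless of the starting times.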

Figure 3-8 Feed forward CS VCO frequency response and gain. The Feed Forward CS VCO was designed to achieve the highest frequency possible. After optimization it operates at twice the speed of the Simple CS VCO.
[Plot: frequency (GHz, 9.0 to 12.5) and gain (GHz/V, -1 to 6) versus control voltage (V, -1.8 to 0); curves labeled Frequency and Gain.]

3.5.1. Final Implementation

An important consideration in the design of the feed-forward delay element is its higher complexity, having two inputs instead of one, which increases the delay. Also, because there are twice as many wires between stages in the feed-forward design, the layout will be larger and more limited by interconnect parasitics. With this in mind, the simplest averaging circuit was created, one that utilized a minimum number of additional transistors and resistors. The final schematic is shown in Fig. 3-9.

A description of its operation is as follows: If Q2 and Q4 are on, Q1 and Q3 are off, and signal b arrives first, then signal b will begin to turn Q3 on and Q4 off. This will start to draw current through Rc1. If b were to completely switch then both Rc1 and Rc2 would carry the same current: an undesirable condition in which the output is the average of a one and a zero, which is undefined. The normal operating condition involves b partially switched followed by the beginning of a switch in the a signal. When this occurs more current flows through Rc1 and less current through Rc2. The effective switching input can be said to occur between the two signals, a and b.

Figure 3-9 Feed-forward CS Delay Element. This circuit operates by averaging the a and b inputs through common pull-up resistors. The aVref node is varied in order to control the total current through the tree. Lower current corresponds to longer delay.

One important characteristic in the two current starving VCO circuits is the choice of

collector resistors which affects the output amplitude and the gate delay. An increase in

resistance causes an increase in amplitude and an increase in delay because the same

amount of current produces a larger voltage swing and a larger RC time delay. The simple

CS VCO was designed around an operating frequency of 5 GHz, so a resistance was chosen

so that there was a 200 mV - 400 mV swing around 5 GHz. The feed-forward CS VCO, on

the other hand, was designed to achieve the highest possible frequency response, so a

resistor small enough to maximize the frequency while leaving a 150 mV - 200 mV swing

was used. Fig. 3-8 shows the frequency response of the feed-forward CS VCO.

3.5.2. Testing results

The feed-forward CS VCO was not used in the first transmitter and receiver design

but was implemented in a test chip. It was configured with one load to achieve the smallest

loading effect and thus the highest frequency. The simulation and measured results are

plotted in Fig. 3-10.

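[Figure 3-9 schematic labels: transistors Q1, Q2, Q3, Q4; inputs a10, a11, b10, b11; outputs z20, z21; collector resistors Rc1, Rc2; reference nodes aVref, Vref.]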

Figure 3-10 Testing Data from feed-forward CS VCO. The implementation of the Feed Forward Current Starving VCO only had a single load in order to achieve the highest frequency possible. The measured results are only about 4% lower than simulations with interconnect included.
[Plot: frequency (GHz, 8.0 to 12.5) versus control voltage (V, -1.8 to 0.0); curves labeled 1 Load Simulated, 4 Loads Simulated, 1 Load w/ Parasitics, and 1 Load Measured.]

Simulations with one load and no parasitics show a peak frequency of 12 GHz. With parasitics the frequency drops by 6% to 11 GHz, which tracks very closely with the measured results. The steep drop-off of the measured results at the high end is likely due to a high collector current causing a drop-off in the transistor fT that is not accurately accounted for in the models.¹

¹ This is supported by information gathered at a meeting at IBM in 1999 concerning measured results from the DARPA 2 run. An IBM device modeler was quoted as saying that the fT curves drop off faster than the models predict.

3.6. Conclusions and Future Work

The Current Starving VCOs presented in this section are compact and easy to implement but they have some crucial deficiencies. Their performance was about 5% worse than expected from simulations with interconnect parasitics. Feed forwarding allowed a near doubling of speed at the expense of a slightly more complicated circuit. If implemented correctly this additional speed could be traded off for a reduction of noise.

With an increase in power supplied to the VCO that was implemented, the desired specifications should be achieved. However, future research into this VCO topology should be limited because its response is difficult to model and it utilizes a delay strategy which is poorly understood.

35

4. Feed Forward Interpolated VCO


The Feed Forward Interpolated VCO evolved from the Current Starving Feed Forward VCO and replaced all instances of that VCO in the second serdes chip, submitted in March 2000. Additional test structures were added to further exercise this VCO, and an invention disclosure record was submitted to RPI in May 2000. An RPI provisional patent was awarded in September 2000.

4.2. The Evolution

The evolution of the Feed Forward Interpolated VCO (FFI VCO) began with the Feed Forward Current Starving VCO (FFCS VCO) discussed in Chapter 3. Each stage of the FFCS VCO averaged the output from the previous stage and the stage before that to generate a signal with a smaller effective delay. The averaging was fixed and reduced the delay to about 66% of the intrinsic stage delay, a reduction of about 33%.

A common approach in the design of a standard ring oscillator stage without feed

forwarding is to use delay interpolation as shown in Fig. 4-1. The idea is to split the input

signal into a slow and fast path and create a weighted sum of the two to form the output.

Common pull-up resistors, level 3 control inputs, and emitter resistors for linearity make

this possible. The slow path need only delay the signal longer than the fast path and a simple

capacitor can do the trick. The benefits of this VCO stage include a uniform output voltage

swing, a fairly linear response, no limits of operation, and easy minimum frequency control

through the capacitor.


Figure 4-1 Schematic for Delay Interpolated VCO element
This VCO element linearly interpolates the input signal after traveling through a fast and slow path. The slow path is created with the addition of a capacitor.

The vision of the FFI VCO occurred when looking at the Delay Interpolated VCO and realizing that the fast path could be implemented as the signal from the stage before the previous stage and the slow path could be taken from the previous stage. This insight immediately eliminated the need for the slow path capacitor, and nearly doubled the speed of the VCO.

The FFI VCO is a delay interpolated VCO with the normal and delayed signals created from different stages rather than from within each stage. This forces each stage to have two inputs rather than one and eliminates the need for the slow path capacitor. The schematic for the FFI stage can be found in Fig. 4-7 on page 44.

4.3. Basic Operation

On a block diagram level, the FFI VCO looks identical to the Feed Forward Current Starving VCO shown in Fig. 4-2. The difference is in the method used to control the delay through each stage. The FFCS VCO controls delay by varying the current through its buffer, which is directly related to the delay through its gate. The feed forward technique simply reduces the effective gate delay by about 33%. The FFI VCO, on the other hand, linearly interpolates the signals received from the previous two stages. The current, which remains the same through the tree, is gradually shifted between the two inputs, p and l, as shown in Fig. 4-7. The p (previous) input arrives from one stage back, and the l (leap) input arrives from the stage prior to that. The two signals are weighted by the control signal and summed by the common pull-up resistors. The final result is the frequency response shown in Fig. 4-4.

Figure 4-2 Feed Forward VCO block diagram
Each stage in the VCO receives signals from the previous stage and the stage preceding that one. Stage A can realize an effective decrease in delay by utilizing the signal from stage C. (The inversions, to induce oscillations, are left out for clarity.)

Figure 4-3 FFI VCO under boundary conditions
Diagram (a) shows the VCO running in the four stage configuration with the control voltage set to a minimum value. Diagram (b) shows the VCO in the two stage configuration, at the maximum control voltage.

The minimum operating frequency is defined by the oscillation of the system when the leap signal is ignored, and only the previous signal is used. In this case, the system is running as a four stage oscillator and has a frequency of about 3.9 GHz. When the control voltage is switched in the other direction, the leap signal is used, and the previous stage’s output is ignored. In this configuration the system is running as two separate two stage ring oscillators with a frequency of approximately 7.9 GHz. These two cases are depicted in Fig. 4-3. It is useful to look at the system in terms of an effective delay for all control voltages between the minimum and maximum values.

Figure 4-4 Feed-forward interpolated simulated response
The frequency response of the FFI VCO is linear across a large range from 4.75 GHz to 7.00 GHz. System gain is flat across the operating range.
[Plot: Frequency (GHz) versus Control Voltage (V), from -0.2 V to 0.2 V.]

The effective delay of a stage is defined to be the delay of a stage in a four stage oscillator that has the same frequency as the feed forward oscillator. This parameter can be found by setting the intrinsic delay of a stage to T, setting s equal to the weighting factor between 0 and 1, and looking at the output transition times of stages n, n-1, and n-2. The weighting factor is a constant that indicates how much of the leap signal is being used. Set to 0, the ring acts as a normal 4 stage oscillator, and set to 1, the ring acts as a 2 stage oscillator.

The edge time of stage n is given by

$$t_n = T + s\,t_{n-2} + (1 - s)\,t_{n-1} \tag{4-1}$$

which is the intrinsic delay through the stage, plus the weighted sum of the edge times of the previous two stages. Solving for the time difference between two stages yields

$$T_{eff} = t_n - t_{n-1} = \frac{T}{1 + s} \tag{4-2}$$

$$f_{vco} = \frac{1}{8T_{eff}} = \frac{1 + s}{8T} \tag{4-3}$$

which are the effective delay and the frequency of the VCO in terms of the effective and intrinsic delay of each stage. The factor of eight is needed because it takes two complete cycles through four stages to equal one period of the VCO.

For s equal to 0, the effective delay is equal to the intrinsic delay of the stage. At the other extreme, when s equals 1, the effective delay is one half of the intrinsic delay. This makes sense because the system in this configuration has two stages rather than four. (4-2) also shows that the Feed Forward CS VCO, where s is fixed at 0.5, has an effective delay equal to (2/3)T, as was shown previously.

The benefits of the FFI VCO are numerous and represent many improvements over the previously discussed designs. The use of feed forward techniques allows the VCO to exceed the maximum frequency achievable by a simple four stage ring oscillator. This is extremely important if a solid high speed eight phase VCO is required.

Fig. 4-4 shows a linear frequency range from -0.2 V to 0.2 V. This linear range is very important when designing phase locked loops, because linearity results in simple closed form solutions. In addition, this VCO has a response with an obvious center and with limits approaching an asymptotic minimum or maximum. In contrast, the CS VCO will stop operating below a certain frequency. Although a control voltage would not normally be driven to such extreme values as to cause malfunction, this can happen in PLLs during power up. Often an integrator, or capacitor that is never guaranteed to have a specific voltage, will be attached to the VCO control inputs. If it has a poor initial condition, which is maintained by a non-oscillating VCO, then the system will become unstable. It is therefore important to provide the largest control voltage range possible that will still allow the VCO to oscillate.

Current through the FFI stage is linearly switched between the previous and feed forward stages. This forces the total current running through the stage to remain constant. This is important for keeping a constant voltage swing, which ensures consistent operation in a system where a variation in voltage swing would cause a change in frequency. The SNR is also dependent on the output voltage swing, which, if varying, can complicate the analysis. This is the problem encountered with the CS VCO described in Chapter 3.
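As a numerical check on (4-1) through (4-3), the following Python sketch iterates the edge-time recurrence and compares the resulting spacing with T/(1 + s). The function names and the 32 ps intrinsic delay are assumptions made for illustration (32 ps is simply the value for which (4-3) gives about 3.9 GHz at s = 0); they are not taken from the design itself.

```python
def edge_times(T, s, n_edges=200):
    """Iterate (4-1): t_n = T + s*t_{n-2} + (1 - s)*t_{n-1}."""
    t = [0.0, T]  # arbitrary start-up edge times for stages n-2 and n-1
    for _ in range(n_edges):
        t.append(T + s * t[-2] + (1.0 - s) * t[-1])
    return t


def effective_delay(T, s, n_edges=200):
    """Settled spacing t_n - t_{n-1}, found by iteration rather than algebra."""
    t = edge_times(T, s, n_edges)
    return t[-1] - t[-2]


def vco_frequency(T, s):
    """Closed form (4-3): f_vco = (1 + s) / (8 T)."""
    return (1.0 + s) / (8.0 * T)


T = 32e-12  # assumed intrinsic stage delay in seconds (illustrative only)
for s in (0.0, 0.5, 0.8):
    iterated = effective_delay(T, s) / T
    closed_form = 1.0 / (1.0 + s)
    print(f"s={s:.1f}  Teff/T iterated={iterated:.6f}  "
          f"closed form={closed_form:.6f}  f={vco_frequency(T, s) / 1e9:.2f} GHz")
```

In this simple model the spacing error shrinks by a factor of s at every edge, so the iteration settles only for s below 1. At s = 1 the spacing keeps alternating between its start-up values, which is one way to see the two-ring behavior treated in Section 4.4.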

Differential signaling is used for the control input and throughout the rest of this

design. This is crucial when designing for low noise operations since differential wires have

strong common-mode rejection.

One exciting feature of the FFI VCO, which will be examined in detail in the next section, is the extraordinary capacity for customization of this circuit. First, by controlling the linearity through emitter resistors, different frequency gains can be used (Fig. 4-8). Second, a capacitor at the top of the tree controls the center frequency point (Fig. 4-9). Third, resistors exist to limit the frequency range and prevent stage decoupling (Fig. 4-10). One minor drawback to this design is the slightly larger layout footprint. The cascode amplifiers introduce four additional transistors, and if a large capacitor is necessary then a large amount of space may be required.

4.4. Stage Decoupling

A serious problem exists in the FFI VCO if the weighting factor is pushed to the maximum value of 1. In this case, each stage, n, is only using the signal from stage n-2, as depicted in Fig. 4-3(b). The VCO now appears and operates as two completely independent oscillators. The phase difference between each consecutive stage is no longer constant and may fluctuate wildly. This undesirable effect is called stage decoupling and must be addressed in VCO design.

The model used to analyze this situation uses an ideal FFI VCO in which one stage has a different delay. This modified delay represents the sum of maximum individual delay excursions that may exist in the real VCO due to unbalanced loading effects, process variations, and signal noise. The stage transfer functions are shown as

$$a_n = T + s\,c_{n-1} + (1 - s)\,d_{n-1} + N \tag{4-4}$$

$$b_n = T + s\,d_{n-1} + (1 - s)\,a_n \tag{4-5}$$

$$c_n = T + s\,a_n + (1 - s)\,b_n \tag{4-6}$$

$$d_n = T + s\,b_n + (1 - s)\,c_n \tag{4-7}$$

with stage a receiving the additional delay of N. The time at an output change for each stage is represented by a letter and a subscript where the letter is the stage and the subscript is the nth output change from that stage. The output edges appear in time order described by

$$a_0, b_0, c_0, d_0, a_1, \ldots, d_1, \ldots, d_n, \ldots \tag{4-8}$$

The next step is to look at the time between successive outputs from any one stage,

$$a_{n+1} - a_n = \frac{4T + N}{s + 1} \tag{4-9}$$

which is simply the sum of the effective delays of the four stages. (4-9) is the same for all stages, even though N only occurs in stage a, under the condition that stage decoupling has not occurred. Solving for the time difference between the output of stage a and the output of stage b using (4-4) through (4-9), yields

$$a_n - d_{n-1} = \frac{T}{s + 1} + \frac{N}{1 - s^4} \tag{4-10}$$

$$b_n - a_n = \frac{T}{s + 1} - \frac{sN}{1 - s^4} \tag{4-11}$$

$$c_n - b_n = \frac{T}{s + 1} + \frac{s^2 N}{1 - s^4} \tag{4-12}$$

$$d_n - c_n = \frac{T}{s + 1} - \frac{s^3 N}{1 - s^4} \tag{4-13}$$

which are the desired solutions.

Figure 4-5 Delay versus weighting factor with single stage imbalance
With non-ideal delay stages used in the FFI VCO, stage decoupling (effective delay goes to zero) can occur when the weighting factor is too high. This is because the VCO acts as two independent 2 stage oscillators instead of one 4 stage oscillator.
[Plot: Normalized Effective Delay versus Weighting Factor, from 0.0 to 1.0; curves: ad, cb, dc, ba, and N=0.]

These equations are in the form of the effective delay plus a factor for the unbalanced delay N. The delays between stages c and b, and between a and d, increase rapidly as s approaches 1, and the delays between stages d and c, and between b and a, decrease rapidly under the same condition. This divergence is expected because the sum of the four delays closely follows the effective delay curve when there is no unbalanced delay. This effect is plotted in Fig. 4-5. Also shown is the curve for all inter-stage delays when no extra delay is introduced. The divergence between the nominal curve and each of the unbalanced curves can be clearly seen.

Each stage is affected by the additional delay, but when analyzing stage decoupling it is only necessary to look at bn - an. The delay ba is the most seriously affected of all the delays because it is relative to the output of the stage with the additional delay included. Stage decoupling occurs when ba goes to 0 and the output of stage b coincides with the output of stage a. Although the equations are continuous at this point, reasonable operation dictates that stage output times should be sequential.
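The closed forms (4-10) through (4-13) can be checked by iterating the stage equations (4-4) through (4-7) directly. The Python sketch below does this for a normalized intrinsic delay; the function names, the start-up edge times, and the chosen values of s and N are illustrative assumptions, not part of the original analysis.

```python
def simulate_spacings(T, s, N, cycles=400):
    """Iterate (4-4) to (4-7) and return the settled edge spacings.

    Returns (a_n - d_{n-1}, b_n - a_n, c_n - b_n, d_n - c_n).
    """
    c_prev, d_prev = 0.0, T  # arbitrary start-up edge times for stages c and d
    spacings = None
    for _ in range(cycles):
        a = T + s * c_prev + (1.0 - s) * d_prev + N  # (4-4), stage a carries N
        b = T + s * d_prev + (1.0 - s) * a           # (4-5)
        c = T + s * a + (1.0 - s) * b                # (4-6)
        d = T + s * b + (1.0 - s) * c                # (4-7)
        spacings = (a - d_prev, b - a, c - b, d - c)
        c_prev, d_prev = c, d
    return spacings


def predicted_spacings(T, s, N):
    """Closed forms (4-10) to (4-13); valid only for s < 1."""
    base = T / (1.0 + s)
    k = N / (1.0 - s ** 4)
    return (base + k, base - s * k, base + s ** 2 * k, base - s ** 3 * k)


T, N = 1.0, 0.1  # normalized intrinsic delay and a 10% imbalance (illustrative)
for s in (0.5, 0.8, 0.95):
    sim = simulate_spacings(T, s, N)
    ref = predicted_spacings(T, s, N)
    print(f"s={s}:")
    print("  iterated   ", [round(x, 4) for x in sim])
    print("  closed form", [round(x, 4) for x in ref])
```

The second entry of each tuple is the b to a spacing, the one that shrinks fastest as s approaches 1 and therefore sets the decoupling limit.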

Figure 4-6 Decoupling versus delay injection
When an unbalanced delay is injected into a single stage, decoupling between stages occurs when the weighting factor reaches a specific value.
[Plot: Stage Decoupling (s) versus Normalized Additional Delay (N/T), from 0 to 2.]

Setting ba equal to 0 in (4-11) and solving for s yields the weighting factor at which stage decoupling occurs for a specific value of N. This solution is shown in Fig. 4-6. As the injected delay increases, the point at which stage decoupling occurs departs from the maximum value of 1.

The effect of stage decoupling is clearly a problem and results in a VCO that operates improperly. To avoid this problem, the weighting factor must be limited to a value less than that given in Fig. 4-6, based upon a maximum expected delay injection from noise sources and parameter variations. For example, if a maximum 10% deviation is expected (an extremely large value), then s must be kept below approximately 0.95. In practice this VCO has a very large operating range which can be sacrificed to prevent stage decoupling.¹

1. For the final implementation of this system s was kept below 0.8 to introduce a huge safety margin in which no decoupling will occur.
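The decoupling limit can also be evaluated numerically. Setting (4-11) to zero and simplifying gives N/T = (1 - s)(1 + s^2)/s, which falls monotonically as s goes from 0 to 1, so a bisection search recovers the limiting s for any positive N/T. The Python sketch below is one way to do this; the function names and tolerance are assumptions made for illustration.

```python
def imbalance_at_threshold(s):
    """N/T at which (4-11) gives b_n - a_n = 0: (1 - s)(1 + s^2) / s."""
    return (1.0 - s) * (1.0 + s * s) / s


def decoupling_threshold(n_over_t, tol=1e-9):
    """Weighting factor s at which decoupling sets in, for a given N/T > 0."""
    lo, hi = 1e-9, 1.0  # imbalance_at_threshold decreases from large values to 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if imbalance_at_threshold(mid) > n_over_t:
            lo = mid  # stages still sequential at mid; the limit lies above
        else:
            hi = mid
    return 0.5 * (lo + hi)


for n_over_t in (0.05, 0.1, 0.5, 1.0, 2.0):
    print(f"N/T = {n_over_t:4.2f}  ->  s limit = {decoupling_threshold(n_over_t):.3f}")
```

For N/T = 0.1 the search returns a limit of about 0.95, in line with the example in the text.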

4.5. Circuit Implementation and Analysis

Figure 4-7 Schematic for FFI VCO element
This VCO element linearly interpolates, through the control voltage (c), the signals from the previous buffer (p) and the buffer previous to that (l). Rb limits the operating range of the VCO, Re adjusts the control voltage range, and Cc defines the center of the operating range.

The circuit shown in Fig. 4-7 represents one element of the FFI VCO. It is a three

input pseudo-buffer, with emitter follower outputs. The control signal, c, is common

between all stages and must be on level 3. The input l (leap) and p (previous) signals are on

level 2 which is matched to the output level. Collector resistors, Rc, are set to generate a

250 mV voltage swing. The current sources were chosen to maximize the fT of all

transistors.

Transistor sizing is a very important parameter when designing such circuits and further details are shown in Appendix D on page 173. Each stage in this VCO drives two identical stages and the external circuitry, which typically consists of four minimally sized buffers. For a VCO stage with x µm sized transistors, the external buffer appears as a 1/x effective load, and is 1/(2x+1) of the total load driven per stage. If 1 µm transistors are used, the buffer becomes 33% of the load. If, however, 10 µm transistors are used then the buffer becomes a nominal 4.8% of the load. So for larger VCO stages, the external buffer becomes more invisible, but the stage uses more power and physical space. A compromise using 4 µm transistors per gate was chosen, for which external loads are 11% of the total.
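The loading figures quoted above follow directly from the 1/(2x+1) relation. A short Python sketch of the arithmetic is given below; the function name is hypothetical and the relation is used exactly as stated in the text, with no additional modeling.

```python
def external_load_fraction(x_um):
    """Share of the per-stage load due to the external buffer: 1 / (2x + 1).

    x_um is the VCO transistor size in micrometers, relative to a
    minimum-size (1 um) external buffer.
    """
    return 1.0 / (2.0 * x_um + 1.0)


for x in (1, 4, 10):
    print(f"{x:2d} um transistors: external buffer is "
          f"{100.0 * external_load_fraction(x):.1f}% of the load")
```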

Another design challenge, for maximizing frequency response, is to size the differential amplifier transistors independently of the emitter follower transistors. Please see Appendix D on page 173 for a detailed analysis. This approach was not deemed necessary because design specifications of 5 and 10 GHz were easily met without optimization.

4.5.1. Cascode amplifiers

Above the level 2 differential amplifiers are cascode, or common base, amplifiers. They provide a low input load resistance to the common emitter differential amplifier and act as an impedance transformer. Some delay is introduced by their presence but this is offset by an increase in driving ability and an isolation from the capacitor, Cc. This isolation helps to ensure a linear relationship between the increase in Cc and the increase in delay. The cascode amplifiers also help to reduce phase noise by providing a low impedance output which limits the effect noise has on the phase.

4.5.2. Emitter Resistor for linearity and gain adjustment

An ideal differential amplifier has infinite gain, is digital in nature, and requires only that one input is greater than the other for switching. Real bipolar amplifiers are not ideal and possess a high gain approaching 6 (See Appendix C.1. on page 164). High gain is undesirable when designing PLLs because the VCO will generate more noise and loop filters will require smaller bandwidths. Without modification, a small change in the control voltage would cause a large change in current. The solution is to include emitter degeneration resistors, Re, which reduce the gain and produce a more linear transfer function. A complete analysis of a differential amplifier with emitter resistors is presented in Appendix C.1. on page 164.

The value of Re was chosen based upon the desired control voltage range of ± 0.2 V, the linearity across that range, and the frequency range. Fig. 4-8 shows the frequency response of the VCO as a function of the emitter resistors. Values of Re below approximately 300 Ω-µm are non-linear at the extremes and produce a gain which is relatively large. Re values above 500 Ω-µm are quite linear but have a limited frequency range, and produce a small gain. In contrast to high gain, a small gain, and therefore a limited frequency range, limits the ability of the PLLs to reach target frequency specifications under all environmental and processing conditions. A trade-off exists between a high and low resistor value and depends on the needs of the circuit.
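The linearizing effect of Re can be illustrated with the standard large-signal model of an emitter-degenerated bipolar pair, v_d = 2 v_T atanh(i_d / I_o) + R_e i_d. This is a textbook simplification and not the analysis of Appendix C.1: it ignores the bypass resistors, and it uses resistances in plain ohms rather than the size-normalized Ω-µm of Fig. 4-8. The tail current and resistor values in the Python sketch below are hypothetical.

```python
import math

V_T = 0.0259  # thermal voltage near 300 K, in volts


def control_voltage(i_d, i_o, r_e):
    """Differential input voltage needed to produce differential current i_d."""
    return 2.0 * V_T * math.atanh(i_d / i_o) + r_e * i_d


def differential_current(v_d, i_o, r_e, iters=200):
    """Invert control_voltage by bisection (it is monotonic in i_d)."""
    lo, hi = -i_o * (1.0 - 1e-12), i_o * (1.0 - 1e-12)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if control_voltage(mid, i_o, r_e) < v_d:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


I_O = 1e-3  # hypothetical tail current, in amperes
for r_e in (0.0, 100.0, 300.0):  # hypothetical emitter resistors, in ohms
    row = [differential_current(v, I_O, r_e) / I_O for v in (0.05, 0.1, 0.2)]
    center_gain = 1.0 / (2.0 * V_T / I_O + r_e)  # small-signal di_d/dv_d at v_d = 0
    print(f"Re={r_e:5.0f} ohm  i_d/I_o at 50/100/200 mV: "
          + ", ".join(f"{x:.3f}" for x in row)
          + f"  center gain={center_gain * 1e3:.2f} mA/V")
```

With no degeneration the pair is almost fully switched well inside the ± 0.2 V range, while a larger Re spreads the transition over more of the range at the cost of a lower center gain, which is the trade-off described above.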

Figure 4-8 FFI VCO frequency versus emitter resistance
By adjusting the emitter resistor, Re, the gain of the VCO can be controlled. A higher resistance decreases the gain.
[Plot: Frequency (GHz) versus Control Voltage (V), from -0.4 V to 0.4 V, for Re = 0, 200, 400, 600, and 800 Ω-µm. Note: resistor values are normalized to the size of transistors in µm.]

4.5.3. Center capacitor to control frequency range center

The capacitor, Cc, between the level 1 outputs is parasitic in nature and used only to degrade the performance of the circuit. Increasing its size causes an increase in the delay through the gate, which corresponds to a decrease in frequency. This component is very useful in centering the frequency range to a given specification; simulation results are shown in Fig. 4-9. The disadvantage of using this component arises when very low frequencies are needed, because this requires a large capacitor. Large capacitive elements require a significant amount of space, and because each of the four stages needs one, their size can become prohibitive. Fortunately, for frequency centers from 2 GHz through 8 GHz the component size is quite reasonable.
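The shift of the center frequency with Cc can be sketched with the center-frequency model that appears later as (4-16) in Section 4.6.2, f_c = 3 / (16 (To + ln(2) (2 Rc Cc))). In the Python sketch below, To = 21 ps is the nominal delay quoted in that section, while the collector resistance and the capacitor values are hypothetical placeholders rather than design values.

```python
import math


def intrinsic_delay(t_o, r_c, c_c):
    """Stage delay used in (4-16): T = T_o + ln(2) * (2 * R_c * C_c)."""
    return t_o + math.log(2.0) * 2.0 * r_c * c_c


def center_frequency(t_o, r_c, c_c):
    """(4-16): f_c = 3 / (16 T), which is (4-3) evaluated at s = 0.5."""
    return 3.0 / (16.0 * intrinsic_delay(t_o, r_c, c_c))


T_O = 21e-12  # nominal stage delay from Section 4.6.2, in seconds
R_C = 100.0   # hypothetical collector resistance, in ohms
for c_c in (0.0, 50e-15, 100e-15, 200e-15, 400e-15):  # hypothetical values, in farads
    print(f"Cc = {c_c * 1e15:5.0f} fF  T = {intrinsic_delay(T_O, R_C, c_c) * 1e12:6.1f} ps  "
          f"fc = {center_frequency(T_O, R_C, c_c) / 1e9:5.2f} GHz")
```

Because the capacitor term adds directly to the stage delay, each additional increment of Cc lowers the center frequency by a smaller amount, which is why very low frequency centers call for disproportionately large capacitors.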

Figure 4-9 FFI VCO frequency versus centering capacitor
A frequency centering capacitor, Cc, is added to increase the delay of the stage in order to move the frequency range to within specifications.
[Plot: Frequency (GHz), on a logarithmic axis from about 0.6 to 16 GHz, versus Control Voltage (V), from -0.4 V to 0.4 V, for Cc = 0, 25, 50, 100, 150, and 250 fF/µm. Note: capacitor values are normalized to the size of transistors in µm.]

4.5.4. Bypass resistor to prevent stage decoupling

The last and perhaps most important elements to be discussed are the bypass resistors, Rb. Their purpose, discussed in Sec. 4.4. on page 40, is to prevent stage decoupling by limiting a full switching of current in the tree. In addition to adding decoupling stability to the VCO, these elements can also be used to limit the frequency range while keeping the gain nearly constant. See Fig. 4-10 for the frequency response of the VCO given different values of Rb.

The bypass resistor is tied to the collector of the control input transistors and the top of the current source. Each node is kept at a nearly constant voltage because the bases from the level above fix their emitter voltages. Since the voltage across the resistor is constant, the current through it will also be constant. This ensures that some current from the active current source will always flow through both branches of the tree and thus prevent a complete depletion of current through the branch. A smaller resistor will allow more current to flow and, in the limit, the control transistors will be completely bypassed and both branches will receive exactly equal current. A complete analysis of this effect is detailed in Section C.2. on page 166.

Figure 4-10 FFI VCO frequency versus bypass resistance
By adjusting the bypass resistor, Rb, the maximum current through each branch can be limited. This resistor prevents stage decoupling and allows frequency range control.
[Plot: Frequency (GHz) versus Control Voltage (V), from -0.4 V to 0.4 V, for Rb = 1.6, 2.4, 4.0, and 8.0 kΩ-µm. Note: resistor values are normalized to the size of transistors in µm.]

4.6. System Analysis

The frequency profile of the FFI VCO is a function of the various circuit parameters including nominal stage delay, To, Rb, Re, and Cc. If Rb is removed, Re is set to 0, and Cc is set to center at 6.0 GHz, then Fig. 4-11 shows the frequency response. The range is from 3.9 GHz to 7.9 GHz, which is a one octave range. The period of the VCO is governed by (4-3), which yields 8T when s = 0 and 4T when s = 1, thus the octave range. The addition of the other circuit components only decreases this range.

A more comprehensive look at the total system response requires an analysis of the modified differential amplifier and the relationship between the weighting factor s, and the current switching between branches. Fig. 4-12 shows a diagram of the VCO frequency profile as a function of control voltage. The three primary curve parameters are: the frequency range, the center frequency, and the gain at the center frequency. Mathematical models describing each of these parameters can be found in the following sections.

Figure 4-11 FFI VCO Frequency Range
This is the response when Rb is removed, Re set to 0, and Cc is set to give a 6 GHz center frequency.
[Plot: Frequency (GHz) versus Control Voltage (V), from -0.2 V to 0.2 V.]

Figure 4-12 FFI VCO System from control voltage to frequency
An analysis of the FFI system should incorporate a study of the circuit response and the dynamics of the top-level architecture.
[Diagram labels: frequency range, gain, control voltage, center frequency.]

4.6.1. Branch current to frequency

Relating circuit parameters such as Rb and Re to the frequency profile involves a circuit level description of the differential amplifier. Circuit level analyses are often expressed in terms of the differential branch current output and as such do not relate directly to frequency. Relating branch current to frequency is necessary to achieve the final transfer function.

From (4-3) we can find the frequency relative to the weighting factor s, which is directly related to the current by

$$s = \frac{1}{2}\left(1 + \frac{i_L - i_P}{I_o}\right) \tag{4-14}$$

$$f = \frac{1}{16T}\left(3 + \frac{i_d}{I_o}\right) = \frac{1}{8T}(s + 1) \tag{4-15}$$

where T is the intrinsic stage delay, iL is the current through the branch that accepts input from the “leaped” branch, iP is the current from the “previous” branch, and id is the differential current. This relationship is confirmed in Fig. 4-13 where the simulated frequency versus current are shown along with the results from (4-3) and (4-14).

Results at a weighting value of 0.5 show the largest slope difference between the analytical model and simulation. This slope difference is important when analyzing the frequency gain and a factor, α, is introduced to compensate. Taken directly from Fig. 4-13, α has a value of 1.3.

Figure 4-13 Simulated versus analytical response of the FFI Architecture
The gray, dashed lines represent simulated frequency response for varying branch currents, and the black continuous lines represent the analytical expectation.
[Plot: Frequency (GHz) and Effective Delay (ps) versus Weighting Factor (s), from 0.00 to 1.00; curves: Simulated, Analytical.]
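The mapping from branch currents to frequency in (4-14) and (4-15) is simple enough to evaluate directly. The Python sketch below does so under two assumptions that are not stated in the text: the differential current is taken as i_d = i_L - i_P, and the two branch currents are taken to sum to I_o. The function names, the 1 mA tail current, and the 32 ps stage delay are illustrative only, and the compensation factor α is not applied.

```python
def weighting_factor(i_l, i_p, i_o):
    """(4-14): s = (1 + (i_L - i_P) / I_o) / 2."""
    return 0.5 * (1.0 + (i_l - i_p) / i_o)


def frequency_from_currents(i_l, i_p, i_o, t):
    """(4-15): f = (3 + i_d / I_o) / (16 T), taking i_d = i_L - i_P."""
    i_d = i_l - i_p
    return (3.0 + i_d / i_o) / (16.0 * t)


I_O = 1e-3  # hypothetical tail current, in amperes
T = 32e-12  # assumed intrinsic stage delay, in seconds
for share in (0.0, 0.25, 0.5, 0.75, 1.0):
    i_l = share * I_O          # current steered to the leap input
    i_p = I_O - i_l            # remainder goes to the previous input
    s = weighting_factor(i_l, i_p, I_O)
    f = frequency_from_currents(i_l, i_p, I_O, T)
    print(f"i_L/I_o = {share:.2f}  s = {s:.2f}  f = {f / 1e9:.2f} GHz")
```

Under these assumptions the weighting factor equals the share of the tail current steered to the leap input, so the frequency rises linearly with that share, as (4-3) requires.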

4.6.2. Center frequency and intrinsic stage delay

The center frequency is directly related to the intrinsic stage delay by (4-3) when s is set to 0.5. The intrinsic delay can be accurately modeled by the results presented in Appendix C.3. on page 171. The center frequency is modeled as

$$f_c = \frac{3}{16\left(T_o + \ln(2)\,(2R_cC_c)\right)} \tag{4-16}$$

and is validated in Fig. 4-14. Intrinsic stage delay is also plotted in Fig. 4-14 because these values are needed for the frequency gain and frequency range models. The nominal delay, To, found through simulation, is 21 ps.

Figure 4-14 Center frequency simulation and model
The modeled and simulated intrinsic stage delay and VCO center frequency are shown here. The modeled results follow the simulated results closely.
[Plots: Frequency (GHz) versus Normalized Capacitance (fF/µm), and Intrinsic Stage Delay (ps) versus Normalized Capacitance (fF/µm), both from 0 to 250 fF/µm; curves: Simulated, Modeled.]

4.6.3. Frequency gain at the center frequency

The analytical model for current gain as a function of Rb and Re is solved in Appendix C.1. on page 164, and Appendix C.2. on page 166. To find input voltage to output frequency gain, two elements are needed: the voltage to current gain and the current to frequency gain. The former was solved in (C-12) on page 169, and the latter is determined by differentiating (4-15) and substituting the intrinsic delay equation (C-14) on page 171. The result is

$$\frac{df}{dv_d} = \frac{di_d}{dv_d}\cdot\frac{df}{di_d} = \left(\frac{1}{\dfrac{2\gamma v_T R_b}{R_b I_o - 2v_{be}} + R_e \parallel R_b}\right)\left(\frac{\alpha}{16\left(T_o + 0.7\,(2R_cC_c)\right)I_o}\right) \tag{4-17}$$

which includes all circuit parameters: Rb, Re, Rc, Cc, Io, and the nominal stage delay To. α is also included to compensate for the weighting factor and frequency gain difference between the simulated and analytical results.

4.6.4. Frequency Range

The frequency range of the FFI VCO is mainly governed by the bypass resistor and partially governed by the emitter resistor. Appendix C.2. on page 166 describes how these parameters limit the differential current through each branch in the VCO stage. This current is related to the maximum frequency, fmax, through (4-15), where id is replaced with id,max, which is found in (C-5) on page 167. Taking this value, subtracting the center frequency fc, and multiplying by two yields the frequency range, frange. Using the intrinsic delay relationship from (C-14) on page 171 and (4-15), yields

$$f_{range} = 2(f_{max} - f_c) = \frac{i_{d,max}}{8I_oT} = \frac{I_o(R_e + R_b) - \left(v_{be} - \dfrac{v_d}{2}\right)}{8I_o(R_b + 2R_e)\left(T_o + 0.7\,(2R_cC_c)\right)}. \tag{4-18}$$

vd should be set to the maximum differential voltage that is allowed during normal operation of the VCO.

4.7. Phase Noise

The phase noise of an oscillator is an extremely important consideration during the design phase. VCO phase noise and phase jitter directly affect system performance. In serial communication circuits, a bit stream is generated with the time between transitions defined by the jitter in the VCO and the PLL. The transport mechanism, which includes the wire and buffering circuits, also introduces noise, which appears as phase jitter. The larger the jitter at the receiver, the more difficulty the PLL will have tracking the data and, consequently, data corruption will increase. It is therefore imperative to minimize jitter at the source to ensure maximum data throughput [15].

4.7.1. The Impulse Sensitivity Function

Noise in circuits is typically related to thermal, device (shot and flicker), or external effects. The relationship of these effects to phase noise can be quite complicated and difficult to solve analytically. A straightforward method that involves an analytical foundation and some simulation utilizes the impulse sensitivity function (ISF) [18]. It yields a closed form solution relating circuit noise to phase noise.

Circuit noise appears as either amplitude or phase variations in the output of oscillators. When dealing with “digital” ring oscillators, the amplitude variations are small because of the limiting nature of the circuits. Phase variations, on the other hand, are governed by

$$\Delta\phi = \Gamma(\omega_o, t)\,\frac{\Delta q}{q_{swing}} \tag{4-19}$$

where Δq is a charge step applied to a specific node, qswing is the nominal charge swing on that node (qswing = Cnode Vswing), and Γ(ωo,t) is the ISF.

Γ(ωo,t) can be considered as the normalized phase response of the VCO given a current pulse at a specific point in the output. The ISF is large when a current pulse causes a large change in phase and small when the pulse causes a small phase change. Fig. 4-15 shows an example of the effect on phase for two current pulses of the same size but in different positions. The case on the left applies the pulse during the rising edge, and effectively increases the rise time and decreases the phase. The pulse applied to the flat portion of the curve shows little or no phase change, because the circuit restores the initial value before the edge arrives.
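A small worked example of (4-19) makes the scaling explicit. In the Python sketch below the 250 mV swing is the value set by the collector resistors in Section 4.5, while the node capacitance, the injected charge, and the two ISF values are hypothetical numbers chosen only to show an edge-time pulse next to a flat-region pulse.

```python
def phase_shift(isf_value, delta_q, c_node, v_swing):
    """(4-19): delta_phi = Gamma * delta_q / q_swing, q_swing = C_node * V_swing."""
    q_swing = c_node * v_swing
    return isf_value * delta_q / q_swing


C_NODE = 50e-15   # hypothetical node capacitance, in farads
V_SWING = 0.25    # output swing from Section 4.5, in volts
DELTA_Q = 1e-15   # hypothetical injected charge, in coulombs

for label, gamma in (("pulse on an edge", 0.3), ("pulse on a flat region", 0.0)):
    dphi = phase_shift(gamma, DELTA_Q, C_NODE, V_SWING)
    print(f"{label}: ISF = {gamma:.1f}  phase shift = {dphi:.4f} rad")
```

The same charge produces a phase shift in proportion to the ISF at the instant of injection and in inverse proportion to the charge swing, so a larger swing or node capacitance makes the node less sensitive.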

Figure 4-15 Current pulse effect on phase
A current pulse, or charge step applied to a node in the VCO will have a phase effect depending on the temporal location of the pulse.

Fig. 4-16 shows the simulated ISF for the FFI VCO and the values of the output at the time that the current pulse is applied. The response appears as it should, with an increase during the rising edge, a decrease during the falling edge, and a zero when the output is constant. This form is very similar to the derivative of the waveform function. The important values garnered from these results are the dc and rms values of the ISF. The rms value of 0.077 is used to determine the phase noise and the non-zero dc value of 0.001 shows the upconversion of low frequency noise to base band noise.
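The dc and rms values referred to above are ordinary averages over one period of the ISF. The Python sketch below shows how they could be computed from uniformly spaced samples; the sampled curve is a toy sinusoid with a small offset, not the simulated ISF of Fig. 4-16, and the function name is hypothetical.

```python
import math


def isf_dc_and_rms(samples):
    """DC (mean) and rms values of an ISF sampled uniformly over one period."""
    n = len(samples)
    dc = sum(samples) / n
    rms = math.sqrt(sum(x * x for x in samples) / n)
    return dc, rms


# Toy stand-in for an ISF: amplitude 0.1 with a 0.001 offset (illustrative only).
toy_isf = [0.001 + 0.1 * math.sin(2.0 * math.pi * k / 256) for k in range(256)]
dc, rms = isf_dc_and_rms(toy_isf)
print(f"dc = {dc:.4f}  rms = {rms:.4f}")
```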

The rms value of the ISF is only meaningful when compared against other similar ring oscillators. Fig. 4-17 shows various oscillators and their associated rms values. The single ended and differential points are CMOS rings tuned to maintain a constant frequency that is independent of the number of stages. Their values drop with increasing N because each stage’s transitions represent a smaller fraction of the total period and thus have smaller effects on the ISF. The CS (Current Starving) oscillator shows a reasonable match with the other differential oscillators. The FFI oscillator, on the other hand, shows a much lower ISF when compared to systems with the same number of stages. This has important ramifications in the total phase noise and is discussed further in Section 4.7.3.

[Figure 4-15 labels: "current pulse has large phase effect"; "current pulse has small phase effect".]

Figure 4-16 Simulated ISF for FFI VCO and output waveform. The FFI VCO ISF is shown here along with the waveform at the point that the pulse is applied.

[Figure 4-16 axes: Normalized Time (rad/T), 0 to 6; ISF, -0.50 to 0.40 (left); Waveform Voltage (V), -1.3 to -0.85 (right). Traces: ISF, Waveform.]

Figure 4-17 ISF rms values for various ring oscillators. Shown in this plot are the rms values for the FFI, CS (Current Starving), CMOS differential (DE), and CMOS single ended (SE) ring oscillators.

[Figure 4-17 axes: Number of Stages (N), 3 to 10; rms value of ISF, 0.1 to 1.0. Series: FFI, CS, SE, DE.]

4.7.2. Solving for phase noise

Using the superposition integral, the phase response for any injected noise current i(t)
is equal to

\[ \phi(t) = \int_{-\infty}^{t} \frac{\Gamma(\omega_o, \tau)}{q_{swing}}\, i(\tau)\, d\tau . \tag{4-20} \]

The single-sideband phase-noise spectrum due to a white-noise current source is given by
[18]

\[ L\{\omega_{off}\} = \frac{\Gamma_{rms}^{2}}{4\,\omega_{off}^{2}} \cdot \frac{\overline{i_n^{2}}/\Delta f}{q_{swing}^{2}} \tag{4-21} \]

where Γrms is the rms value of the ISF, in²/∆f is the single-sideband power spectral density
of the noise current source, and ωoff is the offset from the carrier.

Noise in the FFI circuit element shown in Fig. 4-7 is generated primarily by HBT shot
noise and resistor thermal noise. The nodes of interest, those generating the most noise and
the most sensitive to current fluctuations, are the level one outputs, z10 and z11. The level
2 outputs do introduce twice the shot noise but are less susceptible to current induced phase
variations because of their low output resistance and strong restoring force.

The single-sideband power spectral density (PSD) for the resistor noise and the
collector shot noise is

\[ \frac{\overline{i_n^{2}}}{\Delta f} = 4kTG + 2 q_e I_c \tag{4-22} \]

where G is the conductance of the pull-up resistors, and Ic is the current through the collector,
which is half the tail current. Further refinement of (4-21) and (4-22), and substitution of
values for temperature, resistance, and current for optimal operation, yields

\[ L\{\omega_{off}\} = \frac{N\,l}{2\,\omega_{off}^{2}} \cdot \frac{\Delta\phi_{rms}^{2}}{\Delta q^{2}} \cdot 161 \times 10^{-24}\ \frac{\mathrm{A}^{2}}{\mathrm{Hz}} \tag{4-23} \]

where N is the number of stages, l is the length of transistors in µm, and ∆φrms is the rms
phase deviation with a simulated charge injection of ∆q.
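As a rough numerical companion to (4-21) and (4-22), the sketch below evaluates the noise current PSD and the resulting single-source phase noise in dBc/Hz. The function names are illustrative. The 300 K temperature and the pairing of a 180 fF node capacitance with a 250 mV swing to form qswing are assumptions made here for the example. It covers one noise source only, so it is not expected to reproduce the totals obtained from (4-23).

```python
import math

K_BOLTZMANN = 1.380649e-23     # J/K
Q_ELECTRON = 1.602176634e-19   # C

def noise_current_psd(temp_k, conductance_s, collector_current_a):
    """Eq. (4-22): resistor thermal noise plus collector shot noise, in A^2/Hz."""
    thermal = 4.0 * K_BOLTZMANN * temp_k * conductance_s
    shot = 2.0 * Q_ELECTRON * collector_current_a
    return thermal + shot

def phase_noise_dbc(gamma_rms, q_swing_c, in2_per_hz, f_off_hz):
    """Eq. (4-21) for one white noise source, expressed in dBc/Hz."""
    w_off = 2.0 * math.pi * f_off_hz
    ratio = (gamma_rms ** 2) / (4.0 * w_off ** 2) * in2_per_hz / (q_swing_c ** 2)
    return 10.0 * math.log10(ratio)

# Illustrative inputs: Gamma_rms = 0.077, a 100 ohm pull-up, half of a 3.2 mA
# tail current; temperature and q_swing are assumed values for this sketch.
psd = noise_current_psd(300.0, 1.0 / 100.0, 1.6e-3)
q_swing = 180e-15 * 0.25
pn_1mhz = phase_noise_dbc(0.077, q_swing, psd, 1e6)
pn_10mhz = phase_noise_dbc(0.077, q_swing, psd, 1e7)
```

Because the expression scales as 1/ωoff², the two results differ by 20 dB per decade of offset, which is a quick sanity check on any implementation.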

Using (4-23) at a frequency offset of 1 MHz, the FFI VCO has a phase noise value of
-93.0 dBc/Hz and the CS VCO has a phase noise value of -79.1 dBc/Hz. If cascode
amplifiers are added to the CS VCO to achieve a more accurate comparison, the phase noise
decreases to -85.1 dBc/Hz. Both VCOs have about the same center frequency¹ and both
consume the same amount of power.

4.7.3. Phase noise comparison between the FFI and CS VCOs

The benefit achieved by using the FFI architecture for VCO design, rather than a
standard ring VCO, is about 8 dB of noise reduction (-93.0 dBc/Hz versus -85.1 dBc/Hz at
a 1 MHz offset). This improvement is quite compelling because it comes without the need
for additional power.

There are two main factors which contribute to the noise reduction. The FFI VCO has
a higher frequency because of the incorporation of a novel architecture. This higher
frequency can be traded off for an increase in level one capacitance. Capacitance was added
to each stage to reduce its speed and bring it in line with the speed of a standard ring
oscillator. Additional capacitance helps to absorb current noise by decreasing the
bandwidth on the outputs. It essentially softens the voltage spike caused by an insertion of
charge at the output node. The CS VCO, for example, has a level one capacitance of 28 fF
and the FFI VCO has a capacitance of about 180 fF.

The second effect is a result of the averaging that occurs between the two inputs to
each gate. Any noise disturbance on one input is offset by averaging and results in a change
of 66% from the unaveraged expected result. At first it would appear that the effect should
only be 50%, but because of the propagation of the effect through multiple averages, the
progression leads to a 66% change. This factor of two thirds corresponds to a 2.2 dB
decrease in the overall phase noise.

1. The center frequency of the CS VCO is actually about 70% that of the FFI VCO. If properly matched, the noise value gap between the two will only widen because of the larger capacitor required by the FFI.

4.8. Jitter

Jitter in a ring VCO is generated by four primary noise sources within each variable
delay element: thermal noise from the collector resistors, tail current noise, sampling of
input noise by switching of differential pairs, and noise at the VCO input [17], [18]. κ is
used as a time domain figure of merit relating the standard deviation of a transition, σt, to
the fixed amount of time, ∆T, over which it is measured:

\[ \kappa = \frac{\sigma_t}{\sqrt{\Delta T}} . \tag{4-24} \]

Each noise source contributes to the total κ as described in detail in [19]. This equation is
valid for all time in the open loop case and valid for time less than the loop time constant
in the closed loop PLL case.

In this VCO, the noise generating sources in the delay element are frequency
independent due to the nature of the frequency control. Thermal noise from the collector
resistors remains constant because the capacitance and resistance remain constant. Noise
introduced by the degenerated tail current source also remains fixed. The input differential
pair noise is dependent on the amount of current through the pair, which is linearly
switched between the inputs. Since the total current remains constant, the total noise
contribution from each pair will remain approximately constant. For these reasons, the
jitter introduced by one stage remains constant over all frequencies.

Although noise induced jitter per stage remains the same, the total jitter per transition
depends strongly on the transition interpolating ability of the VCO. When the VCO is
operating in the four stage mode, the jitter in one period is a result of the jitter from all
four stages. However, as the weighting factor is shifted to favor the feed-forward signal,
the jitter introduced during a full period is only from two stage elements rather than four.

The result after including (4-2) is that κ varies according to

\[ \kappa = \frac{\sigma_t}{\sqrt{(1+s)\,\Delta T\,\dfrac{\omega}{\omega_o}}} . \tag{4-25} \]

The factor of ω/ωo is added to normalize in terms of transitions independent of the
frequency.

Using (4-3) and solving for s as a function of the frequency fraction gives

\[ s \approx \frac{3\omega}{2\omega_o} - 1 \tag{4-26} \]

and substituting (4-26) in (4-25) yields

\[ \kappa \approx \sqrt{\frac{2}{3}}\,\frac{\omega_o}{\omega}\,\kappa_o \tag{4-27} \]

where κο is the nominal jitter constant for an identical ring oscillator without feed-forward
interpolation, ωo is the center frequency and ∆T is the time over which the open loop jitter
is being measured. This equation is graphed in Fig. 4-26.

Using the derivation in [19] and the data in Table 4-1 yields a κο of 18 n√s. Through
calculation and simulation it was found that the largest contributors to overall jitter were
the input differential pairs and the emitter followers.
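The jitter relations in this section can be exercised numerically. The sketch below assumes the readings κ = σt/√∆T for (4-24) and κ ≈ √(2/3)·(ωo/ω)·κο for (4-27), together with (4-26) for s. The square roots are an assumption where the typesetting of the source is ambiguous, and the function names are illustrative.

```python
import math

def interpolation_weight(freq, f_center):
    """Eq. (4-26): weighting factor s as a function of the frequency fraction."""
    return 3.0 * freq / (2.0 * f_center) - 1.0

def kappa(freq, f_center, kappa_0):
    """Eq. (4-27): jitter constant versus oscillation frequency (assumed form)."""
    return math.sqrt(2.0 / 3.0) * (f_center / freq) * kappa_0

def rms_jitter(kappa_value, delta_t):
    """Eq. (4-24) rearranged: sigma_t = kappa * sqrt(delta_T)."""
    return kappa_value * math.sqrt(delta_t)

F_CENTER = 5.0e9    # Hz, illustrative center frequency
KAPPA_0 = 18e-9     # sqrt(s), the nominal value quoted in the text

s_four_stage = interpolation_weight(2.0 / 3.0 * F_CENTER, F_CENTER)   # s = 0
s_two_stage = interpolation_weight(4.0 / 3.0 * F_CENTER, F_CENTER)    # s = 1
k_four_stage = kappa(2.0 / 3.0 * F_CENTER, F_CENTER, KAPPA_0)
k_two_stage = kappa(4.0 / 3.0 * F_CENTER, F_CENTER, KAPPA_0)
sigma_50ns = rms_jitter(KAPPA_0, 50e-9)
```

Under these assumptions κ halves between the four stage limit (s = 0) and the two stage limit (s = 1), which matches the qualitative argument that half as many stage elements contribute jitter in a period.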

4.9. Interconnect Parasitic Simulations

Interconnect parasitics are increasing in importance in the design of high speed
circuits. In slower, larger circuits the capacitance and resistance of the interconnect were
dwarfed by device parameters. Now, with very small devices, this is no longer true and
interconnect parameters are as large as, or larger than, device parameters. Also, with an
increase in operating frequencies, speed of light propagation time becomes a larger fraction
of the overall cycle time.

In general, the effect of non-ideal interconnect is an increase in delay through the
wires. This is crucial for ring oscillators, since the operation of the circuit requires stringent
control over the delay. If properly simulated and accounted for, an underperforming VCO
can be avoided. An oscillator that achieves significantly higher “ideal” speeds than
specified is required. It is not uncommon for interconnect to degrade speeds by as much as
10% to 20%.

To ensure operation at 5 GHz, the FFI VCO was designed with a 20% safety margin.
To do this, the circuit was designed to run at 6 GHz without interconnect effects included.
This safety margin, in addition to the already large frequency range, assures proper
operation at 5 GHz. Only with a 20% interconnect effect and a 20% decay from other
negative effects will the VCO fail to meet the specifications.

Table 4-1 Circuit parameters for calculating jitter.

Parameter                        Value
Re                               100 Ω
Rc                               100 Ω
Iee                              3.2 mA
Ko                               5.5 GHz/V
en(vco)                          4.6 nV/√Hz
Rbase (4 inputs, 4 followers)    152 Ω x 8

Fig. 4-18 shows the effect on the frequency response before and after adding
interconnect capacitance.¹ The performance drops a uniform 12%. Larger effects were
seen in the Current Starving VCO because of smaller transistor size and the resulting larger
percentage of interconnect to total capacitance.
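The margin arithmetic in this section is easy to check. The sketch below uses illustrative helper names to compute the design margin of a 6 GHz ideal design against the 5 GHz target and the frequency left after a uniform 12% interconnect penalty.

```python
def safety_margin(ideal_ghz, target_ghz):
    """Fractional headroom of the ideal (no interconnect) design over the target."""
    return ideal_ghz / target_ghz - 1.0

def derated_frequency(ideal_ghz, fractional_drop):
    """Frequency remaining after a uniform fractional slowdown."""
    return ideal_ghz * (1.0 - fractional_drop)

margin = safety_margin(6.0, 5.0)              # 20% design margin
with_parasitics = derated_frequency(6.0, 0.12)  # 12% capacitive penalty
```

With these numbers the derated frequency stays above the 5 GHz target, which is the purpose of designing to 6 GHz.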

Figure 4-18 FFI with capacitive interconnect parasitics. The introduction of interconnect parasitics reduces the performance of high speed circuits. When designing a ring oscillator it is absolutely necessary to include these effects.

[Figure 4-18 axes: Control Voltage (V), -0.4 to 0.4; Frequency (GHz), 3.5 to 7.5. Curves: No Parasitics, Capacitive Parasitics.]

1. The IBM 1999B SiGe design kit does not account for interconnect resistances correctly and typically shows a faster response than with capacitance only. Resistance values are also very small for these localized wires. For these reasons, only capacitance was included.

4.10. HDL Model

A transistor level model of this ring oscillator includes 60 active devices and 12
devices for the required balancing loads. If a frequency divider is needed, such as the 1/8
in the transmitter frequency synthesizer, 54 additional devices are needed to represent the
entire VCO. The processing time required to simulate 126 devices is prohibitive and
limits design iteration.

The solution to this problem was to create an analog Hardware Description
Language, HDL, model of the VCO [40]. Spectre HDL, a Cadence package, was used
because it is tightly integrated into Cadence and is very similar to Verilog-A, which is the
leading analog HDL. The code for the VCO, shown in Appendix E.1. on
page 179, was modeled after the simulation data in Fig. 4-18. Input loading effects were
included in the model so that no additional circuit needed to be added. The output was
modeled as a sine wave, which is not accurate, and was passed through a small buffer to
transform the signal into something more representative of a real signal.
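The idea behind such a behavioral model can be sketched in a few lines. The example below is not the Spectre HDL code of Appendix E.1. It is a Python illustration that assumes a linear tuning law around the center frequency and a sine output, with the function name, time step, and the 5.4 GHz and 5.5 GHz/V values chosen here for illustration.

```python
import math

def simulate_vco(v_ctrl, f_center_hz, gain_hz_per_v, t_stop_s, dt_s):
    """Behavioral VCO: integrate frequency to phase and emit a sine output.

    v_ctrl is a function of time (s) returning the control voltage (V).
    Returns lists of time, instantaneous frequency, phase (rad), and output.
    """
    steps = int(round(t_stop_s / dt_s))
    phase = 0.0
    times, freqs, phases, outputs = [], [], [], []
    for k in range(steps + 1):
        t = k * dt_s
        f_inst = f_center_hz + gain_hz_per_v * v_ctrl(t)  # assumed linear tuning
        times.append(t)
        freqs.append(f_inst)
        phases.append(phase)
        outputs.append(math.sin(phase))
        phase += 2.0 * math.pi * f_inst * dt_s            # forward-Euler update
    return times, freqs, phases, outputs

# Zero control voltage for 1 ns with a 1 ps step.
times, freqs, phases, outputs = simulate_vco(lambda t: 0.0, 5.4e9, 5.5e9, 1e-9, 1e-12)
```

A model of this kind exposes the instantaneous frequency and phase directly as `freqs` and `phases`, quantities that a transistor level simulation does not provide without post-processing.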

The time associated with simulating the transmitter PLL was reduced by about 60%
with very little effect on accuracy. The extra time allowed more frequent design iterations,
since each highly accurate simulation usually takes hours to run. Another benefit of the
HDL model is the ability to extract parameter values such as instantaneous frequency and
phase, which was extremely helpful in analyzing the PLL. With a transistor level
simulation these values are hidden.

4.11. Final Implementation

The final implementation of the FFI VCO was used in revision two (Serdes II) of the

transmitter and receiver and in the FFI VCO test chip. The specifications were based on the

goal of a 20 Gb/s communication system, and the architecture for that system.

4.11.1. Circuit Parameters

Both the transmitter (4-1 multiplexer core) and receiver (twice oversampling core)
required a quarter-frequency clock, thereby forcing a VCO with a 5 GHz center frequency.
To remain conservative and ensure that the 5 GHz specification would be reached, 4 µm
transistors were used, and a centering capacitor, Cc, of 19 fF/µm, or 76 fF, was chosen (see
Fig. 4-14 on page 51). Under ideal simulation conditions this put the center frequency at 6.2
GHz, and when parasitics are included, at 5.4 GHz.

The specification for frequency range was partially dictated by the uncertainty of

achieving the 5 GHz center. Process variations, interconnect parasitics, model inaccuracies,

and other simulation difficulties necessitated a large range to ensure that any center

frequency deviation could still achieve 5 GHz. In addition, because the bypass resistors

intended to control stage decoupling also affect the frequency range, their effect must be

considered. The decision was made to maximize the frequency range (see Fig. 4-10 on

page 48) while having a conservative response to the stage decoupling problem. The value

of Rb was chosen to be 6.4 kΩ-µm, yielding a VCO possessing a large range, and a strong

decoupling prevention.

The gain of the VCO was chosen based upon the input control voltage swing and the

need to provide a linear response across all control values. Since a reasonable voltage swing

for CML circuits is 250 mV, as noted in Appendix C, a range corresponding to this swing

was chosen for the VCO. This yielded a value of Re equal to 400 Ω−µm.
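The component values in this section are quoted per micron of transistor length. The sketch below converts them to absolute values for the 4 µm devices, assuming the usual convention that a value in fF/µm is multiplied by the length and a value in Ω-µm is divided by it. The helper names are illustrative, and the absolute Rb value is implied by this convention rather than stated in the text.

```python
TRANSISTOR_LENGTH_UM = 4.0

def scaled_capacitance_ff(ff_per_um, length_um):
    """Capacitance grows in proportion to transistor length."""
    return ff_per_um * length_um

def scaled_resistance_ohm(ohm_um, length_um):
    """Resistance shrinks in proportion to transistor length."""
    return ohm_um / length_um

cc_ff = scaled_capacitance_ff(19.0, TRANSISTOR_LENGTH_UM)     # centering capacitor
rb_ohm = scaled_resistance_ohm(6400.0, TRANSISTOR_LENGTH_UM)  # bypass resistor
re_ohm = scaled_resistance_ohm(400.0, TRANSISTOR_LENGTH_UM)   # emitter resistor
```

The results for Cc (76 fF) and Re (100 Ω) agree with the values given in the text and in Table 4-1, which supports the assumed scaling convention.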

In addition to the 5 GHz VCO a high speed 10 GHz VCO was also designed for test

within the Serdes II chip. It had no centering capacitor so that a maximum frequency could

be achieved. The ultimate goal was to see if this faster 10 GHz VCO could be used to design

a 40 Gb/s communication system.

4.11.2. Layout Considerations

A poor layout can result in an underperforming circuit; consequently, layout
preparation is an extremely important design concern. Proper layout of a ring oscillator
minimizes noise and interconnect parasitic effects. In addition, because these oscillators
generate considerable “digital” noise, it is crucial to isolate them from nearby analog
circuits.

The first goal in the FFI layout (see Fig. 4-19) was to minimize the number of
inter-stage wires and make them symmetrical to guarantee uniform phase spacing. The solution
was to design a single compact stage and position four of the stages around a center with
inputs and outputs in the middle. This provided perfect symmetry and minimal interconnect
but required four unique orientations of the devices. Differing orientation introduces
directional process variations into the design, but symmetry appeared to be the more
important factor.

Substrate coupling¹ and power supply noise, although partially offset by the
differential nature of the circuit topology, are important to address. Substrate noise can
originate from external as well as internal circuits. Minimizing external substrate noise, and
internal switching effects on external circuits, involved the design of a deep trench moat
with a substrate contact ring along the inside, as shown in Fig. 4-20. This provided a ground
return path for the enclosed circuitry to the substrate contacts and minimized coupling
outside the ring due to the long path around the deep trench. This is critical for this VCO
because its high frequency, multi-phase digital signals are often near low-noise
analog loop filters in PLLs. The compact design also forces substrate noise to appear as one
common mode source, thus minimizing its influence.

Figure 4-19 FFI Layout. Shown here is the final layout of the FFI VCO. Outputs can be taken from the center or the edges of the block.

[Figure 4-19 labels: deep trench moat; substrate ring (grounded); power ground rails; centering capacitor. Dimensions: 225 µm and 171 µm.]

1. Substrate noise in this SiGe technology is of particular importance because of the substrate’s lightly doped nature.

Figure 4-20 Reducing substrate coupling. By using a deep trench moat and substrate contacts, substrate coupling can be minimized.

Minimizing the length of the supply lines to pads provides a low resistance ground
return path. As with substrate noise suppression, a compact design forces supplies to appear as
one common mode source. When laying out routes to external circuits where phase
uniformity was important, the signals were taken from the center of the VCO to ensure
constant length wires. In addition, when a VCO phase output was not needed, a dummy
buffer was included on it to maintain consistent loading.

4.12. Experimental Results

A test chip implementing a 5 GHz (Cc = 76 fF) and a 10 GHz (Cc = 0 fF) FFI VCO
was designed alongside the Serdes 2 chip. It placed the two VCOs in an environment that
is identical to that found in the transmitter and receiver. Two input pads with capacitor
bypass provided a differential input for each VCO. The remaining four high-frequency
pads were dedicated to a buffered and a 1/8 divided output of each VCO.

The slower VCO was used in the Serdes 2 transmitter and receiver and had a center

frequency target of 5 GHz. The higher speed VCO was designed to be used in the Serdes 3

project with a center frequency at 10 GHz.

[Figure 4-20 labels: silicon surface; deep trench (DT); internal circuitry; external circuitry; short ground return path; substrate contact.]

Figure 4-21 FFI waveform at 5 GHz. This waveform was captured with a control voltage set to generate a 5 GHz output. The peak-to-peak swing is approximately 300 mV.

4.12.1. Frequency Response

The shape of the measured frequency response in Fig. 4-22 is nearly identical to the
simulated response. It is smooth, linear around zero, and monotonically increasing. The
differences are found in the frequency range and center. The center frequency at 0 mV
control voltage was expected to be 5.33 GHz but was measured 11% lower at 4.72 GHz.
The frequency range dropped 17% from 2.72 GHz to 2.27 GHz. In addition, the gain at
center decreased from 5.57 GHz/V to 4.98 GHz/V.
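The center frequencies and gains above can be combined into a first-order estimate of the control voltage needed to reach the 5 GHz target. The sketch below assumes the response is linear around zero, as the text describes, and uses illustrative function names.

```python
def control_voltage_for(target_ghz, center_ghz, gain_ghz_per_v):
    """Linearized estimate of the control voltage (V) that reaches target_ghz."""
    return (target_ghz - center_ghz) / gain_ghz_per_v

def percent_change(expected, measured):
    """Signed change of measured relative to expected, in percent."""
    return 100.0 * (measured - expected) / expected

v_measured = control_voltage_for(5.0, 4.72, 4.98)    # measured center and gain
v_simulated = control_voltage_for(5.0, 5.33, 5.57)   # simulated center and gain
range_change = percent_change(2.72, 2.27)            # frequency range drop
```

The linear estimate gives roughly +56 mV for the measured VCO and roughly -59 mV for the simulated one. These have the same signs and similar magnitudes as the 60 mV and -50 mV read from Fig. 4-22. The range change evaluates to about -17%.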

The measured offset between simulation and test results is likely due to a capacitance
on the level 1 nodes of the ring stages that was larger than anticipated. Base capacitance
modeling has always been a difficult issue, and capacitance can have a considerable effect
on the frequency. A capacitance increase of 50 fF yields a frequency change that would
match the frequency decrease.

Another possibility is the poor modeling of fT, which has a very dramatic effect on
frequency. Part of the effect can be seen in Fig. 4-24, where the supply voltage was
increased beyond the nominal voltage. This increased the current and, to a point, increased
the frequency. Although the CML trees were optimally designed for maximum fT, clearly
more collector current results in a better response.

Figure 4-22 FFI VCO measured results. This plot shows results simulated with interconnect parasitics, and measured results for the FFI VCO. The target of 5 GHz for the slower VCO was achieved at a control voltage of 60 mV rather than the expected -50 mV.

[Figure 4-22 axes: Control Voltage (mV), -400 to 400; Frequency (GHz), 3 to 10. Curves: simulated (parasitics) and measured, for each of the two values of Cc.]

4.12.2. Common Mode Gain (5 GHz VCO)

The common mode gain represents the gain associated with a common mode change
in the input while the differential voltage is kept the same. As the common mode voltage is
decreased, the level 3 differential pair begins to press into the active current source below
it. Although the current should remain constant as the source’s collector voltage moves, the
Early effect produces a slight slope in the response (see Fig. A-3 on page 158). This has the
effect of decreasing the current as the collector to emitter voltage is decreased. At some
point the source transistor begins to saturate and the collector current drops more rapidly.

With higher common mode voltages the level three transistors are pulled away from the
active sources, which causes the same current effect discussed above. Although the level 3
transistors are pressing into the level 2 transistors, there is little effect because the active
source is maintaining a constant current. With a gain of 5 GHz/V from Fig. 4-22, and a
common mode gain of 0.5 MHz/mV, the common mode rejection ratio, CMRR, is 20 dB.
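The CMRR figure follows from the ratio of the two gains once they are expressed in the same units. A minimal sketch, with an illustrative function name:

```python
import math

HZ_PER_V_PER_MHZ_PER_MV = 1e9   # 1 MHz/mV equals 1e9 Hz/V

def cmrr_db(differential_gain_hz_per_v, common_mode_gain_hz_per_v):
    """Common mode rejection ratio in dB from two frequency gains."""
    return 20.0 * math.log10(differential_gain_hz_per_v / common_mode_gain_hz_per_v)

differential_gain = 5e9                            # 5 GHz/V
common_mode_gain = 0.5 * HZ_PER_V_PER_MHZ_PER_MV   # 0.5 MHz/mV
cmrr = cmrr_db(differential_gain, common_mode_gain)
```

The ratio of the gains is 10, so the result is 20 dB, in agreement with the value quoted above.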

Figure 4-23 FFI common mode response. The common mode response of the FFI is quite flat with only a 1% deviation in frequency when the common mode is swept through ±100 mV.

4.12.3. Response versus supply voltage (5 GHz VCO)

The frequency of the VCO continues to increase with decreasing supply voltages
down to -4.3 V. This can be attributed to an increasing transistor fT as the collector current
increases. Below that voltage the transistors begin to experience high current effects and
the fT drops. At the peak frequency supply voltage of -4.3 V the collector current is
approximately 1.1 mA, which is higher than the 0.8 mA expected for fastest operation. The
power supply gain at the nominal -3.3 V power supply is -600 kHz/mV.

[Figure 4-23 axes: Common Mode Control Voltage (mV), -400 to 400; Common Mode Gain (MHz/mV), -4.00 to 4.00 (left); Frequency (GHz), 4.20 to 4.60 (right). Curves: Frequency, Common Mode Gain.]

Figure 4-24 FFI response versus supply voltage. At the nominal supply voltage of -3.4 V the center frequency is 4.6 GHz. Lower voltages show a quick decrease in frequency, while higher voltages show an increase in frequency until -4.5 V. Above -4.5 V the frequency drops quickly.

4.12.4. Phase noise measurements

Phase noise measurements, shown in Fig. 4-25, are very close to the ISF predictions

in Section 4.7.2. on page 56. At a 1 MHz offset from the carrier, the phase noise was

measured at -90 dBc/Hz and was calculated to be -93 dBc/Hz. The difference can best be

attributed to: testing effects, probe and wiring losses, and higher temperatures than

anticipated.

Because of the high noise testing environment, a special differential input filter was
built to suppress signal noise on the differential input. The filter consisted of a differential
RC filter, with a very low bandwidth, and a non-electrolytic capacitor. In addition, because
supply noise was an important contributor to noise, batteries were used to supply power to
the chip.

[Figure 4-24 axes: Supply Voltage (V), -6 to -2.5; Center Frequency (GHz), 3 to 5.]

Figure 4-25 Open loop phase noise of FFI VCO. This plot shows the phase noise versus the carrier offset frequency. The data was collected using a LabView program in conjunction with a spectrum analyzer and special software supplied with the equipment.

4.12.5. Jitter measurements

The plot of jitter versus frequency is shown in Fig. 4-26. The data was
collected with an open loop VCO circuit using an HP 11801C sampling oscilloscope with
∆T set to 50 ns. The model described by (4-27) accurately described the end points of the
jitter function but the results were off by as much as 20% in between. This can be attributed
to the fact that when the VCO operates more like a four stage oscillator it exhibits fast rise
times. During interpolation, however, the VCO favors a sine-wave output and the edges
slow, increasing the jitter. As s is increased, and a two stage oscillator is approached,
the rise time is more representative of that indicated in the model. At the target operating
frequency of 5 GHz, κ is equal to 14.2, which is 36% lower than κ when operating as a
normal four stage oscillator.

[Figure 4-25 axes: Frequency (kHz), 100 to 100000; Phase Noise (dBc/Hz), -130 to -60.]

Figure 4-26 FFI VCO analytical and measured jitter. This plot shows how jitter is related to the frequency of oscillation. The fact that the jitter improves at higher frequencies is a result of the system operating with fewer stages.

[Figure 4-26 axes: Frequency (GHz), 3.0 to 6.0; κ, 10 to 22. Curves: analytical, measured.]

5. Design of the Transmitter


The first transmitter was submitted to IBM for fabrication in February 1999 as a

stand-alone chip. It generated all 16 parallel data bits internally and had no mechanism to

accept externally supplied data. The bit rate specification of 20 Gb/s operating speed was

not achieved due to a VCO load imbalance.

The second prototype, submitted to Sierra Monolithics Inc. in April 2000, was a
unified transmitter-receiver chip. It contained improvements made to the first prototype
and was designed to be a fully working chip capable of being packaged or wafer tested. The
transmitter in this implementation easily hit the 20 Gb/s target data rate.

An invention disclosure record for the symmetric multiplexer was submitted in

February, 2000. RPI has subsequently stated that they are going to pursue a U.S. patent for

this invention.

5.2. Top Level Architecture Overview

The goal of the transmitter is to accept low speed parallel data and multiplex it to high

speed serial data. In some cases, it must first encode the data by adding extra bits for error

correction, byte alignment, word framing, or channel synchronizing. The encoded data is

then multiplexed from n parallel bits to a single bit stream. An additional stage, driven by

a very low noise PLL, may then be used to retime the data [42] to remove accumulated

noise. Finally, an amplifier is used to drive the external channel that carries the signal.

This Serdes project did not investigate data encoding due to limited time and

resources. Although a full featured chip may include data encoding, a system of this type

can still operate without one. Presumably the role of the encoder would be off-loaded to the

next level of hardware or software.

A 16-to-1 multiplexer was implemented as four 4-stage registers and one 4-1

multiplexer. The design revolved around a unique multiplexing scheme that required four

inputs and could run with a quarter frequency clock. The output data was clocked at 20

GHz, but the oscillator ran at 5 GHz. Since 16 external bits were to be supplied to the chip

and the multiplexing scheme required four bits, a front-end register that could be expanded

to meet a parallel data word of any width was designed.

Instead of adding an additional stage to perform symbol retiming, the retiming

function was pushed into the multiplexer. This necessitated a complete redesign of the

standard multiplexing CML gate, so that it could handle the stringent timing requirements

for transmission. The symmetric multiplexer evolved from this redesign process.

Like the retiming circuit, the channel amplifier was also incorporated into the
multiplexer. This involved ramping up transistor sizes and making a change in the output
stage of the multiplexer.

Figure 5-1 Transmitter and multiplexer architecture. The top level transmitter design consists of a 16-1 multiplexer driven by a 5 GHz PLL. Four 4-stage shift registers capture 16 bits of data every 800 ps. These then feed the 4-1 multiplexer in order to serialize the data.

5.3. 16-1 Multiplexer

Fig. 5-1 depicts the core of the transmitter, the multiplexer. It
is divided up into a 4 x 4 shift register bank and a 4-to-1 multiplexer,
also shown in the same figure. The shift register bank captures 16
bits of data every 800 ps and the 4-to-1 multiplexer serializes them
to a stream of bits. The width of each bit at 20 Gb/s is 50 ps.
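The timing figures quoted for the multiplexer follow directly from the serial bit rate and the word width. The short sketch below, using illustrative helper names, reproduces them.

```python
def bit_width_ps(bit_rate_gbps):
    """Duration of one serial bit in picoseconds."""
    return 1000.0 / bit_rate_gbps

def load_interval_ps(bit_rate_gbps, word_bits):
    """Time between parallel word loads in picoseconds."""
    return word_bits * bit_width_ps(bit_rate_gbps)

def parallel_rate_gbps(bit_rate_gbps, word_bits):
    """Data rate on each parallel input line."""
    return bit_rate_gbps / word_bits

bit_ps = bit_width_ps(20.0)               # serial bit width
load_ps = load_interval_ps(20.0, 16)      # interval between 16 bit loads
line_rate = parallel_rate_gbps(20.0, 16)  # rate per parallel line
clock_ghz = 20.0 / 4                      # quarter frequency clock
```

These evaluate to a 50 ps bit, an 800 ps load interval, 1.25 Gb/s per parallel line, and a 5 GHz clock, matching the values in the text and in Fig. 5-1.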

[Figure 5-1 labels: Transmitter block with VCO, PLL, and 16-1 Mux; parallel inputs at 1.25 Gb/s; serial output at 20 Gb/s. 16-1 multiplexer detail: four 4-bit shift registers (A, B, C, D) feeding the 4-1 multiplexer.]

The shift registers consist of four cascaded MS-latches, each with a 2-to-1
multiplexer front-end. By selecting different inputs, the array of four latches can either load
external data, or accept data from the previous latch. Clocking the select line ensures that
after 3 bits are shifted through, the next “shift” results in a load. Each load pulse is
separated by 16 times the bit width, or 800 ps. The tail bit of the register shifts in a zero
because new data overwrites it before it ever makes it out of the head latch.
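The load and shift behavior described above can be modeled at the cycle level. The sketch below is a behavioral illustration only, not the CML implementation. It assumes that the head latch drives the output and that bits leave in head-first order, and the function name is hypothetical.

```python
def serialize(words):
    """Serialize 4-bit words through a load/shift register, head bit first."""
    stages = [0, 0, 0, 0]   # index 0 is the head latch, index 3 is the tail
    out = []
    for word in words:
        for cycle in range(4):
            if cycle == 0:
                stages = list(word)          # select = load external data
            else:
                stages = stages[1:] + [0]    # select = shift, zero enters the tail
            out.append(stages[0])            # the head latch drives the output
    return out

stream = serialize([[1, 0, 1, 1], [0, 1, 0, 0]])
```

Each word is loaded once and then shifted three times, so the zeros entering the tail are always overwritten by the next load before they could reach the head.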

Figure 5-2 Data timing for the 4-1 multiplexer. The multiplexer interleaves the incoming data by using a multi-phase, quarter frequency clock. Timing of this circuit is critical because this circuit also has the responsibility to retime the data.

The unique nature of the multiplexer requires data in registers A and D to be offset

by 100 ps from data in registers B and C. This offset was accomplished by clocking the

registers with two in-quadrature phases of the PLL.

Each of the four registers is connected to the 4-to-1 multiplexer as an input. A special
“shuffling” clocking scheme is used to multiplex the data. This alleviates the need for a 10
GHz clock that would typically be required to convert the final two 10 Gb/s signals into one
20 Gb/s signal. One single-frequency clock can control the shift registers and clock the
multiplexers.

[Figure 5-2 timing diagram: register data A, B, C, D (bits a0 to a2, b0 to b3, c0 to c3, d0 to d2); clock phases 0° and 90°; interleaved streams BA, CD, and CBAD; time axis 0 ps to 400 ps.]

Multiplexing is accomplished by offsetting registers A and D by 90° from registers
B and C (see Fig. 5-2). This creates the basic interleaving data sequences, BA and CD,
which are synchronized with the first stage of 2-to-1 multiplexers. Interleaving was not
necessary to create the sequences, but without it, coincident edges and timing glitches could
have been introduced.

Signals BA, and CD arrive at the final multiplexer in phase with each other. The

phase of the select signal of this multiplexer is shifted exactly 90° from the previous

multiplexer’s select signal. This effectively cuts both BA, and CD in half and combines

them to form a CBAD signal. Therefore, final output edges are created from two sources:

the final multiplexer select and the change of inputs during selection.
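The interleaving order can be checked with a small time-slot model. The sketch below divides one 200 ps clock period into four 50 ps slots and uses two quarter-rate select signals one slot (90°) apart. The select polarities are an assumption chosen here so that the output order matches the CBAD naming used in the text, and the function names are illustrative.

```python
def quarter_rate_clock(slot, phase_slots):
    """Square clock with a 4 slot period, high for two slots, delayed by phase_slots."""
    return (slot - phase_slots) % 4 in (0, 1)

def mux4(a, b, c, d, slot):
    """Two-level 4-to-1 multiplexer driven by 0 and 90 degree clock phases."""
    sel_0 = quarter_rate_clock(slot, 0)    # first stage select, 0 degrees
    sel_90 = quarter_rate_clock(slot, 1)   # final stage select, 90 degrees
    ba = b if sel_0 else a                 # BA stream from the first stage
    cd = c if sel_0 else d                 # CD stream from the first stage
    return ba if sel_90 else cd            # final multiplexer output

order = "".join(mux4("A", "B", "C", "D", slot) for slot in range(8))
```

In this model each input appears at the output for exactly one slot per clock period. Half of the output transitions come from the final select changing and the other half from the first-stage streams changing while selected, which mirrors the two edge sources named above.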

The phase difference between the 90° and 0° signals is critical in determining any
output transition offsets. Any mismatch between the phases directly correlates to a phase
offset between consecutive transitions in the bit stream. To guarantee a 90° phase
difference, a delay which exactly matches the delay of the two 2-to-1 multiplexers is
introduced. The easiest way to do this involves using a matched multiplexer whose a input
is set to 0 and b input is set to 1. Although this technique consumes some power, its use is
necessary to significantly reduce phase mismatch.

5.3.1. The Case for the Symmetric Multiplexer

The 2-to-1 multiplexer is the final non-amplifying stage in most serial transmitter
circuits. It is therefore of utmost importance to study and understand the performance of
this gate and how its performance affects the data stream.

A typical 2-to-1 CML multiplexer utilizing levels 1 and 2 is shown in Fig. 5-3. Data
inputs a and b are on level 1 and the select input, s, is on level 2. In a clocked circuit the
important performance parameter is the delay from the input transition to the output
transition. The largest delay is taken from all of the possible combinations of inputs and
outputs. This parameter, in conjunction with other gate delays, ultimately determines the
maximum speed at which the circuit can be clocked.

The multiplexer performance metric, however, is very different in a
transmitter where the multiplexers perform the retiming. Delay through the gate is of
secondary importance, whereas the shape and aperture of the eye diagram are of critical
importance. Bit widths must remain consistent, and bit amplitudes must remain large
enough to be received when noise is present.

Figure 5-3 CML Two Level MultiplexerThe level difference between the inputs a, and b; and the select input s,produce a phase mismatch when a, b, and s, are ali gned by 90°°°°. . . .

The data and select signals arriving at the multiplexer are forced to a phase difference

of 90° by the VCO and overall circuit architecture. It is questionable whether an exact 90°

difference is appropriate for this gate because the inputs arrive on different levels. Is there

any inherent difference between their respective delays? Perhaps a better choice of phase

exists such that a more uniform output is generated? How does the difference in levels

affect the loading and driving from previous gates?

The circuit in Fig. 5-4 was designed and simulated in order to analyze and answer these questions. Signals a and b are complements of each other and the select signal's phase, ∅, is varied around 90°. Ideally, the average value of the output will coincide with the median when ∅ is equal to zero. This condition corresponds to an output with a 50% duty cycle, in which each bit is of equal width.
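
The link between the averaged output and the duty cycle can be made explicit with a short calculation. The following is a minimal sketch that assumes an ideal two-level output with negligible transition times; the function name and the voltage levels are hypothetical and chosen only for illustration.

```python
def duty_cycle_from_average(v_avg, v_low, v_high):
    """Fraction of time an ideal two-level signal spends at v_high."""
    return (v_avg - v_low) / (v_high - v_low)


# Hypothetical logic levels in volts, chosen only for illustration.
V_LOW, V_HIGH = -1.25, -1.0
V_MID = (V_LOW + V_HIGH) / 2

# An average equal to the midpoint corresponds to a 50% duty cycle.
print(duty_cycle_from_average(V_MID, V_LOW, V_HIGH))
```

Sweeping the select phase and looking for the point where the average crosses the midpoint is therefore equivalent to looking for equal bit widths.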

The results of the analysis are shown in Fig. 5-5 and indicate that a phase offset of 13.5° (7.5 ps) is needed to maintain a 50% duty cycle. This effect is a result of the data existing on level 1 and the select lines being on level 2. For a select change to propagate to the output it must travel through two levels of logic, whereas a data change only needs to travel through one. There is also a loading difference between the two logic levels. The collectors on level 1 see the pull-up resistors and the base of the following gate. On level 2 the collectors see two emitters from the level above.

Figure 5-4 Simulation Testing of CML 2:1 Multiplexer
By varying the select phase relative to the data phase and averaging the output signal over time, a measurement showing ideal select and data phase offsets can be made.

A 50% duty cycle is desired when the phase difference between the data and select signals is 90°, since both are driven from the VCO. The multiplexer, however, requires a 103.5° phase difference for symmetric output. A delay element could be introduced to the data lines to add 7.5 ps, but a better solution was invented: the symmetric multiplexer.
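
The correspondence between the 13.5° offset and the 7.5 ps delay follows from the clock period. The sketch below assumes the select clock runs at 5 GHz (the PLL output frequency quoted in Section 5.4); the function name is hypothetical.

```python
def phase_to_time_ps(phase_deg, clock_hz):
    """Convert a phase offset in degrees to a time offset in picoseconds."""
    period_ps = 1e12 / clock_hz
    return phase_deg / 360.0 * period_ps


CLOCK_HZ = 5e9          # assumed select clock frequency
OFFSET_DEG = 13.5       # simulated offset for a 50% duty cycle

offset_ps = phase_to_time_ps(OFFSET_DEG, CLOCK_HZ)
required_phase_deg = 90.0 + OFFSET_DEG

print(f"offset = {offset_ps:.2f} ps, required phase = {required_phase_deg:.1f} degrees")
```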

The symmetric multiplexer accepts all inputs on the same level, has the same loading per input, and ensures that any input (data or select) will propagate to the output in the same amount of time. An implementation of the gate is shown in Fig. 5-6. The left hand side of the multiplexer represents the OR condition $a \cdot s + b \cdot \bar{s}$, which generates the high output, and the right hand side represents the inverse condition $(\bar{a} + \bar{s}) \cdot (\bar{b} + s)$, which generates the low output. The four transistors, Q1-Q4, in the center act as a shared differential amplifier. During all static conditions one branch will have a high and a low level transistor and the other branch will have both transistors in an intermediate state. The branch with the high level will carry all of the current and produce the z output.
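
The two conditions implemented by the left and right halves of the gate are logical complements of each other, which can be checked exhaustively. This is a small sketch only; the function names are hypothetical, and the select polarity (s = 1 passes a, s = 0 passes b) is an assumption inferred from the state table of Fig. 5-7.

```python
from itertools import product


def mux_high(a, b, s):
    """Condition that drives the output high: a AND s, OR b AND NOT s."""
    return (a and s) or (b and not s)


def mux_low(a, b, s):
    """Inverse condition that drives the output low (De Morgan form)."""
    return (not a or not s) and (not b or s)


for a, b, s in product([False, True], repeat=3):
    z = mux_high(a, b, s)
    assert mux_low(a, b, s) == (not z)   # the halves never agree
    assert z == (a if s else b)          # behaves as a 2-to-1 multiplexer
    print(int(a), int(b), int(s), "->", int(z))
```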


Figure 5-5 Simulation Results for CML 2:1 Multiplexer
The crossing point, or 50% duty cycle point, occurs at 13.5° (7.5 ps). This shows an asymmetry between the select and data inputs.
[Plot: average output voltage (V), -1.3 to -0.9, versus phase (degrees), -180 to 180.]

Figure 5-6 CML Single Level Symmetric Multiplexer
A novel implementation of a multiplexer with inputs all on level 1, identical loading per input, and completely symmetric response.
[Schematic: two input stages (inputs a0/a1, b0/b1, s0/s1) on either side of an output stage formed by Q1-Q4 with outputs z0/z1.]

Fig. 5-7 shows the state of each transistor based upon the input values. “H” represents a high state, or the highest voltage, and indicates which transistor will carry the current. The medium level, “M”, falls halfway between the high and low levels. To ensure proper noise margins the voltage difference between the high and low levels is increased to 500 mV. This places a 250 mV difference between the two top voltage levels.

Each of the transistors in the central tree of the multiplexer is driven by two differential pairs. This allows for a reduction in the size of the 12 input transistors without any loss of signal integrity, and also directly compensates for the doubled loading on each input. A drawback is that each input requires a minimum of 2 µm of load, no matter the output driving ability.

Power requirements for this circuit are also four times higher than those for a typical

level 1 output CML multiplexer. On the other hand, since this circuit only requires one level

of logic, the negative power supply can be reduced by at least 25%.

Figure 5-7 Symmetric multiplexer transistor states
The states of transistors Q1-Q4 are defined to be high (H), low (L), and middle (M). The transistor in the high state carries the current and dictates the output value.

a  b  s   Q1  Q2  Q3  Q4   Z
0  0  0   M   M   L   H    0
0  0  1   M   M   H   L    0
0  1  0   L   H   M   M    1
0  1  1   M   M   H   L    0
1  0  0   M   M   L   H    0
1  0  1   H   L   M   M    1
1  1  0   L   H   M   M    1
1  1  1   H   L   M   M    1

5.3.2. Final Implementation and Simulation

Serdes I did not utilize the symmetric multiplexer and had a 15% phase error in alternating edges, shown in the simulation in Fig. 5-8. Figure (a) shows the eye diagram of the standard CML multiplexer. The inputs were designed to exercise the circuit as much as possible, i.e., using 50 ps input pulses and differing a and b inputs when the select input

changes. At the center voltage of 125 mV, two distinct crossings can be seen, which result

from the input to output delay imbalance in the CML circuit. The time for a select transition

to reach the output is about 10 ps longer than for an a or b input to reach the output.

Figure (b) shows a much cleaner eye diagram for the symmetric multiplexer. The

reason for this improvement lies in the circuit architecture, which was designed with

symmetry to ensure that any input changes propagate to the output in the same amount of

time. The ramifications of this are obvious. The transmitter output will benefit from a clean,

low phase noise multiplexer signal.

The 4-to-1 multiplexer with symmetric architecture in Serdes II also plays the role of the line driver by driving the pads directly. The reasoning behind this design feature was to remove the noise that would be introduced by an additional line driver. By integrating the two components, the total phase noise is smaller. In order to accomplish this, larger 12 µm transistors, capable of sinking 9.6 mA, were used in the final multiplexer. In addition, a cascode amplifier was added to the output stage to limit the loading on the differential pair.

Driving the final 12 µm output stage required ramping up of transistor sizes so that

the input stage of the final multiplexer was not loaded down. Starting with a 1 µm input

stage, two intermediate emitter followers were added of sizes 2 µm and 4 µm. This enabled

an output stage with 8 µm transistors, each capable of driving transistors of their own size

or larger. This output stage drives the final multiplexer which has an input of 4 µm. Once

again, two 6 µm and 8 µm emitter followers were added, followed by the 12 µm output

stage. This technique required a total current of 63 mA as compared to a 15.4 mA current

requirement for the standard CML multiplexer and the associated pad driver.
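
The sizing progression and its cost can be summarized numerically. The sketch below only restates the figures quoted above; the list and function names are hypothetical.

```python
def stage_ratios(sizes_um):
    """Size ratio between each stage and the stage that drives it."""
    return [nxt / prev for prev, nxt in zip(sizes_um, sizes_um[1:])]


# Transistor sizes in micrometres, as quoted in the text.
first_chain_um = [1, 2, 4, 8]     # input stage, two followers, output stage
final_chain_um = [4, 6, 8, 12]    # final mux input, two followers, output stage

print("first chain ratios:", stage_ratios(first_chain_um))
print("final chain ratios:", [round(r, 2) for r in stage_ratios(final_chain_um)])

# Supply current of the symmetric design relative to the standard CML
# multiplexer plus pad driver.
current_ratio = 63.0 / 15.4
print(f"current ratio: {current_ratio:.2f}")
```

No stage is asked to drive more than twice its own size, and the total current comes out at roughly four times that of the standard arrangement.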

Figure 5-8 Multiplexer Eye Diagrams
These plots are output eye diagrams for the standard CML multiplexer (a) and the symmetric multiplexer (b). Both circuits received identical 20 Gb/s inputs and identical loading.
[Plots (a) and (b): output voltage (V), -0.30 to 0.00, versus time (ps), 0 to 100.]

Figure 5-9 Multiplexer Layout for Serdes I and II
The transmitter 16-1 multiplexer consists of a 4x4 shift register and a 4-1 multiplexer. The layouts for Serdes I (a) and Serdes II (b) are shown here.
[Layout labels: (a) CML multiplexer and 4x4 registers; (b) 3 symmetric multiplexers and 4x4 registers.]

5.4. Phase Locked Loop (Frequency Synthesizer)

When reducing phase noise in the transmitter becomes the most important design factor, the transmitter phase locked loop (PLL) becomes the most important circuit in the system. Its role is to generate a high frequency, extremely low noise clock from a low frequency, noisy, externally supplied reference clock. For the transmitter PLL in this design, the external reference is at 625 MHz, and the PLL clock output is at 5 GHz.

The standard linear model of a PLL, shown in Fig. 5-10, has a phase detector (PD), a loop filter (LF), and a VCO. The phase detector subtracts the phase of the input signal from the phase of the output signal. This gives a measure of the phase offset of the two signals and is the mechanism that allows the phases to be locked together. The loop filter filters the output of the phase detector in order to meet certain feedback characteristics, such as output noise, pull-in range¹, and pull-in time². The VCO acts as an integrator, converting a control signal to an oscillating signal represented as a phase. Finally, a 1/8 frequency divider is used to match the internal frequency to the external input frequency, as required by the PD.
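
A compact way to read the linear model is through its closed-loop transfer function. The following is a sketch under the usual textbook assumptions (negative feedback at the phase detector and an ideal divide-by-N block in the feedback path, with N = 8 here); the symbol G(s) is introduced only for this illustration.

```latex
G(s) = \frac{K_d\,F(s)\,K_o}{s}, \qquad
\frac{\theta_o(s)}{\theta_i(s)} = \frac{G(s)}{1 + G(s)/N}
  = \frac{N\,K_d\,F(s)\,K_o}{N s + K_d\,F(s)\,K_o}
```

At low frequencies the ratio tends to N, so reference phase is multiplied by 8 inside the loop bandwidth, while at high frequencies the ratio rolls off and the output phase is set by the VCO.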

Figure 5-10 Linear model of PLL
The PLL used in the transmitter consists of three primary parts: phase detector, loop filter, and VCO. An input filter is added to reduce the noise levels of the input signal.
[Block diagram: the input v_i passes through the input filter to give θ_i; the phase detector (K_d), loop filter F(s), and VCO (K_o/s) produce θ_o, which goes to the transmitter and returns to the phase detector through the frequency divider.]

1. Pull-in range is the maximum range of frequencies for which the PLL can eventually acquire lock. This PLL parameter is primarily a function of the PD implementation, but is also determined by the frequency range of the VCO.

2. Pull-in time, or acquisition time, is the amount of time it takes the PLL to achieve lock from an initial frequency deviation that is within the pull-in range.

The transmitter's frequency synthesizer went through three major revisions during its evolution. These revisions are depicted in Fig. 5-11. During the rapid development of the

first transmitter prototype, a PLL was designed that had minimal functionality and poor

performance. The goal was to quickly develop a clock multiplier without concern for phase

noise and jitter performance.

With more time and results from Serdes I, a highly improved Serdes II PLL evolved. It possessed a 3-state PD, which improved the lock-in range¹ and acquisition time; an active op-amp style LF, further improving key characteristics; and the FFI VCO, which reduced noise and increased performance. An optimized bandwidth driven by previous results and specifications was still missing from this design. Measured data on the noise characteristics of the VCO and information about the noise spectrum of the input noise source were key to bandwidth optimization.

Test data from the first two prototypes, better simulation techniques, and further research yielded the final PLL design. VCO noise spectra allowed for a much better bandwidth design, further minimizing PLL output phase noise. A smaller bandwidth required frequency detection in the PD because of the much longer pull-in time. Another improvement replaced the clumsy op-amp integrator with a high performance specialized integrator which is also used in the receiver PLL.

1. The lock-in range, a function of the PD and the PLL bandwidth, is defined as the maximum frequency deviation for which the PD will remain in lock, where the PD is in its linear range and does not slip.

Figure 5-11 Frequency synthesizer evolution
The transmitter's frequency synthesizer went through three major evolutionary steps. The first had the most basic components and provided minimal functionality. The second incorporated better components to minimize noise and improve the acquisition range and time. The third, unfabricated version added advanced PLL components and optimized key design variables based upon simulations and measurements from the other prototypes.

Serdes I: XOR PD; type I passive LF (RC low pass filter); Simple CS VCO
Serdes II: input filter; 3-state PD; type II active LF (op-amp filter); FFI VCO
Serdes III: 3-state PD with frequency detector; type II active LF (specialized integrator, optimized bandwidth); FFI VCO

5.4.1. Input Filter

An effective technique in reducing PLL phase noise is to drive it with a very clean reference source¹. The PLL has the ability to lock a noisy VCO to a clean reference and reduce the total output noise to a level below that of the VCO. With this in mind, an input bandpass filter was designed and implemented in order to reduce the out-of-band noise of the signal source. This technique was added to the Serdes II design but removed in the subsequent design because a better input signal generator was acquired.

1. The signal source used in the Frisc testing lab is very old and very noisy. In practice, a very well controlled low phase noise signal generator would be used as a reference and an input filter would not be needed.

Figure 5-12 Schematic for input filter
The input filter is a bandpass filter centered around the reference frequency. It is intended to filter out low and high frequency noise associated with this signal.

Fig. 5-12 depicts the schematic of the input filter, which consists of an input attenuator and an active bandpass filter. The active component of the filter is simply a high-gain two-stage buffer with level one and level two outputs. The first stage does not affect the voltage gain of the amplifier and has Darlington pair inputs to reduce the input current by a factor of β. Twenty-five percent larger pull-up resistors were used to increase the total gain to approximately 5. The input resistor tree attenuator compensates for the large total gain of the bandpass filter by reducing the input amplitude by 78%.
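
The attenuation and the net mid-band gain can be cross-checked against the component values listed for Fig. 5-12 (R1 = 800 Ω, R2 = 224 Ω). The sketch below assumes the attenuator behaves as a simple two-resistor divider, R2 / (R1 + R2); that topology is an assumption made for illustration and is not stated in the text.

```python
R1_OHM = 800.0      # attenuator series resistor (Fig. 5-12 value)
R2_OHM = 224.0      # attenuator shunt resistor (Fig. 5-12 value)
AMP_GAIN = 5.0      # approximate amplifier gain quoted in the text

# Assumed divider topology: output taken across R2.
divider_ratio = R2_OHM / (R1_OHM + R2_OHM)
amplitude_reduction = 1.0 - divider_ratio
net_gain = divider_ratio * AMP_GAIN

print(f"amplitude reduction: {amplitude_reduction:.1%}")
print(f"net mid-band gain:   {net_gain:.2f}")
```

Under this assumption the divider removes about 78% of the input amplitude, and the product with the amplifier gain sits just above unity.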

The frequency transfer function for the input filter is shown in Fig. 5-13. The peak

was designed to be at precisely 625 MHz with a bandwidth large enough to account for

parameter mismatches and frequency adjusting.

Because the final effect of this filter on the output phase noise of the PLL was not

known, a multiplexer was added after this circuit so that it could be bypassed if necessary.

This makes it possible to determine the filter's actual usefulness.

Component values for the input filter of Fig. 5-12: R1 = 800 Ω, R2 = 224 Ω, R3 = 2 kΩ, C1 = 500 fF, C2 = 500 fF.

5.4.2. Phase Detector

A phase detector produces a signal that yields information about the difference between the phases of its two inputs. Ideally it produces a perfectly linear response for all phase differences and has an arbitrary gain. For real circuits, however, we must settle for non-linear responses that may have regions where the gain becomes negative, where the function is periodic in π/2 or π rather than 2π, and where the gain varies across the range.

Figure 5-13 Input filter frequency response
At the reference frequency of 625 MHz the input filter achieves a slightly greater than unity gain. All other frequencies are attenuated.
[Plot: gain (dB), -60 to 0, versus frequency (MHz), 1 to 10000, on a logarithmic axis.]

5.4.2.1. Phase detector (Serdes I)

Two different phase detectors were investigated in Serdes I and Serdes II: the XOR, or Gilbert multiplier, and the 3-state, respectively. The schematic for the XOR PD, shown in Fig. 5-14, consists of a single tree CML gate with emitter followers. At one extreme, the inputs are in phase and the average value of the output is 0. At the other extreme, when the inputs are 180° apart, the output is 1. For the 3-state detector the output is taken differentially across its two internal signals, VU and VD. These signals are outputs of the two resettable MS-latches, and their rising edges coincide with the rising edges of the input signals, Vi and Vo. The falling edges, on the other hand, are triggered together after both have risen. This creates a wider pulse on the signal, VU or VD, whose associated input arrives first.

The output of the XOR PD, shown in Fig. 5-15, has a linear response from -180° to 180°. Outside that range the output slope is negative and produces a temporarily unstable PLL response before the phase detector output enters a positively sloped region again. The gain is about 0.53 V/rad, which is relatively high. It is set by the large input control range of the VCO used in Serdes I, the Simple Current Starving version of the VCO.
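
The shape of the XOR characteristic can be reproduced with an idealized model. This is a minimal sketch that assumes two ideal 50% duty cycle square waves and a logic output normalized to the range 0 to 1; the function name and sampling scheme are hypothetical.

```python
def xor_pd_average(phase_deg, samples=3600):
    """Average XOR output (0..1) of two ideal square waves offset by phase_deg."""
    high = 0
    for n in range(samples):
        t = (n + 0.5) / samples                        # time in periods
        a = (t % 1.0) < 0.5                            # reference input
        b = ((t - phase_deg / 360.0) % 1.0) < 0.5      # shifted input
        high += (a != b)                               # XOR of the two inputs
    return high / samples


for phase in (0, 45, 90, 180, 270):
    print(phase, xor_pd_average(phase))
```

The average rises linearly from 0 at 0° to 1 at 180° and then falls again, so the slope changes sign every half period. This is the negative-slope behaviour described above.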

Figure 5-14 Phase detector schematics
The XOR detector (a) uses an XOR logic cell to perform phase detection. The 3-state detector (b) utilizes two resettable MS latches and an AND gate.

5.4.2.2. Phase detector (Serdes II)

Fig. 5-15 also shows the output of the 3-state PD. Its response is greatly improved

over that of the XOR PD. First, the slope is always positive and it extends across the entire

input phase difference range. This greatly improves the response of the PLL during lock

acquisition. This response wil l be discussed in Section 5.4.6. Another important

improvement appears when phase error is continuously increased above 180o, which is

common with larger frequency offsets. Although the plot shows that the output is -120 mV

above 180o, the output will step to 0 mV, and continue to rise beyond that phase. This effect

increases the pull -in range.
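
The latch behaviour described in Section 5.4.2.1 (each latch set by the rising edge of its own input, both reset once both are high) can be sketched as a small event-driven model. The code assumes ideal latches with zero reset delay and measures time in units of one input period; the function name and the sign convention (positive phase means the v_i edge leads) are assumptions for illustration.

```python
def three_state_pd_average(phase_deg, cycles=100):
    """Average of (UP - DOWN) for an idealized 3-state phase detector."""
    delay = phase_deg / 360.0
    events = [(k, "vi") for k in range(cycles)]
    events += [(k + delay, "vo") for k in range(cycles)]
    events.sort()

    up = down = False
    last_t = events[0][0]
    area = 0.0
    for t, source in events:
        area += (up - down) * (t - last_t)   # integrate the differential output
        last_t = t
        if source == "vi":
            up = True                        # rising edge of v_i sets UP
        else:
            down = True                      # rising edge of v_o sets DOWN
        if up and down:                      # AND gate fires and resets both
            up = down = False
    return area / cycles


for phase in (-90, 0, 90, 180):
    print(phase, three_state_pd_average(phase))
```

In this idealized model the averaged output has the same sign as the phase error and grows in proportion to it, in contrast to the XOR detector.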

In order to implement the 3-state PD, one significant hurdle related to the reset feedback through the AND gate had to be resolved. Proper operation occurs when the second output edge from the latches causes the AND to go high, reset both latches, and bring the AND low again. Simulation showed, however, that the very thin reset pulse was failing to reset one of the latches. The problem was traced to the non-uniform loading of the output latches and the asymmetry in the AND gate inputs. The solution was to use a single-ended AND gate to provide symmetric loading and matched input levels for both latches. This ensured that both latches were uniformly reset and alleviated all timing issues.

Figure 5-15 Simulated phase detector responses
Plotted above is the average of the signal output of the two phase detectors. The XOR phase detector has a valid range between 0° and 180°, and the 3-state detector output is valid for any phase difference.
[Plot: 3-state phase detector output (mV), -150 to 150, and XOR phase detector output (V), -1.0 to 1.0, versus phase difference (degrees), -270 to 270.]

These PDs are used in a frequency synthesizer which includes a divide-by-8 component. The PLL gain, K, and the 3 dB bandwidth are both reduced by a factor of N, the divider ratio. This factor is incorporated into the PD gain, which gives the XOR PD an adjusted gain of 66.3 mV/rad and the 3-state PD an adjusted gain of 5.25 mV/rad.
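
The adjustment is a simple division by the divider ratio. The sketch below reproduces the XOR figure from the 0.53 V/rad gain quoted earlier; the raw 3-state gain is not stated in the text, so the value printed for it is only what the adjusted figure implies.

```python
N = 8                              # divider ratio

XOR_GAIN_V_PER_RAD = 0.53          # raw XOR PD gain quoted in the text
adjusted_xor_mv = XOR_GAIN_V_PER_RAD / N * 1e3

THREE_STATE_ADJUSTED_MV = 5.25     # adjusted 3-state PD gain quoted in the text
implied_raw_three_state_mv = THREE_STATE_ADJUSTED_MV * N   # derived, not quoted

print(f"adjusted XOR gain:        {adjusted_xor_mv:.2f} mV/rad")
print(f"implied raw 3-state gain: {implied_raw_three_state_mv:.1f} mV/rad")
```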

The lock-in range of the PLL is (π/2)K using the XOR PD and πK using the 3-state PD. The larger range of the 3-state PD provides higher resistance to cycle slips and yields a shorter pull-in time when used with a frequency detector. The pull-in time of the XOR PD is about four times larger than that of the 3-state PD with the same PLL bandwidth. The pull-in range is also four times larger for the 3-state PD. The simulated figure of merit¹, M, for the XOR gate is quite high, approaching 1 million. This was expected because of the very simple nature of the XOR gate. The 3-state PD, on the other hand, has a value of about 22, which is appropriate for a circuit of this complexity.

1. The figure of merit, M, for a PD is Vdo/Kd, where Vdo is the mean value of the PD output and Kd is the gain. A low M value for a PD yields a small pull-in range.

5.4.2.3. Phase detector (Serdes III)

Research into Serdes III necessitated a decreased bandwidth in order to further

suppress spurious noise introduced by the PD. Side effects of a decrease are a reduction in

the pull-in range, and an increase in the pull-in time. A very effective way to counter these

negative effects is to add a frequency detector, FD, to the 3-state PD. This circuit is able to

detect cycle slips and provide a strong pull-in signal in response. A cycle slip occurs when

the phase error exceeds the bounds of the PD (0, 2π) and the output steps (See Fig. 5-15 on

page 88). This is indicative of a large frequency error and if the proper circuitry is added to

sense this event then a large change can be made to the loop filter integrator.

Figure 5-16 PLL frequency detector
A frequency detector detects cycle slips from the PD and performs large control voltage changes. This allows a much wider pull-in range and smaller pull-in times.
[Block diagram labels: 3-state PD (inputs v_i and v_o, outputs vU and vD), two slip detectors (outputs vU' and vD'), and the loop filter; the slip detector detail shows latches and a delay element producing the cycle slip signal.]

The schematic in Fig. 5-16 shows the implemented frequency detector that was added to Serdes II's design. The detector compares the input to the output of the PD. When a cycle slip occurs, an output edge normally created on vU by vi's rising edge is missing, and this is sensed by the slip detector. The detector will then add or remove a fixed amount of charge from the charge pump integrator. This causes a step change on the output of the integrator.

The key to implementing the FD is to ensure that the induced frequency step, ∆ωc, does not exceed twice the lock-in range, ωL; a larger step would force the frequency to oscillate around ωL and never acquire lock. Typically ∆ωc is conservatively set to ωL so that pull-in time is minimized and PLL lock is ensured.
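
The step-size rule can be illustrated with a deliberately simplified model in which each detected cycle slip moves the frequency error toward zero by one fixed step, and the loop is taken to lock as soon as the error falls within the lock-in range. The model ignores all loop dynamics between slips; the function name and the normalized values are hypothetical.

```python
def slips_until_lock(freq_error, lock_in_range, step, max_slips=1000):
    """Count frequency steps needed to bring the error inside the lock-in range.

    Returns None if the error never settles within max_slips steps.
    """
    slips = 0
    while abs(freq_error) > lock_in_range:
        if slips == max_slips:
            return None
        freq_error -= step if freq_error > 0 else -step
        slips += 1
    return slips


# Frequencies normalized to the lock-in range (omega_L = 1).
print(slips_until_lock(10.5, 1.0, step=1.0))    # conservative step of omega_L
print(slips_until_lock(1.25, 1.0, step=2.0))    # step at the 2 * omega_L limit
print(slips_until_lock(1.25, 1.0, step=2.5))    # step too large: never settles
```

With a step above twice the lock-in range the error can jump from one side of the lock-in window to the other indefinitely, which is the oscillation the rule is meant to prevent.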

5.4.3. The VCO

Serdes I utilized the Simple CS VCO with a gain of approximately 0.5 GHz/V. Its

highly variable gain, and non-linear frequency response made analytical modeling of the

PLL difficult. The second and third prototypes used the FFI VCO which has a consistent

gain of 6 GHz/V. Its linear response made analytical modeling much easier to perform.

5.4.4. Loop Filter

The loop filter in a PLL plays a critical role in determining the PLL bandwidth. Usually the gains of the PD and the VCO are fixed, and therefore the loop filter is the only component available to control the bandwidth. A high bandwidth corresponds to a strong ability to track the input phase at high frequencies. This would be very useful for a receiver that needs to track an incoming signal plagued with transmitter and line noise. This ability will be discussed further in the following chapter. A small PLL bandwidth, on the other hand, ignores phase variations on the input and performs very slow tracking. This is the necessary situation for a transmitter since it needs to generate a very clean VCO signal, independent of the noise introduced by the input reference signal and from the VCO. Reducing the bandwidth too much, however, prevents the PLL from tracking out the VCO phase noise. An optimum bandwidth for minimum total output phase noise does exist and should be determined.

5.4.4.1. Serdes I Loop Filter

The transmitter PLL in the first prototype utilized a passive low pass filter¹. The filter is a two stage RC ladder and has two poles, but for the purpose of analysis the higher frequency pole can be ignored, since it only helps to reduce spurious modulation². The loop type is considered a two pole loop: one pole in the loop filter and one pole in the VCO. The poles are at 30 MHz (ωn) and 207 MHz when the capacitance and resistance values are 2 pF and 1 kΩ, respectively. The decision was made to use two RC stages rather than one to increase the high frequency signal rejection.
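
The two pole frequencies follow from the ladder's transfer function. The sketch below assumes two identical, unbuffered RC sections with no additional load, for which H(s) = 1 / (1 + 3RCs + (RCs)²); that idealization is an assumption, and the variable names are hypothetical.

```python
import math

R_OHM = 1e3        # ladder resistor
C_FARAD = 2e-12    # ladder capacitor
tau = R_OHM * C_FARAD

# Roots of (tau*s)^2 + 3*tau*s + 1 = 0 are s = -(3 -/+ sqrt(5)) / (2*tau).
root5 = math.sqrt(5.0)
f_low_hz = (3.0 - root5) / (2.0 * tau) / (2.0 * math.pi)
f_high_hz = (3.0 + root5) / (2.0 * tau) / (2.0 * math.pi)

print(f"lower pole:  {f_low_hz / 1e6:.1f} MHz")
print(f"higher pole: {f_high_hz / 1e6:.1f} MHz")
```

The results land close to the 30 MHz and 207 MHz figures quoted above.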

Figure 5-18 Tx PLL passive loop filter
A second order low pass filter utilizing a two stage RC ladder configuration.
[Schematic and response sketch: two RC stages with R = 1 kΩ and C = 2 pF; |F(jω)| (dB) versus log f with the corner at ωn.]

The resistor and capacitor component values were maximized, for low bandwidth as discussed above, based primarily on the proper operation of the PLL and on layout limitations (capacitors consume large amounts of area). Since the PD output is differential in nature, symmetric loading requires a duplication of the RC ladder. The four capacitors were therefore limited to about 2 pF because they take up a large amount of layout space. Resistor sizes, on the other hand, were reasonably small, but values larger than 1 kΩ introduced considerable loading effects because this RC circuit had to drive the VCO aVref control circuit.

1. The design time for this critical Serdes I component was very limited, and effort was only put into the PLL's proper operation rather than optimization. In the end it worked well enough to drive the transmitter and allow collection of all desired data.

2. A common problem in frequency synthesizers is called spurious modulation and is a result of the normally much higher frequency output of the PD. A result of the frequency divider, these lower frequency signals are not adequately attenuated by the loop filter and are passed on to the VCO as unwanted phase noise.

5.4.4.2. Serdes II Loop Filter

Further research and design allowed for a much improved loop filter to be used in Serdes II. The first important enhancement was the move to an active rather than passive filter. The use of an integrator allowed a loop filter dc gain, F(0), approaching infinity to be used, in contrast to a passive filter's dc gain of unity. From this, the PD static phase error,

$$\theta_{eo} = -\frac{V_{do}}{K_d} + \frac{V_{co}}{K_d F(0)} \qquad (5\text{-}1)$$

becomes approximately zero when the PD offset voltage¹, Vdo, is zero, where Kd is the gain of the PD and Vco is the static control voltage² of the VCO. Under these conditions the input phase difference is kept near zero when the PLL is in lock, which improves the purity of the synthesized frequency [41] and aids acquisition.

Figure 5-19 Tx PLL active loop filter
This active loop filter incorporates a low pass front-end followed by an integrator. The op-amp has a FET input stage to minimize loading, a high gain NPN stage, and a low impedance output stage.
[Schematic labels: low pass filter (R1/2, R1/2, C3) followed by the integrator (R2, C) around the op-amp, which consists of a FET front-end (high input impedance), a gain stage (NPN differential amplifier), and an output stage (low output impedance).]

Resistors R1 and R2, capacitor C, and the amplifier in Fig. 5-19 form the core of the filter. These elements form an integrator with a zero at

$$\omega_2 = \frac{1}{R_1 C} \qquad (5\text{-}2)$$

and a gain of

$$K_h = \frac{R_2}{R_1} \qquad (5\text{-}3)$$

at frequencies above ω2. This choice of 6.4 MHz for the loop bandwidth was based loosely on comparisons with other similar loops which have bandwidths of approximately 1 MHz [41]. These similar loops, however, utilize a much cleaner LC VCO, so a larger bandwidth was needed to compensate.

The final design of the loop filter yielded values for R1, R2, and C equal to 16.7 kΩ, 6.67 kΩ, and 14.1 pF, respectively. ω2 was 1.7 MHz, Kh was 0.4, and the total loop gain and bandwidth was 6.4 MHz. In addition, the low frequency gain, which is governed by the gain of the amplifier, is about 5.

Figure 5-20 Active loop filter transfer function
The active loop has a 1.7 MHz zero which forces a high DC gain. A pole at 21 MHz attenuates high frequencies to reduce spurious modulation.
[Plot: gain (dB), -100 to 20, versus frequency (Hz), 1 kHz to 1 GHz, with ω2 and ω3 marked.]

The addition of a low pass filter, or pole, to minimize spurious modulation is realized through element C3 in Fig. 5-19, with a cut-off frequency at ω3. The pole is at 21 MHz, which yields a capacitor value of 1.8 pF.

1. Vdo is the free running, or offset, phase detector voltage. It represents the DC output voltage offset for the PD and is a property of the PD alone.

2. The static control voltage, or Vco, is the control voltage applied to the VCO which matches the input and output frequencies. It is related to the input signal and VCO properties.

The open loop frequency response is plotted in Fig. 5-20. A zero at ω2 produces a -20 dB/dec slope which is not realized at low frequencies due to the finite gain of 13.5 dB of the op-amp. Above ω2, the gain is Kh until the pole at ω3, where the curve drops off at -20 dB/dec. An additional pole at approximately 100 MHz exists within the op-amp for loop stability.
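
The quoted filter parameters can be cross-checked against the component values. This sketch assumes the usual lead-lag integrator form F(s) = (1 + sR2C) / (sR1C), and assumes that C3 sees the two R1/2 halves of the front-end in parallel (R1/4); both are assumptions made for illustration. With the listed values, the quoted 1.7 MHz zero is reproduced by the R2·C time constant, which is what the sketch uses.

```python
import math

R1_OHM = 16.7e3
R2_OHM = 6.67e3
C_FARAD = 14.1e-12
C3_FARAD = 1.8e-12

k_h = R2_OHM / R1_OHM                                   # gain above the zero

# Zero of the assumed lead-lag integrator, set by R2 * C.
f_zero_hz = 1.0 / (2.0 * math.pi * R2_OHM * C_FARAD)

# Front-end pole, assuming C3 is driven from R1/2 in parallel with R1/2.
f_pole_hz = 1.0 / (2.0 * math.pi * (R1_OHM / 4.0) * C3_FARAD)

print(f"K_h    = {k_h:.2f}")
print(f"f_zero = {f_zero_hz / 1e6:.2f} MHz")
print(f"f_pole = {f_pole_hz / 1e6:.1f} MHz")
```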

5.4.4.3. Serdes III Loop Filter

The implementation of the Serdes III loop filter utilizes a negative impedance amplifier (NIA) charge pump [27]. Fig. 5-21 shows that the circuit has an RC filter which is balanced, or floated, between a pull-up resistance and a pull-down negative resistance. As long as the sum of these resistances equates to zero, the filter nodes are allowed to float. Any deviation from zero will result in a drift in the differential output voltage to infinity or to zero. To ensure a reasonable initial condition, the pull-up resistors should be slightly smaller than the NIA resistance so that the differential voltage is slowly pulled toward zero.

The negative resistance is generated through a linearized CML feedback tree that is very similar to the storage mechanism in a MS-latch. The current through one branch is

$$i_a = \frac{I_o}{2} - \frac{v_0 - v_1}{R} \qquad (5\text{-}4)$$

where Io is the total current through the tree, R is the value of the pull-up resistors, and v0 and v1 are the outputs and the nodes of the capacitor. Technically, the circuit acts as a negative impedance

$$\frac{v_0 - v_1}{i_0 - i_1} = -R_n \qquad (5\text{-}5)$$

which is based upon a differential voltage and current. The end result is that the differential voltage, v1-v0, is allowed to float at any value less than RIo. The resistance value of the NIA, Rn, is the sum of the linearizing resistors and the emitter resistance, as described in Appendix C.1.

Figure 5-21 Receiver III integrator
The integrator used in Serdes III consists of a negative impedance amplifier which essentially “floats” a capacitor, and current trees to move charge on and off each end.
[Schematic labels: negative impedance amplifier; inputs int0/int1, step0/step1, and ref; outputs z0/z1; nodes v0 and v1; currents i0, i1, and Io; pull-up Rp; filter elements R, C1, and C2.]

The striking benefit of this negative impedance charge pump is that it allows charge

to be removed from either end of the capacitor while the differential center voltage is

maintained. Removal of capacitor charge through a CML tree causes a differential voltage

change, and when a constant current is drawn, the voltage will ramp accordingly, thus

showing the integration.
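
Both behaviours, the slow drift caused by a resistance mismatch and the ramp produced by a constant current, can be sketched with a first-order model. The model treats the filter capacitor as sitting in parallel with the pull-up resistance and the negative NIA resistance; that simplification, the function names, and all component values are assumptions for illustration only.

```python
import math


def floated_voltage(v_start, t, r_pullup, r_nia, c):
    """Differential voltage after time t with no charge-pump current."""
    g_net = 1.0 / r_pullup - 1.0 / r_nia      # net conductance across C
    return v_start * math.exp(-g_net * t / c)


def ramped_voltage(v_start, current, t, c):
    """Differential voltage after drawing a constant current for time t."""
    return v_start + current * t / c


C = 2e-12          # hypothetical filter capacitance
V0 = 0.1           # hypothetical starting differential voltage

# Pull-up slightly smaller than the NIA resistance: slow decay toward zero.
print(floated_voltage(V0, 100e-9, r_pullup=990.0, r_nia=1000.0, c=C))
# Perfect balance: the voltage is held.
print(floated_voltage(V0, 100e-9, r_pullup=1000.0, r_nia=1000.0, c=C))
# Constant current: linear ramp, i.e. integration.
print(ramped_voltage(V0, 1e-6, 100e-9, C))
```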

There are two methods for affecting the differential output voltage; each method is

handled by its own circuit. The first is a standard current source which uses a linearized

CML tree with inputs int0, and int1 to draw current from either side of the filter. The

ampli fier gain, Ka, is approximately 1 mA/V. This value can be derived from the linearized

CML tree plot found in Fig. C-3 on page 165. The constant includes a factor of 1/2 because

the current is split between two paths, one directly through the pull -up resistor and one

through the filter.

The second method is a step input used in conjunction with the frequency detector in

the PD. In the case of a 3-state PD, a cycle slip detected by the FD will pulse one step input


or the other and cause a large charge change on the capacitor. The size of the step current

source dictates the amount of change.

Serdes III was the first design with a loop gain that was optimized for minimal output

phase noise based on measured and simulated phase spectra data from the FFI VCO

discussed in Section 4.12.4. on page 69. With this information and phase noise data on the

reference source, the noise spectrum plot shown in Fig. 5-22 can be created. It shows the

voltage spectral density for the FFI VCO and for a very low noise reference source. The

frequency at the point of intersection indicates the ideal value for loop bandwidth. Values

lower than this allow more VCO noise to propagate to the output while values higher than

this allow more reference noise to propagate to the output and increase the spurious

modulation from the reference.

Figure 5-22 Voltage spectral density for optimal loop bandwidth
Shown above is the voltage spectral density of the VCO and the reference source. The point where they intersect is, to first order, the optimal place to define the loop bandwidth. (Plot axes: Φ in dBc/Hz, from -140 to -40, versus frequency from 1 kHz to 100 MHz; curves for the VCO, the reference source, and the effective reference offset from it by 18 dB; a 20 dB/dec slope annotation; the intersection is marked as the optimum loop bandwidth for minimum noise.)

The reference source to be used is quoted as having a noise spectral density of -140 dB at frequencies below 1 GHz. This must then be raised by the PLL multiplication factor of 8, the equivalent of 18 dB. The VCO voltage spectral density was found through simulation, analytical and measurement results, and has a value of -90.2 dBc/Hz at 1 MHz.
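The 18 dB figure is the usual 20 log10(N) penalty for frequency multiplication by N. The short sketch below works through that arithmetic for the quoted values; the variable names are illustrative, and treating the -140 dB figure as a density in dBc/Hz is an assumption.

```python
import math

def multiplication_penalty_db(n: int) -> float:
    """Phase-noise increase, in dB, when a reference is multiplied by n."""
    return 20.0 * math.log10(n)

# Values quoted in the text; the dBc/Hz unit is an assumption.
reference_floor_db = -140.0
n = 8  # PLL multiplication factor (625 MHz reference to 5 GHz VCO)

penalty_db = multiplication_penalty_db(n)
effective_reference_db = reference_floor_db + penalty_db

print(f"penalty = {penalty_db:.2f} dB")
print(f"effective reference = {effective_reference_db:.1f} dBc/Hz")
```

The result, roughly 18.1 dB, is what places the effective reference curve above the raw reference curve in Fig. 5-22.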

The relatively high noise content of the VCO and the low noise content of the reference source placed the optimal loop bandwidth, K, at 33 MHz. Suppressing spurious modulation requires placing a pole at 4K, 132 MHz, far enough above K so that the PLL response will not be affected. At a reference frequency of 625 MHz, this results in a 13 dB suppression of spurious noise, which by

$$\sigma_t = (50\,\mathrm{ps})\,\frac{\pi}{4}\,\frac{\pi \delta N K^2}{f_r^2} \qquad \text{(5-6)}$$

is equivalent to a data rms jitter of 5 ps. The PD minimum duty cycle, δ, is approximately 0.03. This σt is one tenth of a bit width, which is unacceptable. Clearly the suppression of spurious modulation is critical in minimizing jitter. Instead of a loop bandwidth of 33 MHz, a bandwidth of 6 MHz was used. This yields an rms jitter due to spurious modulation of 0.14 ps, which is considerably lower.

With K at 6 MHz, the PLL zero (ω2) is placed at K/4, or 954 kHz, to give a 13% response overshoot, and the pole (ω3) at 4K, or 24 MHz. For a VCO gain, Ko, of 34.5 Grad/s/V, a PD gain, Kd, of 5.25 mV/rad, and a loop filter gain, Kt, of 1 mA/V, the high frequency gain Kh must be set to 208 for K = KoKdKtKh = 2π(6 MHz). Solving for the loop components from

$$F(s) = K_h \frac{s + \omega_2}{s\left(1 + \dfrac{s}{\omega_3}\right)} \qquad \text{(5-7)}$$

$$K_h = \frac{C_1 R}{C_1 + C_2} \qquad \text{(5-8)}$$

$$\omega_2 = \frac{1}{C_1 R} \qquad \text{(5-9)}$$

$$\omega_3 = \frac{C_1 + C_2}{C_1 C_2 R} \qquad \text{(5-10)}$$

yields C1 = 802 pF, C2 = 53 pF, and R = 208 Ω.
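The loop gain and zero location quoted above can be cross-checked numerically. The sketch below is only an arithmetic check on the quoted values, not the author's design procedure. It assumes Kh is expressed in ohms and writes the pole in the standard form ω3 = (C1 + C2)/(C1 C2 R) for a series R-C1 branch shunted by C2.

```python
import math

# Values quoted in the text (units as assumed in the lead-in).
Ko = 34.5e9    # VCO gain, rad/s/V
Kd = 5.25e-3   # phase detector gain, V/rad
Kt = 1e-3      # loop filter gain, A/V
Kh = 208.0     # high frequency gain, taken here as ohms
C1 = 802e-12   # farads
C2 = 53e-12    # farads
R = 208.0      # ohms

K = Ko * Kd * Kt * Kh                 # loop gain, rad/s
w2 = 1.0 / (C1 * R)                   # zero, Eq. 5-9, rad/s
w3 = (C1 + C2) / (C1 * C2 * R)        # pole, Eq. 5-10 (assumed form), rad/s

print(f"K / 2pi   = {K / (2 * math.pi) / 1e6:.2f} MHz")
print(f"w2 / 2pi  = {w2 / (2 * math.pi) / 1e3:.0f} kHz")
print(f"w3 / w2   = {w3 / w2:.1f}")
```

The product KoKdKtKh comes out close to 2π(6 MHz), the zero lands near 954 kHz, and the pole-to-zero ratio is about 16, which matches a zero at K/4 and a pole at 4K.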

The size of the stepping transistors can be found using

$$I \le \frac{2 C \omega_L}{K_o} f_c \qquad \text{(5-11)}$$

where C is the capacitor size, ωL is the lock-in range (πK = 18.8 MHz), Ko is the VCO gain (34.5 Grad/s/V), and fc is the reference frequency (625 MHz). For this implementation the calculated current is 3.4 mA, corresponding to a transistor size of 4 µm. The ref input is used in conjunction with the step inputs and allows them to be driven single ended to save power.

5.4.5. PLL Loop Response

The value of the PLL gain, K, is directly related to the 3 dB point, and its design is based on two factors: the VCO noise response and the input noise level. Small values of K yield strong input noise immunity, as the PLL is very slow to respond to input deviations, but transmit all of the low frequency VCO noise to the output. A small bandwidth is also effective at reducing spurious modulation. A large value of K, on the other hand, allows the PLL to track the input very closely and attenuate a considerable portion of the low frequency VCO noise, but means that any input noise is passed on to the output. K, as a frequency, also has a directly proportional effect on the pull-in range, and an inverse relationship with the pull-in time. Put simply, a larger K allows the PLL to lock in more quickly over a larger frequency range.

The process of choosing K is affected by the output noise specifications for the PLL, but no noise specifications were given for the design of this PLL, as it was meant for short-haul communications, where noise does not play a crucial role. So instead, K was chosen small enough to limit the effects of the input noise, but not so small as to adversely affect the layout with large component sizes. Ensuring proper operation was also important, so design limits were not pushed and instead a "center road" approach was taken.

The step response for the passive loop of Serdes I and the active loop of Serdes II is shown in Fig. 5-23. Both responses show a very clean, non-oscillatory response, which represents adequate choices for pole locations. Serdes II has a longer settling time due to the smaller bandwidth and does not undershoot. From [41] the damping factor, ζ, is calculated to be 0.47 and 0.65 for the PLLs in Serdes I and Serdes II, respectively.

Figure 5-23 PLL simulated step responses
The above plots, simulated in MATLAB, show the step responses for both PLLs in Serdes I and II. The longer settling time of PLL 2 corresponds to the smaller bandwidth. PLL 3 has nearly the same response as PLL 2. (Plot axes: PLL step output in rad versus time from 0 to 200 ns; traces for the step input, Serdes I, and Serdes II.)

PLL phase noise in this case is realized as output phase noise of the transmitter. For

this reason, no direct PLL phase noise can be measured. Section 5.10. details the noise

results for the two transmitter designs. No simulation of phase noise in the PLL was done

for this particular design.

5.4.6. Lock Acquisition

Lock acquisition can be described by two factors: the pull-in time, Tp, and the pull-in range, ωp. The pull-in time represents the maximum amount of time the PLL takes to acquire lock and track the input phase when started out of lock. The pull-in range is the largest frequency error for which the PLL will acquire lock. Both items are important metrics in describing the usefulness of the PLL, and ideally Tp will be zero, and ωp will cover the entire frequency range of the VCO.


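Figure 5-24 PLL I simulated acquisition plots
The above plots show the PLL in Serdes I during simulated acquisition, which is ideal and not equivalent to real life. This is also known as the jellyfish plot. (Plot axes: control voltage in V versus time in ns; traces labeled 660 MHz to 730 MHz in 10 MHz steps.)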

5.4.6.1. Serdes I Simulated Acquisition

Since Serdes I used a passive loop filter, the pull-in range is restricted by and equal to the frequency of the dominant pole ω3 at 30.3 MHz. This is a result of the -π/2 angle shift introduced by the pole, which effectively nulls the pull-in voltage. If, for example, a -π angle shift were introduced, then the PD output would be inverted, push-out would occur, and the PLL would move further away from lock. The pull-in time is a complicated parameter to derive; an expression and its derivation are presented on pages 186-187 of [41]. A rough approximation for pull-in time from simulation is 100 ns.

5.4.6.2. Serdes II Simulated Acquisition

Serdes II's PLL simulated response is shown in Fig. 5-25. The pull-in time is about four times that of Serdes I due to the smaller loop bandwidth and different phase detector characteristics. With similar loop bandwidths and similar loop filters, the pull-in time for a PLL with a 3-state PD versus an XOR PD is about 4 times smaller, and the pull-in range is about 4 times larger. This is primarily due to the negative slope that exists in the XOR response but not in the 3-state response, as shown in Fig. 5-15 on page 88.

Figure 5-25 PLL II simulated acquisition plots
The above plots show PLL II during simulated acquisition, which is fairly representative of actual acquisition; however, Spice has an advantage in setting initial conditions, which can show a better response than in real life. Here is the squid plot. (Plot axes: loop filter output in V versus time from 0 to 500 ns; traces labeled 600 MHz to 900 MHz in 50 MHz steps.)

The simulated pull-in time for the Serdes II implementation is about 400 ns, and the pull-in range is approximately 75% of the full range of the VCO (600 to 900 MHz). The addition of the 3-state PD has greatly enhanced the pull-in range at the expense of pull-in time. This is a very favorable trade-off since typical pull-in time specifications are on the order of microseconds.

5.4.6.3. Serdes III Simulated Acquisition

The third prototype has characteristics very similar to the second prototype, including similar parameters such as loop bandwidth, pole and zero locations, phase detectors, VCOs, and gains. Acquisition plots are, therefore, nearly identical to those shown in Fig. 5-25. See Section 5.4.6.2. for pull-in times and pull-in ranges. The FLL used in this implementation does not have a considerable effect, but it does reduce the pull-in time by about 10%.

5.4.7. 20 / 40 Gb/s Implementation

One area that was pursued in the development of the second prototype was an ability

to run the transmitter at either 20 or 40 Gb/s. Adding a second higher speed VCO,

multiplexers on the outputs, and an additional multiplexed divide-by-two circuit was rather

straightforward, as shown in Fig. 5-26. The primary difficulty arose when designing the

loop bandwidth to be appropriate for both VCOs. In the 5 GHz mode, the detector gain is

Kd/8 and in the 10 GHz mode it is Kd/16. This requires halving the loop pole

frequency so that stable operation is guaranteed for both situations. This reduction has

negative implications for the pull-in time, because pull-in time has an inverse relationship to

the pole frequency. Halving the frequency doubles the pull-in time.

Figure 5-26 5/10 GHz PLL implementation
Creating a 5 and 10 GHz PLL involved the addition of a 10 GHz VCO and various multiplexers to select the correct phases and the proper division circuit.

5.5. Clock Distribution

Clock distribution in the transmitter involves delivering the

PLL signal outputs, to the shift registers, to the external circuitry for

data loading, and to the multiplexers, with maximum phase

alignment. All prototype transmitters utilized the same scheme for

clocking.


A chain of buffer delays, whose inputs are the PLL 0° and PLL 90° signals from the PLL, constitutes the majority of the clock distribution system (see Fig. 5-27). It ensures that data and clock travel in the same direction and that delays in the shift registers, buffers, and multiplexers are matched to delays in the delay chain.

The most critical path in the clock distribution circuitry is found between the PLL and the 4-to-1 multiplexer. Here the PLL 0° and the PLL 90° signals must stay phase matched to ensure alignment of bit edges on the output. Offsets in these signals directly translate to phase jitter and more difficult signal reception. To ensure alignment, the delay chain was designed to be symmetrically loaded, of minimal length, and perfectly balanced. Because the 4-to-1 multiplexer was designed as a two stage multiplexer, and because of the critical timing required by its architecture, a precise delay of one multiplexer was added to the 90° line, guaranteeing perfect clock alignment at the multiplexers. Consequently the SEL 0° and SEL 90° signals are offset by exactly one multiplexer gate delay.

The next most important timing event is the clocking of the four shift registers. The 90° branch of the delay chain and its inversion handle all four registers. Since loading from the 8 latches (4 MS latches) was a concern, a driver buffer was added to the front of each register. This forced the addition of an equivalent delay into the delay chain. The total difference in gate delays between the CLK AD input and the SEL 0° signal was designed to be zero, to ensure maximum noise margin. The timing diagram, Fig. 5-28, clearly depicts the precise relationship between the signals.

Loading the 16 bits of parallel data requires a clock edge every 800 ps (50 ps x 16

bits), a time four times slower than the PLL period, thus necessitating a load counter,

depicted in Fig. 5-29, which is essentially a frequency divider. Not only does the load

counter have to divide by four, it also has to create two load signals separated by 100 ps

because of the clock offset on registers A and D versus B and C. The load signals select the

multiplexer input on each bit to its load mode rather than shift mode. When the next rising

clock edge arrives data is latched into the register.
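The timing figures in the paragraph above follow from the bit period and the PLL frequency. The sketch below reproduces that arithmetic; the constant names are illustrative, and reading the 100 ps load offset as half a PLL period (a clock branch and its inversion) is an interpretation of the text rather than something it states outright.

```python
BIT_PERIOD_PS = 50.0      # one serial bit at 20 Gb/s
WORD_BITS = 16            # parallel word width
PLL_FREQ_GHZ = 5.0        # PLL output frequency

pll_period_ps = 1000.0 / PLL_FREQ_GHZ             # one PLL clock cycle
load_interval_ps = BIT_PERIOD_PS * WORD_BITS      # time between parallel loads
divide_ratio = load_interval_ps / pll_period_ps   # load counter division
load_offset_ps = pll_period_ps / 2.0              # clock vs. inverted clock

print(f"PLL period    = {pll_period_ps:.0f} ps")
print(f"load interval = {load_interval_ps:.0f} ps")
print(f"divide ratio  = {divide_ratio:.0f}")
print(f"load offset   = {load_offset_ps:.0f} ps")
```

With these numbers the load counter divides by four, and the two load signals are spaced by 100 ps, matching the figures quoted for registers A and D versus B and C.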

The final aspect of clock distribution is the generation of the signal that informs the

external circuitry that it is ready for new parallel data. The straightforward solution is to

use the LOAD AD signal. This guarantees that when both loads have completed, the data

has had a maximum amount of time to settle.


Although the use of a delay chain makes clock distribution straightforward and very

reliable, it does have one serious drawback. Since it lies between the PLL and the output

multiplexer, it contributes to the overall phase noise and jitter of the circuit. This noise is a

result of shot noise, thermal noise in the chain of buffers, fabrication mismatches between

the 0° and 90° phase lines, and coupling between the lines and substrate. Minimizing these

noise effects involved designing a symmetric and tight layout of the delay chain.

Figure 5-27 Clocking scheme for transmitter
The top level schematic for the transmitter clocking circuitry includes the PLL as the clock generator, a delay chain for distribution, the registers, and the 4-1 multiplexer.


Figure 5-28 Transmitter clock timing
The timing of the transmitter revolves around the delay chain, which ensures that the data and the clock flow in the same direction. The bottom three signals clearly show how the 4-1 multiplexer interleaves to produce the output.

Figure 5-29 Load counter
The load counter divides the PLL signal by four and generates two 200 ps load pulses offset by 100 ps from each other.


5.6. Data Encoding

Data encoding is a general term for such techniques as:

encryption, compression, improved transition density, error

detection, channel alignment, byte alignment, DC voltage

balance, simplified clock recovery, and frame detection.

Typically, improved transition density and channel alignment are

performed on-chip although all could potentially be performed

off-chip. No encoding was performed in either Serdes I or Serdes II. See Section 5.11.1. on

page 118, for a brief study and recommendation of the 8B/10B encoding scheme.

5.7. Line Driver

The purpose of the line driver is to amplify the transmitter signal and drive the 50 Ω output line. Depending on the specifications, this can be either a single-ended or a differential circuit [48], [36], [37]. At these speeds differential is usually the optimum choice. The bandwidth of the circuit must be large enough that it will not attenuate the high frequency components and close the signal eye. Noise is also an issue, since any phase noise introduced by the line driver will be directly realized on the output.

The line driver in the Serdes I circuit utilized a simple pad driver circuit which was not optimized for this purpose. In Serdes II, however, the line driver was integrated into the final output multiplexer, which limited the introduction of noise. The output voltage swing was designed to be 400 mV.

5.8. Internal Testing Circuitry

5.8.1. Serdes I

Serdes I was designed without the ability to accept external

parallel data. Instead, the data was generated pseudo-randomly on

chip, through a 16 bit linear feedback shift register (LFSR).


Designing a true maximal length 16 bit LFSR would create a sequence 65,535 bits long, and because 16 bits are transmitted, then followed by a single shift and repeated, the serialized length is greater than 1 million bits. This was determined to be too long for the simple reason that it would be very difficult to determine whether the transmitter was working correctly during testing. An oscilloscope can only capture so much information, and it would be nearly impossible to find the exact position within the sequence.

Instead, a four bit maximal length LFSR followed by a 12 bit shift register was implemented. The circuit, shown in Fig. 5-30, has 16 MS-latches clocked through a buffer tree, an XNOR gate for feedback, and an AND gate to create a synchronizing signal. The synchronizing signal, SYNC, senses all zeros in the LFSR and was placed on an output pad in order to detect the start of the sequence. The ZBIT is the final bit of the generator and was also placed on a pad to analyze the operation of the circuit. A 4 input AND gate, not shown in the figure, determines if the LFSR contains all ones and if so inverts the output of the XNOR to force proper oscillation.
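A behavioral sketch of this generator is given below. The thesis does not state the tap positions or the transmit order, so both are assumptions here: feedback is the XNOR of LFSR stages 1 and 4, and the newest bit is transmitted first. These choices were picked because they appear consistent with the serial stream printed in Fig. 5-30; the function names and the starting word (the first 16 bits of that stream) are illustrative.

```python
def xnor(a: int, b: int) -> int:
    """Two-input XNOR on single bits."""
    return 1 - (a ^ b)

def shift(reg):
    """Advance the 16 latches by one clock; reg[0:4] is the 4 bit LFSR."""
    fb = xnor(reg[0], reg[3])        # assumed taps: stages 1 and 4
    if reg[:4] == [1, 1, 1, 1]:      # all-ones lock-up state of an XNOR LFSR
        fb ^= 1                      # invert the feedback to escape it
    return [fb] + reg[:-1]

def serial_stream(start, words):
    """Transmit all 16 bits, shift once, and repeat for the given word count."""
    reg, out = list(start), []
    for _ in range(words):
        out.extend(reg)
        reg = shift(reg)
    return out, reg

START = [int(c) for c in "0000111011001010"]   # assumed starting word
stream, final = serial_stream(START, 15)
bits = "".join(str(b) for b in stream)

# SYNC fires when the 4 bit LFSR holds all zeros.
sync_pulses = sum(1 for i in range(0, len(stream), 16)
                  if stream[i:i + 4] == [0, 0, 0, 0])

# Serialized length of a true 16 bit maximal length LFSR, for comparison.
full_lfsr_bits = (2 ** 16 - 1) * 16

print(len(bits), final == START, sync_pulses, full_lfsr_bits)
```

Under these assumptions the register returns to its starting word after 15 shifts, giving the 240 bit stream, with one SYNC pulse per repetition. The same arithmetic gives 1,048,560 serialized bits for a 16 bit maximal length LFSR, which is the "greater than 1 million" figure above.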

Figure 5-30 Serdes I LFSR
A 16 bit, on-chip pseudo-random pattern generator consists of a 4 bit LFSR and a 12 bit shift register. The circuit used in the transmitter is capable of generating a 240 bit serial stream. The stream printed in the figure is:
000011101100101010000111011001010100001110110010101000011101100101010000111011000010100001110110100101000011101111001010000111010110010100001110101100101000011111011001010000111110110010100001011101100101000000111011001010000001110110010100
(Schematic labels: CLOCK, SYNC, ZBIT, and bits 0 to 15 across the 4 bit LFSR and the 12 bit shift register.)

5.8.2. Serdes II

Off-chip testing of this serial communication system required testing equipment that operates at the bandwidth of the transmitter and receiver. At the rates being designed for, no such equipment exists, and comprehensive testing must be done on-chip. The testing scheme that was implemented feeds the transmitter serial output directly to the receiver and the parallel data received back into the transmitter, as shown in Fig. 5-31 [43]. A single bit offset between the receiver outputs and the transmitter inputs allows data input on Tx pin 0 to travel through the loop 16 times, and then output on pin 15 of the Rx. By generating a

pseudo random sequence (see Fig. 5-30) at the input and verifying that sequence at the

output, the bit error rate (BER) can be measured. The verifying circuit generates a pulse

every time a good sequence is measured. A missing pulse indicates a bit error. A divider

was added at the output so that high BER measurements could be made without high

bandwidth test equipment.

With a 12 bit maximal length LFSR, a 4095 bit sequence can be generated. Since the

total sequence must traverse the loop 16 times, a minimum BER of 10^-5 can be detected

with this method. The maximum time is determined by the time length of the test.
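One way to read the 10^-5 figure is as a single error in one full pass of the test pattern. The sketch below works through that arithmetic; interpreting the detection floor as one error per 16 traversals of the 4095 bit sequence is an assumption, not something the text spells out.

```python
LFSR_STAGES = 12     # stages in the Serdes II bit pattern generator
LOOP_PASSES = 16     # times each bit traverses the Tx/Rx loop

sequence_bits = 2 ** LFSR_STAGES - 1          # maximal length sequence
bits_checked = sequence_bits * LOOP_PASSES    # bits exercised per full pattern
ber_floor = 1.0 / bits_checked                # one error in one full pattern

print(f"sequence length = {sequence_bits} bits")
print(f"bits per pass   = {bits_checked}")
print(f"BER floor       = {ber_floor:.2e}")
```

The floor comes out on the order of 10^-5, in line with the quoted value; running the test longer than one pattern repetition would allow lower error rates to be observed.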

Figure 5-31 True error rate detector
The TERD operates by feeding the transmitter output back into the receiver and feeding the deserialized data back into the transmitter. A one bit offset with an LFSR and verifier determines the BER.

The TERD requires proper channel alignment, which is accomplished through data

encoding and decoding. Since these circuits were not included in the second prototype, the

bit pattern generator was configured to feed directly into the transmitter through the pin

mapping shown in the top of Fig. 5-32. Various bits had to be duplicated, but after inversion

and separation the data is still sufficiently random.


Figure 5-32 Serdes II bit pattern generator
A 12 stage LFSR with feedback to three stages yields a maximal length LFSR. A reset line was needed for use in the bit pattern verifying circuit.

5.9. Implementation and Fabrication

5.9.1. Serdes I

A -4.5 V power supply was chosen for this chip. This left

plenty of room for the three levels of logic and the active current

sources. Power minimization was not a design goal so this voltage

was not optimized. Fig. 5-33 shows the artwork and fabricated

pictures of the first transmitter design, and Table 5-1 shows the pad connections.

The chip has two inputs: the 625 MHz reference clock and a full/half rate frequency

selector. Three outputs were included to diagnose problems with the PLL and delay chain.

Two pads output the LFSR sequence and another pad outputs when the LFSR is reset.

5.9.2. Serdes II

The goal for the second Serdes chip was to correct problems from the first iteration,

combine the transmitter and receiver into one chip, and make the chip packagable.

Correcting the problems involved redesign of the VCO, and PLLs to meet the 20 Gb/s

specification. Combining the two systems allowed the development of an on-chip testing

circuit (TERD), which could perform full feedback testing. A drawback was that fewer

probe pads were available in the larger chip. Designing for packagability involved the use of an array of C4 pads for flip-chip packaging. Pad drivers and receivers were developed to accept and drive the 16 bits of parallel input and output data.

The east half of the chip comprised the transmitter, as shown in Fig. 5-34. High frequency probe pads T4 and T5 were used for the differential serial out signals. The 625 MHz reference input pad, T8, and the PLL clock output pad, T9, were required for testing. An on-chip LFSR, which was part of the test system, could be selected through a DC pad, C8, to drive the transmitter. Bit 3 of the LFSR was routed to output pad T1 to verify the proper functioning of the test system. The transmitter utilized two VCOs, which could be multiplexed through pad C11 into the clock synthesizer PLL. A selectable divide-by-2 circuit, driven by pad C10, was added to the output of the PLL for half frequency operation of the transmitter. An input filter to help suppress high frequency phase noise from the reference could be activated by pad C9.

Table 5-1 Pin-out of Serdes I transmitter

Pin   I/O         Description
S0                not used
S1    RF input    reference clock (625 MHz)
S2    DC input    frequency select (20 Gb/s or 10 Gb/s)
S3    RF output   PLL output (5 GHz)
S4    RF output   delay chain output (/8) (625 MHz)
S5    RF output   delay chain output (5 GHz)
S6                not used
S7    RF output   LFSR: sequence reset pulse
S8    RF output   LFSR: sequence
S9    RF output   transmitter out
S10               not used
S11               not used

Figure 5-33 Serdes I transmitter layout and photograph
On the left is the final artwork for the first transmitter design. On the right is a microphotograph of the fabricated part.

The receiver located on the west side of the chip, accepts differential serial data on

the two high frequency pads R4, and R5. The recovered clock, important for lock

verification, was routed to pad R8. By using pads C3 and C4, four different

demultiplexed bits could be analyzed on pad R9 for proper operation. The test source built

into the receiver was controlled through C1 and C2, enabling three different test patterns.

The true error rate detector circuit pulsed pad R0 when a bad packet was seen and toggled

R1 when a good packet was detected.

In order to reduce chip power, the circuits were optimized around a supply voltage of

-3.3 V. This represents a 25% power savings when compared to the Serdes I -4.5 V supply.


Table 5-2 Bondpad pin-out of Serdes II chip

Pin   I/O      Description
R0    RF out   TERD: bad packet seen
R1    RF out   TERD: toggle every full packet
R2    Power    Vee (-3.3V)
R3    Power    Gnd
R4    RF in    differential serial in
R5    RF in    differential serial in
R6    Power    Gnd
R7    Power    Vee (-3.3V)
R8    RF out   receiver clock
R9    RF out   selected demuxed data
T0    RF out   duplicated data into Rx
T1    RF out   LFSR: bit 3 into Tx
T2    Power    Vee (-3.3V)
T3    Power    Gnd
T4    RF out   differential serial out
T5    RF out   differential serial out
T6    Power    Gnd
T7    Power    Vee (-3.3V)
T8    RF in    ref clock (625 MHz)
T9    RF out   PLL out (divided by 8)
C0    DC in    Rx test source control voltage
C1    DC in    Rx test source select A
C2    DC in    Rx test source select B
C3    DC in    TERD: select A test bit
C4    DC in    TERD: select B test bit
C5    Power    Gnd
C6    Power    Vee (-3.3V)
C7             not used
C8    DC in    select Tx input source
C9    DC in    enable Tx input filter
C10   DC in    enable Tx PLL divide-2
C11   DC in    select VCO (5/10 GHz)

Figure 5-34 Serdes II chip layout and microphotograph
Shown here is the full Serdes II chip, including a microphotograph in the bottom left corner. The testing pads are located around the perimeter.

5.10. Testing Results

5.10.1. Serdes I (transmitter test results)

An output waveform captured directly from the oscilloscope is shown in Fig. 5-35(a). It shows the bit pattern expected from the on-chip LFSR testing circuitry. The ability of the PLL to achieve lock was very poor, and a narrow pull-in range of 420 MHz to 460 MHz was measured. The hold-in range was larger, from 393 MHz to 490 MHz, equivalent to a data bit rate of 12.6 Gb/s to 15.7 Gb/s. At a bit rate of 15.3 Gb/s, the rms phase jitter was measured¹ to be 6.3 ps, or about 10% of the bit width.
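The conversion from reference frequency to bit rate, and the jitter fraction, can be checked with a few lines. The sketch assumes the full-rate relationship implied elsewhere in the chapter (a 625 MHz reference multiplied by 8 to 5 GHz, with four bits per VCO cycle giving 20 Gb/s); the names are illustrative.

```python
PLL_MULTIPLIER = 8    # reference to VCO frequency
BITS_PER_CYCLE = 4    # 4-to-1 interleaving on quadrature clock phases

def bit_rate_gbps(ref_mhz: float) -> float:
    """Serial bit rate in Gb/s for a given reference frequency in MHz."""
    return ref_mhz * PLL_MULTIPLIER * BITS_PER_CYCLE / 1000.0

hold_in_gbps = (bit_rate_gbps(393.0), bit_rate_gbps(490.0))

bit_width_ps = 1000.0 / 15.3           # bit period at 15.3 Gb/s
jitter_fraction = 6.3 / bit_width_ps   # measured rms jitter over bit width

print(f"hold-in range   = {hold_in_gbps[0]:.1f} to {hold_in_gbps[1]:.1f} Gb/s")
print(f"bit width       = {bit_width_ps:.1f} ps")
print(f"jitter fraction = {jitter_fraction:.1%}")
```

The hold-in limits map to roughly 12.6 and 15.7 Gb/s, and 6.3 ps is a little under 10% of the 65 ps bit width, consistent with the figures quoted above.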

Figure 5-35 Transmitter waveform (Serdes I)
(a) The output waveform of the transmitter running at 15 Gb/s with a 350 mVp-p swing. The pseudo-random pattern matches the expected pattern from simulations. (b) An eye diagram at 15 Gb/s showing the relatively large phase noise and its effects on the closing of the eye.

Although the transmitter was designed to operate at 20 Gb/s, it reached only 15 Gb/s, 25% below the target, which can be attributed to two important factors. The first was a result of the VCO loading environment, which ideally consists of equal loading with four minimum sized buffers. It was instead loaded with two buffers on one stage, one on each of two others, and none on the fourth². The effect was a reduction in speed, probably due to the double load on one stage, and a non-quadrature phase mismatch between stages. The second factor was a result of simulations that did not adequately compensate for interconnect parasitics. Resistive and capacitive effects at these frequencies can have a profound effect on the overall speed of the chip. Lack of time and understanding for these simulations produced slower than expected results.

1. Performing a true phase noise and jitter measurement requires a spectrum analyzer capable of an absolute reading. A time domain oscilloscope, such as the one used to collect this data, merely measures the jitter between the signal and the trigger. If the trigger signal is correlated in time to the measurement signal, then the jitter measurement can be quite a bit less than the absolute jitter.

2. This was an oversight and was definitely not intended. The receiver, which was designed a few weeks after this, had ideal loading characteristics. This improved its response and left the transmitter and receiver with two non-overlapping frequency ranges.

Both of the issues discussed were addressed and solved in Serdes II. The loads on the

transmitter and receiver VCOs were carefully checked to make sure loading was balanced

and minimal. Interconnect simulations produced better designs in critical circuits such as

the VCO and PLL. A wide margin was introduced in the design of the VCO to account for

unknown effects.

5.10.2. Serdes II (transmitter test results)

The Serdes II design was successful in attaining the 20 Gb/s target bit rate. The

relevant eye diagram is shown in Fig. 5-36. The output voltage swing is 350 mV and the

eye is 30 ps wide and 200 mV high. This represents a big improvement from the original

design, which failed to meet the specifications. The eye diagram is also much cleaner and

symmetric with less total rms jitter.

Figure 5-36 Serdes 2 transmitter eye diagram
Shown here is an eye diagram at the target 20 Gb/s. It shows an opening 30 ps wide and 200 mV high.

The PLL has a wide pull-in range from 3.6 to 5.3 GHz (14.27 to 21.58 Gb/s), which is more than 75% of the total frequency range of the FFI VCO. The hold-in range is identical to the pull-in range, indicating a well balanced and nearly optimal PD. When using the higher speed VCO the pull-in range changed to 5.4 to 7.6 GHz, yielding an upper data rate of 30 Gb/s.

Jitter measures the accumulation of transition offsets over a given length of time. For

an open loop, without a PLL, a clock will have exponentially increasing jitter with respect

to time. When placed in a PLL, the jitter levels off and becomes constant after one

bandwidth time constant. For the Serdes 2 PLL, the jitter was measured with the time

domain oscilloscope at 4.3 ps with the reference signal and 2.9 ps without. This indicates

that considerable jitter was being introduced by the signal source.

Fig. 5-37 shows the phase noise spectra of the open loop VCO, the open loop

reference, the open loop reference plus 18 dB and the closed loop PLL. The reference plus

18 dB is the effective phase noise seen at the input to the PLL. The PLL closed loop phase

noise behaved as expected. First, at low frequencies the phase noise approached that of the

reference. This phase noise was expected since this was well below the loop bandwidth of

6.2 MHz and the PLL is able to track out the VCO leaving just the reference noise on the

output. The difference between the PLL and reference phase noise is likely from noise

introduced in the loop filter. Close to the loop bandwidth of 6.2 MHz, the sum of both the

reference and VCO noise contributed to the total noise. And above the loop bandwidth, the

phase noise should follow closer to the VCO phase noise and that is what was seen.

A more accurate way to measure jitter is in the frequency domain. This enables the

removal of the in-band low frequency jitter, which is easily removed by the receiver PLL,

from the rms jitter measurement. Integrating the PLL phase noise plot from 100 kHz to 100

MHz gives an rms jitter of 1.4 ps. This value is lower than the 4.3 ps found with the time

domain oscilloscope, which indicates that a larger amount of low frequency jitter can be

found in the reference signal.
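The frequency-domain measurement described above can be sketched numerically. The following is a minimal illustration, not the procedure used for the measurement: it assumes the phase noise is given as single-sideband values in dBc/Hz at discrete offsets, integrates them with the trapezoid rule, and converts the resulting rms phase to time using a carrier frequency that the caller must supply. The function name, the sample data, and the 5 GHz carrier are hypothetical.

```python
import math


def rms_jitter_from_phase_noise(freqs_hz, noise_dbc_hz, carrier_hz):
    """Estimate rms jitter in seconds from single-sideband phase noise.

    freqs_hz     -- increasing offset frequencies in Hz
    noise_dbc_hz -- phase noise at each offset in dBc/Hz
    carrier_hz   -- carrier (clock) frequency in Hz (an assumption of the caller)
    """
    if len(freqs_hz) != len(noise_dbc_hz) or len(freqs_hz) < 2:
        raise ValueError("need at least two matching frequency/noise points")
    # Convert dBc/Hz to a linear power ratio per Hz.
    linear = [10.0 ** (level / 10.0) for level in noise_dbc_hz]
    # Trapezoid-rule integration over the offset band.
    area = 0.0
    for i in range(1, len(freqs_hz)):
        width = freqs_hz[i] - freqs_hz[i - 1]
        area += 0.5 * (linear[i] + linear[i - 1]) * width
    # The factor of 2 accounts for both sidebands.
    phase_rms_rad = math.sqrt(2.0 * area)
    return phase_rms_rad / (2.0 * math.pi * carrier_hz)


# Hypothetical flat -100 dBc/Hz spectrum from 100 kHz to 100 MHz.
example = rms_jitter_from_phase_noise([1e5, 1e8], [-100.0, -100.0], 5e9)
```

Restricting the lower integration limit is what excludes the in-band low frequency jitter from the result.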

The preliminary specification for OC-192 SONET indicates that the maximum acceptable jitter must be less than 0.09 UI (Unit Interval) for 10^12 bits. Finding the associated rms jitter involves integrating the Gaussian probability density function (pdf) from x to infinity and setting the result equal to the bit error rate of 10^-12. The value of x is about 7.5 standard deviations, yielding an rms jitter specification of 1.2 ps at 10 Gb/s.

Although the transmitter jitter of approximately 1.4 ps is larger than the SONET specification of 1.2 ps, this circuit was not designed with SONET in mind. For short-haul communications, higher jitter is more acceptable.
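The conversion from the 0.09 UI limit to an rms value is a short piece of arithmetic. The sketch below reproduces it using the figures quoted in the text (0.09 UI, 10 Gb/s, and about 7.5 standard deviations); the function name is hypothetical.

```python
def rms_jitter_limit_ps(max_jitter_ui, bit_rate_hz, sigmas):
    """Convert a jitter limit in unit intervals to an rms limit in ps."""
    unit_interval_ps = 1e12 / bit_rate_hz   # one bit period in ps
    limit_ps = max_jitter_ui * unit_interval_ps
    return limit_ps / sigmas                # rms value = limit / number of sigmas


# 0.09 UI at 10 Gb/s is 9 ps; dividing by 7.5 sigma gives 1.2 ps rms.
sonet_rms_ps = rms_jitter_limit_ps(0.09, 10e9, 7.5)
```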

Figure 5-37 Tx PLL measured phase noise spectra. The PLL closed loop behaved as expected with the PLL tracking out the VCO noise at low frequency and following the VCO noise at high frequency.

5.11. Future Design

The extremely large scope of this project left a number of areas of research untouched

and undeveloped in the first two fabricated designs and the third simulated design. The

basic elements of the transmitter were designed with optimizations and research performed

only in specific areas. The remainder of this section describes key areas that are

recommended for future effort in order to establish these designs as highly functional,

useful, production-worthy designs.

[Figure 5-37 plot: Phase Noise (dBc/Hz), -140 to -60, versus Frequency (MHz), 0.1 to 100. Traces: VCO open loop, PLL closed loop, ref - 18 dB, reference open loop.]

5.11.1. 8B/10B Encoding

8B/10B encoding addresses issues such as transition density imbalance, error detection, command insertion, and DC balancing [26], [35]. It does so by adding two bits of redundancy to every eight-bit input, which requires a 25% increase in speed for the same information throughput.

The frequency of transitions in the data is a very important factor in the design of the receiver. In general, the more transitions provided to the receiver, the better the PLL’s ability to lock into the serial stream. 8B/10B encoding guarantees a maximum run length of five bits, and a lowest transition density of 30 transitions per 100 bits. Defining a minimum density makes it easier to model the data stream arriving at the receiver.

Another feature of the encoded stream is an equal number of ones and zeros. This

allows all single bit errors to be detected. In addition, because of the much larger 10 bit

word space, the decoder can detect undefined words and flag them as errors.

The DC balance is the fraction of ones in the bit stream. For high speed optical links, it is very desirable to have a DC balance of 0.5, which corresponds to an equal number of ones and zeros. This stabilizes effects, such as heating in the optical circuits, which can be a function of the sign of the bits being sent. 8B/10B guarantees a DC balance of 0.5 because it forces an equal number of ones and zeros per character.
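The three stream properties discussed above (run length, transition density, and DC balance) are easy to measure on any bit string. The sketch below is only an illustration: it is not an 8B/10B encoder, the function name is hypothetical, and the sample word is an arbitrary 10 bit pattern rather than a real code word.

```python
def stream_stats(bits):
    """Return run length, transition density, and DC balance of a bit string."""
    if not bits or any(b not in "01" for b in bits):
        raise ValueError("bits must be a non-empty string of 0s and 1s")
    longest = run = 1
    transitions = 0
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            run += 1
            longest = max(longest, run)
        else:
            transitions += 1
            run = 1
    return {
        "max_run": longest,                              # longest run of equal bits
        "transitions_per_bit": transitions / len(bits),  # transition density
        "dc_balance": bits.count("1") / len(bits),       # fraction of ones
    }


# Arbitrary 10 bit example word (not taken from the 8B/10B code tables).
stats = stream_stats("1010011001")
```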

Since data encoding occurs at the parallel data rate of 1.25 Gb/s, the necessary circuitry can be designed completely in CMOS. This reduces power and space consumption and allows the use of powerful EDA tools for layout and design.

An additional role for 8B/10B encoding is for channel alignment, which guarantees

that the bit 0 of the Tx is connected to bit 0 of the Rx. This requires a 16 bit rotator with a

detection mechanism to rotate the streams until they match.

5.11.2. Transmitter data retiming

A technique that can be used to reduce the output phase jitter of the transmitter is to

clock the output signal directly from the PLL through an MS-latch. This retiming circuit

alleviates all the noise introduced by the multiplexers and provides the minimum signal

path between the transmitter serial output and the PLL.


A significant source of jitter on the output data is called deterministic jitter. It is the result of non-periodic, data-induced noise. Pull-up resistors at the top of CML trees are a common source because they heat up as current flows through them, and warmer resistors produce higher rms noise. The ultimate effect is that the noise becomes dependent on the data stream. A stream with a large number of zeros will have a higher noise component than one with an equal number of ones and zeros.

The problem with data retiming is that it requires a latch that can operate at the

functional speed of the transmitter. In this case, that speed is 20 GHz, and if some encoding

is introduced then it can be as high as 25 GHz. Simulations show maximum operation of a

latch to be unreliable above 15 GHz. This is a result of the large delay through the two CML

tree gates and the feedback that is inherent in these circuits.

Although direct data retiming is unattainable unless a much faster latch is found, other improvements can be made. Since the final 4-to-1 (symmetric) multiplexer defines the output jitter, an improvement would be to drive the multiplexer directly by the PLL rather than through the timing delay chain. This adds to the design difficulty because the timing of the entire transmitter is running opposite to the timing of the data. The primary benefit of this method is the removal of the phase noise introduced by the five buffers of the delay chain.

Figure 5-38 Data and clock timing. By moving the PLL to the input of the multiplexer (b), the clock must run opposite the data. This creates timing difficulties but decreases the output phase noise of the transmitter.

[Figure 5-38 panels: (a) Current Method, (b) Proposed Method; each shows the PLL, the clock path, and the data path through the transmitter.]

5.11.3. LC Oscillator

The primary drawback to using the FFI ring oscillator in the transmitter is its very poor phase noise characteristics. LC oscillators have much higher quality factors and considerably less phase noise and jitter [21],[22],[44],[45]. One problem with typical LC VCOs is that they only produce a single phase clock, but the transmitter architecture in this research requires a clock and its quadrature. A possible option, and an area for further research, is multiphase LC oscillators [46],[47]. They offer the best of both worlds: low phase noise and quadrature outputs.

6. Design of the Receiver


The first receiver (Serdes I) was designed for fabrication

in February 1999 and only had a 1-to-4 demultiplexer and clock

extractor. Various improvements and optimizations yielded

Serdes II, which was a more efficient design, capable of full 16 bit demultiplexing and

external data input.

6.2. Receiver Architecture

Figure 6-1 Top level receiver architecture. The receiver is a PLL with a PD, called a transition detector, a PI loop filter, a VCO, and a demultiplexer to extract the NRZ bits from the serial data.

[Figure 6-1 block labels: data input; Phase Detector (PD); loop filter (PI control); VCO supplying 8 phases; 4 and then 16 bits of demultiplexed data.]

The receiver is a PLL and demultiplexer that locks an internal VCO to externally supplied data and extracts the non-return-to-zero (NRZ) bits from the data. Data arrives serially as a differential signal and is buffered in preparation for driving the PD. The information collected about transition phases is combined and fed into a proportional and integral loop filter. The filtered signal is used to drive the VCO to a frequency which matches the frequency of the external data. In addition to collecting timing information, the PD also performs a 1-to-4 non-aligned demultiplexing of the data. Another circuit, also driven by the VCO, finishes the demultiplexing and generates 16 bits of parallel data.

6.3. Receiver PLL

The receiver PLL is considered a clock and data recovery (CDR) circuit and has the primary role of extracting the data bits from the serial signal and ensuring that the extracted bits are not corrupted. The process is more difficult than in a standard PLL, because random or pseudo-random data has no guaranteed transition times. The 3-state and XOR PDs used in the transmitter PLLs, for example, can only operate with periodic signals. A specialized PD that can handle non-periodic information and allow a VCO to lock to the fundamental frequency of the data is required. Merely locking the VCO to the data’s frequency is only half the problem. The system must also sample, or extract, the information contained within the data stream, using the recovered clock.

The receiver designs for Serdes I through III all utilize a transition detector (TD) PD. It twice oversamples the data signal and generates a digital measure of the phase difference between this signal and the clock. It essentially indicates whether the clock is too fast or too slow relative to the data. With this information, lock can be acquired, and because of the nature of the sampling, data can easily be extracted. The problem with this PD, which was addressed in the third prototype, is the very small pull-in range of the PLL. Without an analog measure of phase difference, the clock and data frequencies have to be very close for the PLL to pull in.

Fig. 6-2 depicts block diagrams for the three receiver prototypes. The first and second designs differ in the integrator design and the VCO. The third integrates into the PLL an entirely new loop that is very good at acquiring frequency lock but poor at extracting the data [14], [30], [51]. Together with the TD PD, the PLL’s pull-in range is greatly increased without any sacrifice in performance.

Figure 6-2 Receiver PLL evolution. The receiver PLL has gone through two major improvements. The first design utilized a FET charge pump which was replaced with a negative impedance charge pump in the second design. The third prototype added a referenced frequency detector which greatly improved the pull-in range of the loop.

[Figure 6-2 block labels: Serdes I, Serdes II, Serdes III; data and reference inputs; transition detector (PD); 3-state PD; FET charge pump; negative impedance charge pump; gain blocks; VCO.]

6.3.1. Phase Detector

6.3.1.1. Transition Detector (PD)

Data transitions provide the only means to measure the phase of the incoming serial

data. If the data were periodic then we could be assured of a transition at a specific time and

directly compare it with a coincident VCO transition, similar to the clock synthesizer PLL

in the transmitter. However, data, by definition, is non-periodic and transition locations

cannot be assured at any time. For example, data containing ten ones followed by twelve

zeros, containing only two transitions, could be received. Since a transition between bits

cannot be guaranteed, there must be no action when no transitions are received and tracking

must be performed when transitions are received.

The aspect of the clock recovery circuit that had critical implications for its development was the use of the same eight phase ring oscillator used in the transmitter. It was felt that by matching the oscillators in the transmitter and receiver, they could be ensured to operate at the same speeds and the development of only one VCO would be required.

Running at 5 GHz, either the CS or FFI VCO generates eight unique phases (0°, 45°, 90°, 135°, ...; see footnote 1), each separated by 25 ps. Serial data arriving at 20 Gb/s can be broken up into bits 50 ps wide. Taking complete advantage of the multi-phase clock, the data is sampled every clock phase, resulting in a twice oversampling receiver scheme. In other words, for every bit, two samples of the signal will be taken.
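The timing figures above follow directly from the clock frequency, the number of phases, and the bit rate. A short sketch of that arithmetic, with hypothetical function and variable names:

```python
def phase_sample_times_ps(clock_hz, num_phases):
    """Sample instants (in ps) of each clock phase within one clock period."""
    period_ps = 1e12 / clock_hz
    step_ps = period_ps / num_phases
    return [index * step_ps for index in range(num_phases)]


times_ps = phase_sample_times_ps(5e9, 8)     # 0, 25, 50, ... 175 ps
phase_step_ps = times_ps[1] - times_ps[0]    # spacing between phases
bit_width_ps = 1e12 / 20e9                   # width of one bit at 20 Gb/s
samples_per_bit = bit_width_ps / phase_step_ps
```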

Sampling is handled by eight MS-latches whose clock inputs are each tied to one of the eight clock phases (see Fig. 6-3). In the locked and stable condition, four of the latches sample at the center of the bits and return data information while the other four sample on the transition and return timing information only. If the latches are labeled consecutively by their clock phase inputs, W, X, Y, Z and their inverses, then the data latches are W, Y, and the inverses of W and Y, while the timing latches are X, Z, and the inverses of X and Z.

1. Although the VCO has only four unique outputs, the inverse of each of them yields the remaining four phases.

Figure 6-3 Receiver topology. The receiver is made up of eight MS-latches, each tied to a unique phase of the VCO. Since each phase is separated by 25 ps, the data is twice oversampled, and thus the receiver is able to extract transition timing information from all edges. FAST or SLOW in the diagram is a command to the VCO.

[Figure 6-3 labels: serial data sampled by latches W, X, Y, Z and their inverses at 0 ps, 25 ps, 50 ps, 75 ps, 100 ps, ... across the 200 ps clock period; sampling latches produce dataA through dataD; transition location detectors form W ⊗ X = FAST, X ⊗ Y = SLOW, Y ⊗ Z = FAST, Z ⊗ W = SLOW for each set of four phases; each latch and detector pair forms one phase slice driven by the VCO.]

Fig. 6-4 shows a detailed look at the transition detector used in Serdes I. Data is latched with L1 using Φn, the n-th buffered phase of the VCO. Φn and Φn+1 are consecutive phases of the VCO, separated by 25 ps, or 45°, and Φn is equal to Φn+8. The sampled data, sn, is XORed with the sample from the previous detector, sn-1, and retimed with L2. The clock input to this latch comes six phases later, or after 150 ps, in order to allow the output of the XOR to settle to the correct value. The output of L2, tn, indicates whether a transition has occurred during this phase slice. The total time that the tn signal remains high is dependent on the period of the VCO and whether additional transitions are detected in this phase slice. With the VCO running at 5 GHz, the minimum time that tn is high is 200 ps. This circuit is then repeated eight times to collect transition information from every transition.
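The decision logic of the eight transition detectors can be modeled in software. The sketch below is a behavioral illustration only, not a description of the CML circuit: it assumes the eight samples arrive in phase order (W, X, Y, Z, then the inverted phases), XORs each sample with the previous one, and labels the result FAST or SLOW following the relations listed with Fig. 6-3. The function and argument names are hypothetical.

```python
def detect_transitions(samples, previous_last):
    """Model one clock period of the twice-oversampling transition detector.

    samples       -- eight 0/1 samples taken 25 ps apart, in phase order
    previous_last -- the final sample of the preceding clock period
    Returns (data_bits, commands).
    """
    if len(samples) != 8:
        raise ValueError("expected eight samples per clock period")
    # Even positions (W, Y and their inverses) sample the bit centers.
    data_bits = samples[0::2]
    commands = []
    prev = previous_last
    for index, sample in enumerate(samples):
        if sample != prev:  # XOR of neighbouring samples detects a transition
            # Center-to-edge change (e.g. W xor X) -> FAST,
            # edge-to-center change (e.g. X xor Y) -> SLOW.
            commands.append("FAST" if index % 2 == 1 else "SLOW")
        prev = sample
    return data_bits, commands


# Transitions landing just before the edge samples produce FAST commands.
bits_fast, cmds_fast = detect_transitions([1, 0, 0, 1, 1, 0, 0, 1], 1)
# Transitions landing just after the edge samples produce SLOW commands.
bits_slow, cmds_slow = detect_transitions([1, 1, 0, 0, 1, 1, 0, 0], 0)
# No transitions: no commands are issued.
bits_idle, cmds_idle = detect_transitions([1] * 8, 1)
```

When no transitions arrive, the command list is empty, which matches the requirement that the loop take no action in the absence of transitions.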

Figure 6-4 Transition detector in Serdes I. The first iteration of the transition detector had a latch to sample the data. This sample and the sample from the previous detector are XORed together and latched again to produce the transition detector signal.

The phase plot in Fig. 6-4 shows a transition detector on the X (45°) phase. It uses samples from itself and from the previous detector to detect transitions within the shaded region. The XOR of these signals is clocked six phases later.

One of the issues that defines the performance of this circuit is the time between when the data is sampled and when the detected-transition signal changes. Assuming a 20 ps gate delay, the approximate time is 170 ps. Since the detected-transition signal is high for 200 ps, the effect of a single transition lasts for a total of 370 ps after the sample, which is equivalent to 7 bits. This is important, because during lock it is desirable to have the frequency of the VCO adjust as quickly as possible after a transition is detected. The digital nature of this circuit results in discrete changes to the VCO output, so oscillations are natural when in lock. If the PD delay is large then these oscillations will also increase, as the VCO’s frequency continuously overshoots and undershoots. A further analysis of this phenomenon can be found in Sec. 6.3.2. on page 130.

The motivating factor in the design of Serdes II’s TD, shown in Fig. 6-5, was to reduce the delay through the detector. In the first prototype this time was 170 ps, which directly affected the ability of the PLL to maintain and acquire lock. In order to improve on that design, a look at the timing requirements of the XOR was required.

[Figure 6-4 schematic labels: MS-latch L1 (clock Φn) samples data to give sn; sn and sn-1 are XORed and latched by L2 (clock Φn+6) to give tn; timing plot of sn-1, sn, Φn, Φn+6, and tn.]

The two level nature of the XOR gate requires the level 2 input to precede the level 1 input by approximately 10 ps. The time between sampled data sn-1 and sn is equal to 25 ps, and with the additional 5 ps of delay introduced by the level 2 output of the MS-latch, a total of 30 ps is found between the level 2 input to the XOR gate and the sn-1 signal. When 40 ps of buffer delay is added to the sn-1 signal, a time delta of 10 ps between the inputs of the XOR gate is realized.

Figure 6-5 Transition detector in Serdes II. Optimization of the transition detector allowed the removal of the second MS-latch and reduced the total delay by 75%. This circuit is simplified and requires a less complicated layout.

When the timing is optimized to this extent, the necessity of the second MS-latch, L2,

is removed. The same 200 ps pulse is created, but the total transition detector delay has been

reduced from 170 ps to 40 ps. An additional benefit is in the simplified layout of this circuit;

only one clock phase is required. In the Serdes I circuit, a complex routing scheme was

required because two phases were necessary.

The gain of the transition detector is not clearly defined because of the digital nature of the circuit. When the phase difference is greater than zero, it will generate a slow pulse, and when less than zero, it will generate a fast pulse. There is no linear relationship between phase and output. Instantaneous gain must therefore be defined to be infinite. The average gain, however, is not infinite and can be found when a statistical distribution of transitions or jitter is introduced.

A real data signal does not have perfect transition separation but instead has transitions separated according to a constant plus a random Gaussian variable. This jitter acts as “transition fuzz” which effectively gives the PD gain. The process of calculating this gain is shown in Fig. 6-6 for both a uniform and a Gaussian distribution. Fundamentally, it comes down to subtracting the two areas created by splitting the probability density function (pdf) around zero, after setting a specific mean and standard deviation. For Gaussian jitter, an approximation is used: the gain is assumed linear, based upon a line that passes through the point at one standard deviation.

[Figure 6-5 schematic labels: MS-latch L1 (clock Φn) samples data to give sn; sn and the buffered sn-1 drive the two XOR input levels (1 and 2) to give tn; timing plot of sn-1, sn, Φn, and tn.]

Figure 6-6 Gain of transition detector with data jitter. Solving for the gain of the transition detector must take into account the fact that the data has jitter. This jitter spreads out the transitions, producing an average PD output.

In order to include the effect of the transition density (tpb = transitions per bit), Kd is multiplied by tpb. A factor of four must also be included to account for the fact that a slow/fast pulse is carried across 4 bit widths. This yields the final transition detector gain:

$$K_d = V_p \, \frac{0.68}{\sigma} \, (4\,\mathrm{tpb}), \qquad \sigma = \sigma_t \, \frac{2\pi\ \mathrm{rad}}{100\ \mathrm{ps}} \tag{6-1}$$

In the Serdes I implementation with a pulse size, Vp, of 300 mV, a transition density of 1/4, and an rms jitter value, σt, of 4 ps, the detector gain equals 811 mV/rad. In the Serdes II transition detector, the pulse size was reduced to 40 mV, yielding a smaller gain of 108 mV/rad.

[Figure 6-6 panels: instantaneous PD output (A, -A) versus phase error θe; uniform and Gaussian transition distributions (pdf) with STD = σ; average PD output versus average phase error, with slopes Kd' = 0.58 A/σ (uniform) and Kd' = 0.68 A/σ (Gaussian).]
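The two detector gains quoted for (6-1) can be checked numerically. The sketch below evaluates the expression with the stated values; the function and parameter names are hypothetical, and the 0.68 slope, the 100 ps conversion period, and the factor of four are taken from the text.

```python
import math


def td_gain_v_per_rad(pulse_v, tpb, rms_jitter_ps,
                      period_ps=100.0, slope=0.68, overlap_bits=4):
    """Average transition detector gain following the form of (6-1)."""
    # Convert the rms jitter from time to phase (radians).
    sigma_rad = rms_jitter_ps * 2.0 * math.pi / period_ps
    return pulse_v * (slope / sigma_rad) * overlap_bits * tpb


kd_serdes_1 = td_gain_v_per_rad(0.300, 0.25, 4.0)   # about 0.81 V/rad
kd_serdes_2 = td_gain_v_per_rad(0.040, 0.25, 4.0)   # about 0.11 V/rad
```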

6.3.1.2. NRZ Phase/Frequency Detector (PD/FD) (Hogge)

The digital nature of the transition detector PD and its phase response yield a very poor pull-in range. When lock is acquired, however, this PD has very strong noise immunity and an inherent ability to extract data from the signal. The Hogge PD helps the poor pull-in range but has no net effect on the TD PD properties. Its use, in conjunction with the transition detector PD, was evaluated but not implemented for Serdes III.

The Hogge PD, whose schematic is shown in Fig. 6-7 [52], [53], operates on the NRZ data and generates an analog signal based upon the difference between it and the VCO. Data, vi, must arrive at half the frequency of the clock, vo, for the PD to operate correctly. This is accomplished by dividing the input data signal down 4 times. This has the negative effect of removing three out of every four edges. The two latches and the va XOR gate retime the data by creating pulses based on data transitions but timed to the clock transitions. The vb XOR gate, on the other hand, has a similar waveform but the edges are timed with the data transitions. The dc component of the difference between these two signals yields a measure of the phase difference.

Figure 6-7 Phase detector for NRZ data. This circuit shows one technique for detecting phase for NRZ data in a PLL. The bit rate of the data and frequency of the clock must be the same. The output is taken differentially and yields a continuous analog signal as a function of phase difference.

[Figure 6-7 labels: two D latches (outputs Q1, Q2) clocked by vo sample the data vi; XOR gates produce va and vb, with a critical delay in the vb path; waveforms of vi, vo, Q1, Q2, va, and vb; plot of vd versus ∆θ from -π to π for 50% transition density.]

The most important aspect in implementing this PD was maximizing the figure of merit. In this case it is defined by the range of pulse widths expressed in vb against the constant width of va pulses. Ideally, the widths of vb would range from 0 to twice the width of a va pulse. Finding this solution required a fine adjustment of the critical delay, which is approximately the delay through an MS-latch. By minimizing the integral of the vd versus ∆θ plot over a full 2π radians, the figure of merit can be maximized.

The gain of this PD is a function of the transitions per bit (tpb) for the incoming data stream. For a 11001100... stream, the tpb is equal to 0.5. From simulation, the gain was found to be 80 mV/rad/tpb, which includes the divide-by-4 circuit.

Ultimately this PD was not used because it was exceedingly difficult to optimize the delays in the circuit. Slowing down the clock and data was the only way to correct the problem, and as a result the pull-in range suffered. The Serdes III implementation addressed the small pull-in problem by using an external reference signal.

6.3.2. The Loop Filter

The purpose of the loop filter is to take the digital

transition information from the eight transition detectors and

create an appropriate VCO signal. The transition detectors yield

relative information in regards to data and clock phase offset, so

an integrator is required. An integrator alone is insufficient in the

loop, so a proportional factor is summed with the integrator

output. Together the proportional and integral control comprise the PI loop filter.

Although the loop filter in Fig. 6-8 is expressed as an integral and proportional gain, it can also be expressed by the pole-zero equation

$$K_h \, \frac{s + \omega_2}{s}, \qquad K_h = K_P, \qquad \omega_2 = \frac{K_I}{K_P} \tag{6-2}$$

where ω2 is the loop zero and Kh is the high frequency gain.

Unlike the frequency synthesizer in the transmitter, the integrator and proportional gain components must operate at the frequency of the clock and accept four faster and four slower signals. This necessitates the use of specialized circuits able to handle the much higher frequency. The Serdes III design, although slightly more complicated, still contains the basic components shown in Fig. 6-8.

Figure 6-8 Receiver loop filter. The receiver loop filter accepts eight “digital” signals from the transition detectors and produces an analog control signal for the VCO.

6.3.2.1. FET Charge Pump / Proportional Control (Serdes I)

The charge pump integrator shown in Fig. 6-9 utilizes four field effect transistor (FET) pairs to place and remove charge from the capacitor. Each FET can act independently of the others, so one could be adding charge while another is removing it. Careful consideration assured that the nFET and pFET sizes were chosen to have matching currents.

Each FET draws on average 60 µA during one complete period of the clock. With a 300 mV input from the PD this corresponds to a 0.0002 1/Ω gain from the FETs. With Cf equal to 4 pF, a slow/fast pulse will change the capacitor voltage by ±3 mV. Dividing the FET gain by the capacitance yields the integrator gain KI = 50 Mrad/s.
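The integrator figures above can be reproduced with a few lines of arithmetic. In the sketch below the 200 ps pulse length is taken as one period of the 5 GHz clock, and the function and variable names are hypothetical.

```python
def charge_pump_numbers(i_avg_a=60e-6, v_pulse_v=0.300,
                        c_f_farad=4e-12, period_s=200e-12):
    """Return (FET gain in 1/ohm, voltage step in V, integrator gain K_I)."""
    fet_gain = i_avg_a / v_pulse_v               # transconductance of one FET
    step_v = i_avg_a * period_s / c_f_farad      # charge packet divided by Cf
    k_i = fet_gain / c_f_farad                   # integrator gain in rad/s
    return fet_gain, step_v, k_i


fet_gain, step_v, k_i = charge_pump_numbers()
```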

Proportional control, on the other hand, is handled through eight differential switches, one for each fast and slow PD output, with one branch tied together to form a single-ended “analog” signal (Fig. 6-10). By default, without any fast or slow signals, all fast trees will pull 0.75 mA through the pull-up resistor Rcc and all slow trees will pull 0 mA as shown in Fig. 6-10. In this way, the voltage across Rcc will increase when a fast signal is received and decrease when a slow signal is received. Rcc was set to 100 Ω, which produces a 75 mV change for each input pulse. The emitter follower tied to Rcc only introduces a DC offset to interface properly with the summing junction. Designed similarly to the integrator, the proportional circuit inputs are all able to operate independently.

[Figure 6-8 block labels: eight phase detector outputs (four slower, four faster) feed the loop filter, made up of the KI/s and KP paths, which drives the VCO (Ko).]

Figure 6-9 MOSFET charge pump integrator. The FET transistors in this circuit act as current switches removing and adding charge to a capacitor. This action integrates the slow and fast inputs.

Figure 6-10 Proportional control and summing junction. This circuit provides the proportional gain for the loop filter and sums the result with the signal from the charge pump integrator. This ultimately drives the aVref control voltage for the VCO.

For each 300 mV input pulse, the output of the proportional control circuit changes by 75 mV. This corresponds to a proportional gain, Kp, of 0.25. The summing junction combines the outputs of the integrator and the proportional gain stage. It introduces an additional gain of 0.286 into the total gain of the loop. Given the gains derived above, the loop filter has a zero, ω2, at 32 MHz and a high frequency gain, Kh, of 71.5 m (0.0715). Collecting all the gains from this circuit and multiplying by the pulse period shows a ±0.7° phase change of the VCO for every slow/fast pulse.

[Figures 6-9 and 6-10 annotations: four MOSFET pairs (S1 to S4, F1 to F4) between Vcc and -2 V drive the capacitor Cf to form Vint. S: a slow signal places a charge packet on the capacitor. F: a fast signal removes a charge packet from the capacitor. The F1/S1 differential switches, repeated 4 times for each S/F pair, share the pull-up resistor Rcc; the summing junction combines the result with Vint to drive aVref (VCO). One MOSFET is designed to balance the current drawn from the base.]
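The loop filter zero and high frequency gain follow from the integrator gain, the proportional gain, and the summing junction gain. The sketch below is a numerical check under those stated values; it uses ω2 = KI/KP as in (6-2), and the function name is hypothetical.

```python
import math


def loop_filter_parameters(k_i=50e6, k_p=0.25, summing_gain=0.286):
    """Return (zero frequency in Hz, high frequency gain K_h)."""
    zero_rad_s = k_i / k_p                      # omega_2 = K_I / K_P
    zero_hz = zero_rad_s / (2.0 * math.pi)
    k_h = k_p * summing_gain                    # proportional path times summing gain
    return zero_hz, k_h


zero_hz, k_h = loop_filter_parameters()         # about 32 MHz and 0.0715
```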

6.3.2.2. Negative Impedance Charge Pump (Serdes II)

The goal for the receiver in the Serdes II implementation was to replace the FET charge pump and proportional control with a much simpler negative impedance charge pump, while keeping all the PLL parameters the same. There were problems associated with the FET pump, including poor high frequency response, difficulty in matching pull-up and pull-down components, high capacitance discharge, and significant complexity. The negative impedance pump solved all of these problems with a smaller and simpler circuit.

Using the circuit in Fig. 5-21, equations (5-7)-(5-10), and a loop natural frequency, zero, and pole of 25 MHz, 6.4 MHz, and 102 MHz, respectively, the component values are C1 = 575 pF, C2 = 38 pF, and R = 43 Ω. A high frequency pole was added to reduce spurious modulation and clock jitter; it had little effect on the overall loop response.

6.3.2.3. Mixed Loop (Serdes III)

The primary design goal of the third Serdes implementation was to improve the poor pull-in range of the transition detector that was due to its non-linear nature. Because of this limitation, the serial data frequency had to be very close to the nominal frequency of the VCO for pull-in to occur. Given a specific bit rate, this can be very difficult to guarantee across all thermal, process, and implementation deviations.

An initial approach utilized a down-counted data signal fed into a separate Hogge-style NRZ PD (Section 6.3.1.2. on page 129). The idea was to utilize a second PD that had a larger pull-in range and could be coupled with the TD PD loop for a better overall pull-in range. This NRZ PD proved to be difficult to design due to very strict delay requirements, and it did not significantly improve the pull-in range.

A second approach used an additional loop which accepts a reference at (bit rate)/8 and was designed to respond identically to the loop in the transmitter (Section 5.4. on page 82). The loop filter output is summed with the transition detector of the original loop to create the VCO’s control voltage as shown in Fig. 6-2 on page 123. The purpose of the new loop is to acquire frequency lock, which pulls the first PLL into lock because of the common integrator. The second loop is able to acquire solid phase lock once within its lock-in range and then begin to extract data.

The parameters for the new loop are identical to those previously used. The only remaining design choices are the gain of the TD PD and its filter. Choosing an appropriate gain for the transition detector involves a trade-off between bit error rate and lock-in range. At one extreme, a large gain will give the PLL a large lock-in range that is approximately equal to the bandwidth of the loop. For instance, a doubling of the PD gain will result in a doubling of the lock-in range. This higher gain, however, results in a higher bit error rate (BER) because of the large phase correction. At the other extreme, a small gain will limit the bandwidth and the lock-in range, but reduce the error rate.

The effect of a large gain on BER results from consecutive transitions that are jittered in one direction, causing an accumulation of phase change. The mean frequency of the data and of the clock are assumed to be constant, an assumption that is reasonable over the few transitions needed in this analysis.

The BER of single bit errors is given by Q(jitter > 25 ps), which is equal to 3x10^-15 for an rms data jitter of 4 ps and a bit width of 50 ps. Q(x) is the integral from x to infinity of the normalized Gaussian probability density function (pdf). If the BER introduced by the TD is less than this value, then its effects can, in general, be ignored.

The TD introduces a ∆t ps phase change per transition. The worst case scenario for an error is when enough phase changes bring the clock phase to 12.5 ps from consecutive data jitter followed by a jitter of -12.5 ps in the other direction. In such a case the phase difference between the clock and the data will be 25 ps. Solving for this is best done by an example. Assume ∆t equals 5 ps.

Q( jitter > 0 ps ) = 5x10^-1 -- make 5 ps phase adjust
Q( jitter > 5 ps ) = 6x10^-2 -- jitter must be greater than 5 ps
Q( jitter > 10 ps ) = 9x10^-4 -- ... and so on
Q( jitter > 15 ps ) = 1x10^-6
Q( jitter < 10 ps ) = 9x10^-4 -- bit error!
---------------------------
total probability = 3x10^-14

For this example, there were four consecutive “jitters” in the positive direction, causing a clock phase change of 25 ps. They were followed by a jitter of 10 ps in the opposite direction. The probabilities of these individual events are multiplied together to find the total probability for an error from this chain of events. For the same analysis, but with ∆t equal to 4 ps, the result is 7x10^-19. In conclusion, as long as ∆t is kept below about 4 ps, the effect of accumulated jitter on phase will be smaller than the chance of a single bit error, and can be ignored.
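The chain-of-events calculation multiplies the probabilities of the individual jitter events. The sketch below shows that step together with the definition of Q(x) given in the text. The per-event values are the rounded figures from the example with ∆t = 5 ps, so the product only reproduces the order of magnitude of the quoted total; the function names are hypothetical.

```python
import math


def q_function(x):
    """Integral of the normalized Gaussian pdf from x to infinity."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def chain_probability(event_probabilities):
    """Probability that every event in an independent chain occurs."""
    return math.prod(event_probabilities)


# Rounded per-event probabilities from the worked example (delta_t = 5 ps).
example_events = [5e-1, 6e-2, 9e-4, 1e-6, 9e-4]
example_total = chain_probability(example_events)   # on the order of 1e-14
```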

Without an integrator in the loop, the VCO control voltage cannot exceed the maximum swing of the TD. Given a 1010 sequence at 20 Gb/s (tpb = 1), there would be four overlapping pulses of magnitude ∆t, which, when multiplied by the VCO gain, yield the frequency deviation. This defines the lock-in range of the TD loop and is equal to

$$\omega_L = \Delta v \, K_o \, (4\,\mathrm{tpb}) \qquad \text{(6-3)}$$

where ∆v is the magnitude of the voltage pulse from the TD. The factor of 4tpb takes into account the fact that the TD has no effect on the frequency if there are no transitions. The more transitions, the larger the potential frequency deviation. Relating a voltage change to an associated time change yields

$$\Delta v = \frac{\Delta t \, f_c^{2}}{K_o}. \qquad \text{(6-4)}$$

Combining the previous two equations to find the lock-in range as a function of ∆t results in

$$\omega_L = 2\,\Delta t\,\omega_c^{2}\,(4\,\mathrm{tpb}), \qquad \text{(6-5)}$$

where ωc is the clock frequency.

Typical specifications for a receiver of this type provide for a reference signal which is within 100 ppm of the frequency of the data. Using a more conservative value of 1000 ppm gives a maximum reference deviation of 20 MHz. Using this value in (6-5) gives a minimum ∆t of 0.4 ps.

For the final implementation, a value of 0.6 ps was chosen for the phase correction for every transition. The lock-in range is therefore 30 MHz at 0.25 transitions per bit. This relates to a 4 mV pulse which is generated within the TD by combining the eight slow and fast signals through a common set of pull-up resistors. The resistors were set at 5 Ω with a 0.8 mA current source in each tree.
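As a numerical check of the lock-in figures quoted above, the following sketch evaluates (6-5) directly. Two assumptions are made that the text does not state explicitly: the clock frequency is taken as 5 GHz (one quarter of the 20 Gb/s bit rate), and ωc is entered in Hz with no factor of 2π, since that is the reading that reproduces the quoted 20 MHz and 30 MHz values. The function names are hypothetical.

```python
def lock_in_range_hz(delta_t_s, clock_hz, tpb):
    """Lock-in range from (6-5): 2 * delta_t * clock^2 * (4 * tpb)."""
    return 2.0 * delta_t_s * clock_hz ** 2 * (4.0 * tpb)


def minimum_delta_t_s(required_range_hz, clock_hz, tpb):
    """Invert (6-5) to find the phase step needed for a required lock-in range."""
    return required_range_hz / (2.0 * clock_hz ** 2 * (4.0 * tpb))


CLOCK_HZ = 5e9  # assumption: 5 GHz clock for a 20 Gb/s link
TPB = 0.25      # transitions per bit used in the text

chosen_range = lock_in_range_hz(0.6e-12, CLOCK_HZ, TPB)
required_step = minimum_delta_t_s(20e6, CLOCK_HZ, TPB)
print(f"lock-in range for 0.6 ps step: {chosen_range / 1e6:.0f} MHz")
print(f"minimum step for 20 MHz range: {required_step * 1e12:.1f} ps")
```

Under these assumptions a 0.6 ps step gives 30 MHz and a 20 MHz requirement gives 0.4 ps, matching the values in the text.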


6.3.3. PLL Loop Response

6.3.3.1. Serdes I (FET charge pump)

The total loop gain or bandwidth is found through a product of the VCO gain, Ko = 3.14 Grad/s/V; the PD gain, Kd = 811 mV/rad; and the loop filter gain, Kh = 71.5 m, and is equal to 29 MHz. With the loop zero at 32 MHz this yields a damping factor

$$\zeta = 0.5\sqrt{\frac{K}{\omega_2}} \qquad \text{(6-6)}$$

equal to 0.5, which is underdamped with an overshoot of 30%. For all higher transition rates the PD gain will increase and improve the damping factor.

Fig. 6-11 depicts the Serdes I PLL locking into a 6.1 Gb/s (tpb = 0.25) data stream. Using an AHDL program the data was given an rms jitter of 4 ps, which is approximately the amount produced by the associated transmitter. Up until 5 ns the PLL is pulling in, and after 10 ns lock-in has occurred. The large deviations around 6.1 GHz are due to the proportional control mechanism pulsing the frequency to cause a phase correction. During the phase correction the integrator is forcing the average frequency to equal that of the data.

The non-linear “digital” nature of the PD results in a very limited pull-in range. Simulation with various initial frequency offsets yields a range of about 2%. The hold-in range, on the other hand, is quite large due to the integrator.

6.3.3.2. Serdes II (negative impedance charge pump)

Fundamentally, the Serdes II implementation was very similar to the Serdes I version. The key parameters, including loop bandwidth, were kept the same though a slightly different PD, an improved loop filter, and an improved VCO were used. Because of this, the response is nearly identical to the Serdes I design shown in Fig. 6-11.
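The Serdes I loop figures quoted in Section 6.3.3.1, which were carried over to Serdes II, can be checked with a short sketch. The gain values are those given in the text. The damping expression used here, ζ = 0.5·sqrt(K/ω2), is an assumption: it is a common second-order PLL form and is consistent with the quoted result of about 0.5, but the printed equation (6-6) is too damaged in this copy to confirm it. Function names are illustrative.

```python
import math

KO_RAD_PER_S_PER_V = 3.14e9  # VCO gain from the text
KD_V_PER_RAD = 0.811         # phase detector gain from the text
KH = 71.5e-3                 # loop filter gain from the text


def loop_bandwidth_hz(ko, kd, kh):
    """Loop gain K = Ko * Kd * Kh, converted from rad/s to Hz."""
    return ko * kd * kh / (2.0 * math.pi)


def damping_factor(bandwidth_hz, zero_hz):
    """Assumed form: zeta = 0.5 * sqrt(K / omega_2)."""
    return 0.5 * math.sqrt(bandwidth_hz / zero_hz)


bandwidth = loop_bandwidth_hz(KO_RAD_PER_S_PER_V, KD_V_PER_RAD, KH)
zeta = damping_factor(bandwidth, 32e6)
print(f"loop bandwidth = {bandwidth / 1e6:.1f} MHz, damping factor = {zeta:.2f}")
```

With these inputs the bandwidth evaluates to roughly 29 MHz and the damping factor to a little under 0.5, in line with the rounded values in the text.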


Figure 6-11 Serdes I loop locking in
This plot shows the Serdes I receiver VCO locking into 6.1 Gb/s, 4 ps jitter data. Once frequency lock is established the proportional pulses oscillate around the target frequency.

6.3.3.3. Serdes III (dual-loop / referenced loop)

The Serdes III implementation has two loops: one independent loop that dictates the frequency, and a second dependent loop that phase locks to the incoming data. Fig. 6-12 shows the frequency loop locking in to a reference signal at 750 MHz, which corresponds to a 6 GHz clock. Because the same PLL was used in the transmitter of the Serdes III implementation, the acquisition plots shown in Section 5.4.6.3. on page 101 show behavior identical to the operation of this frequency loop.

Also shown in Fig. 6-12 is the phase plot for the phase loop locking in to data with tpb = 0.25. Lock-in occurs when the clock frequency is about 6.02 GHz, which is within 20 MHz of the data frequency. It was expected that lock-in would occur when the clock was within half of 30 MHz, or 15 MHz.

The noise seen on the locked-in phase plot is from 4 ps rms jitter added to the data through an HDL model (Appendix E.5. on page 183). This enabled a more accurate and faster simulation. The choice of jitter is directly related to the jitter produced by the transmitter, with the assumption that the channel introduces little noise.

[Figure 6-11 plot: VCO frequency (GHz), 6.06 to 6.17, versus time (ns), 0 to 40.]

Figure 6-12 Frequency and phase lock-in of Serdes III Rx PLL
The dual loop nature of the Serdes III Rx PLL allows an independent referenced loop to frequency lock close to the data frequency. The second loop phase locks when the data and reference frequencies are within 0.3% of each other. The plot shows sampling phase (deg) and clock frequency against time (ns).

6.4. 4-16 Demultiplexing

The transition detector naturally performs 4-16

demultiplexing. It has eight sampling circuits, four of which are

actual data. Each of the data bits is available sequentially and

as such, all four are valid for only one bit time: 50 ps at 20 Gb/s.

This can make timing very difficult.

Serdes I was not capable of performing the 4-16

demultiplexing. It could only output the four sampled bits directly off the detector.

The demultiplexer added to Serdes II is shown in Fig. 6-13. It uses four 4-bit MS-latches, each clocked by one of four phase-offset clocks. The clocks are generated with a counter driven by a phase from the PLL. The latches simultaneously sample the 4-bit data from the transition detector. The transition from the fourth bit, followed by the transition from the first bit, dictates the window in which the clock has to sample the data. Delays on the clock lines had to be carefully balanced and tightly controlled to ensure that the bits were sampled at the correct time.

Figure 6-13 4-16 demultiplexer architecture
The demultiplexer accepts the set of four bits from the transition detector and samples each set into four separate registers. Once 16 bits are captured those registers are resampled by a 16 bit register to produce the final output.

After all four latches contain a total of 16 bits, another bank of latches resamples all

the bits at once. This register uses the fourth clock, Φ4, plus a small delay. This delay should

be longer than the delay through the first register to capture the 4th bank correctly. The

delay must also be shorter than the time when the 1st bank is sampled. For a 20 Gb/s system,

the clock has a 200 ps window and was placed as close to the center as possible.
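The register arrangement described above can be illustrated with a small behavioral model. This is only a sketch of the data movement, with no timing or circuit detail: each 4-bit group from the transition detector is written into one of four banks in turn (standing in for clocks Φ1 to Φ4), and after the fourth bank is written all 16 bits are resampled together. The function names and the test pattern are hypothetical.

```python
def group_into_nibbles(bits):
    """Split a serial bit list into the 4-bit groups presented by the transition detector."""
    usable = len(bits) - len(bits) % 4
    return [tuple(bits[i:i + 4]) for i in range(0, usable, 4)]


def demux_4_to_16(nibbles):
    """Fill four 4-bit banks in rotation, then resample all 16 bits at once."""
    banks = [None, None, None, None]
    words = []
    for index, nibble in enumerate(nibbles):
        phase = index % 4              # bank clocked by phase 1..4 in turn
        banks[phase] = tuple(nibble)
        if phase == 3:                 # fourth clock plus a small delay: resample
            words.append(tuple(bit for bank in banks for bit in bank))
    return words


def resample_window_s(bit_rate_bps, bits_per_bank=4):
    """Time between the fourth bank being written and the first bank being overwritten."""
    return bits_per_bank / bit_rate_bps


serial = [1 if (i * i) % 5 < 2 else 0 for i in range(32)]  # arbitrary test pattern
words = demux_4_to_16(group_into_nibbles(serial))
print(len(words), "words; window =", resample_window_s(20e9) * 1e12, "ps")
```

At 20 Gb/s the window evaluates to 200 ps, the figure quoted in the text for placing the resampling clock.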

6.5. Registers and Decoding

Often a First In First Out (FIFO) system is added to the output of the demultiplexer. This reduces the timing constraint on the circuit that reads the 16 bits of parallel data off the chip, through the use of a separate load clock. A FIFO was not included in either Serdes I or Serdes II, in which the output data is only latched in the 4-16 demultiplexer.

Data decoding is a general term for such techniques as decryption, decompression,

error detection, channel alignment, byte alignment [38], DC voltage balance, simplified

clock recovery, frame detection [33], and so on. No encoding was performed in either

Serdes I or Serdes II. See Section 5.11.1. on page 118, for a quick study and

recommendation of the 8B/10B encoding scheme.

6.6. Line Receiver

The line receiver accepts serial data at up to 20 Gb/s. Its

bandwidth must be wide enough, usually 50% higher than the 10

GHz fundamental, to ensure that the data is reproduced

accurately [14], [48], [36], [37], [49].

The Serdes I line receiver consists of a simple single-

ended pad receiver, and is not optimized for bandwidth. The

Serdes II circuit is fully differential and consists of a 6 µm buffer with emitter followers

and 50 Ω termination resistors.

6.7. Test Circuitry

6.7.1. On-chip test pattern generation

Testing the receiver, by itself, at speed is impossible

without a 10 GHz differential signal generator to drive the data

inputs. In order to eliminate reliance on external testing

hardware, the necessary generator was added internally. This

was done in both fabricated Serdes chips by using a 5 GHz VCO

in three different configurations. The first signal was generated by multiplying separate

phases of the VCO to create a 10 GHz bit stream. The second was simply one phase of the

VCO for 5 GHz and the third signal was a phase divided by two for 2.5 GHz. A 4-to-1 multiplexer was added to select between these three generated signals and the fourth external data signal.

6.7.2. True error rate detector (TERD)

The true error rate detection circuit operates between the transmitter and receiver. It

determines bit error rate through an LFSR matched to the transmitter LFSR. Its operation

was discussed in detail in Section 5.8.2. on page 107.
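The idea of checking received data against a matched LFSR can be sketched as follows. The polynomial, register width, and seed used here (x^7 + x^6 + 1, 7 bits, 0x5A) are purely illustrative assumptions; the actual TERD configuration is described in Section 5.8.2 and is not reproduced in this chapter. The sketch also assumes the reference register starts in step with the transmitter.

```python
def lfsr_bits(seed, count, taps=(7, 6), width=7):
    """Generate bits from a Fibonacci LFSR; taps and width are illustrative only."""
    state = seed
    mask = (1 << width) - 1
    for _ in range(count):
        feedback = 0
        for tap in taps:
            feedback ^= (state >> (tap - 1)) & 1
        state = ((state << 1) | feedback) & mask
        yield feedback


def count_bit_errors(received, seed, taps=(7, 6), width=7):
    """Compare received bits with a reference LFSR started from the same seed."""
    reference = lfsr_bits(seed, len(received), taps, width)
    return sum(1 for got, expected in zip(received, reference) if got != expected)


sent = list(lfsr_bits(seed=0x5A, count=127))
corrupted = list(sent)
for position in (10, 50, 90):      # inject three bit errors
    corrupted[position] ^= 1

errors = count_bit_errors(corrupted, seed=0x5A)
print(f"errors = {errors}, bit error rate = {errors / len(corrupted):.4f}")
```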

6.8. Implementation and Fabrication

6.8.1. Serdes I

As stated previously, the power supply for the Serdes I

chips was chosen to be -4.5 V. This left plenty of room for the

three levels of logic and the active current sources. Power

minimization was not a design goal so this voltage was not

optimized. Also a -2.0 V supply was required for the bottom of

the CMOS charge pump. Table 6-1 shows the pin-outs of the receiver chip and Fig. 6-14

shows the final layout artwork and the microphotograph of the fabricated part.

The receiver in the Serdes I implementation was limited to testing pads only, so it did

not support the full 4-to-16 demultiplexer. Instead the sampled data from the transition

detector was fed directly to output pads. No additional circuitry was added to retime the

output data, so the four bits were not presented to the output at the same time.

In order to test the high speed operation of the receiver an on-chip data test source

was created. This circuit generated periodic signals at 10 GHz, 5 GHz, and 2.5 GHz. Two

DC pads, R0 and R1, were used to select between the three data source inputs and an

externally supplied input, and R2 was used as a control voltage for the VCO. The receiver

clock was connected to pad R5, and the output data was connected to pads R8 through R11.

To aid in testing, the capacitor from the charge pump was passed to pad R4 through a high

resistance path. This pad could confirm the proper operation of the charge pump while the

circuit was operating.


Table 6-1 Pin-out of Serdes I receiver

Pin    I/O       Description
R0     DC in     test source (SELECT A)
R1     DC in     test source (SELECT B)
R2     RF out    test source output
R3     DC in     control voltage for test source
R4     RF out    integrator voltage (capacitor)
R5     RF out    receiver clock
R6     Power     -2 V (FET charge pump)
R7     RF in     receiver input
R8     RF out    data 3
R9     RF out    data 2
R10    RF out    data 1
R11    RF out    data 0


Figure 6-14 Serdes I receiver layout artwork and photograph
On the left is the final artwork for the first receiver design. On the right is a microphotograph of the fabricated part.

6.8.2. Serdes II

The full chip layout and pin-outs are shown and described in Section 5.9.2. on

page 109.

6.9. Testing Results

6.9.1. Serdes I (receiver test results)

The receiver circuit has a pull-in range of 18.7 to 18.9 Gb/s. This represents the range of frequencies for which the PLL can acquire lock with the onset of new data. Once lock-in has occurred, the circuit can maintain lock for its hold-in range of 16.4 to 19.6 Gb/s. This is an undesirable situation for two important reasons. First, the lock-in range dictates the allowable range of data frequencies because the communication system cannot be expected to initialize with a lower bit rate and then ramp up to the nominal bit rate. Second, the hold-in range did not meet the specification of 20 Gb/s.

The cause of the poor pull-in range is the non-linear nature of the transition detector.

It has a very high gain and saturates above a small phase deviation, limiting the ability to

adjust for phase differences. The low hold-in range is due to the lower than expected

frequency range of the current starving VCO, shown in Fig. 3-5 on page 27.

Fig. 6-15 shows the receiver locked to data at 19.4 Gb/s (the oscilloscope is triggered on the input signal). Fig. 6-15(a) shows a locked condition with data arriving with 20 bits per transition (0.05 tpb) and (b) shows a locked condition with 10 bits per transition (0.1 tpb).

When the receiver is locked with data at 0.05 tpb (10 ones, 10 zeros), an rms phase jitter of 2.64 ps is measured and shown in Fig. 6-16. When the number of transitions is decreased to 0.016 tpb (32 ones, 32 zeros) a jitter value of 8 ps is measured. Results indicate that a locked condition can be maintained for a data stream with an edge every 300 bits before the clock jitter becomes too large and lock is lost.

Figure 6-15 Serdes I receiver locked to data.
The above plots show the recovered clock and the sampled data for a data rate of 19.4 Gb/s. (a) is fed with data with 20 bits per transition and (b) is fed with 10 bits per transition.

Figure 6-16 Serdes I recovered clock showing jitter.
This plot shows a receiver locked to data with a 30% duty cycle. The recovered clock has an rms jitter of 2.6 ps.

6.9.2. Serdes II (receiver test results)

The results from the second receiver iteration were very similar to the first, as

expected. The big difference was that the receiver integrator had a circuit glitch that

prevented it from operating as an integrator. Instead it operated like a low-pass filter. This

limited the hold-in range to that of the pull-in range, which was from 4.20 to 4.63 GHz or

16.8 to 18.5 Gb/s. Although this small hold-in range is a problem, a more serious concern is

the small pull-in range. The only way to solve this problem is to provide the receiver with

a reference signal very close to the frequency of the data. This solution was evaluated and

simulated in Serdes III.

Fig. 6-17 shows the receiver in lock with the data and the clock at 4.5 GHz. This was achieved by using an external source running at the same frequency as the clock. The internal source operated correctly with various combinations of frequencies. One included the internal source VCO running at 3.7 GHz with the divide-by-2 enabled and a clock at 4.63 GHz. This corresponds to data with 5 ones and 5 zeros, which also indicates that the receiver is able to lock on both rising and falling data transitions.

Figure 6-17 Serdes II Rx locked to data
The plot captured from the oscilloscope shows input data and the receiver clock locked to it. Both are at 4.5 GHz, and the data represents a bit pattern of 1100 at 18 Gb/s.
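The correspondence between source frequency, receiver clock, and bit pattern quoted above follows from simple arithmetic, sketched here. The only assumption is that the bit rate is four times the receiver clock, which matches the pairing of 4.20 to 4.63 GHz with 16.8 to 18.5 Gb/s given earlier. The function names are hypothetical.

```python
def bits_per_data_period(clock_hz, data_hz, bits_per_clock=4):
    """Number of bit times in one period of a periodic data signal."""
    return bits_per_clock * clock_hz / data_hz


def square_wave_pattern(bits_per_period):
    """Bit pattern of a 50% duty cycle signal spanning the given number of bits."""
    half = round(bits_per_period) // 2
    return "1" * half + "0" * half


# Internal source: 3.7 GHz VCO with divide-by-2, receiver clock at 4.63 GHz.
internal = bits_per_data_period(4.63e9, 3.7e9 / 2)
# External source and clock both at 4.5 GHz (Fig. 6-17).
external = bits_per_data_period(4.5e9, 4.5e9)

print(square_wave_pattern(internal), square_wave_pattern(external))
```

The first case works out to about 10 bits per period (5 ones and 5 zeros) and the second to 4 bits per period (the 1100 pattern at 18 Gb/s).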

One way to measure the performance of the receiver is to look at the phase noise of

the recovered clock relative to the transition density [14], [31]. Fig. 6-18 shows four

different phase noise measurements for varying lengths of periodic data streams. The data

was generated with the HP 8563 low phase noise signal source.

The curve for 100 bits represents a series of 50 ones followed by 50 zeros. As can be seen in the plot, the fewer the transitions the higher the phase noise. At 1 MHz, a transition density of 0.052 yields a phase noise value of -112 dBc/Hz and a density of 0.0064 yields a value of -88 dBc/Hz. As the clock phase noise increases so does the jitter, which relates to a larger BER. In the minimum, and likely, worst case of 19 bits, integrating the phase noise from 1 MHz to 1 GHz gives an rms jitter of approximately 2.0 ps.

Figure 6-18 Serdes II receiver clock phase noise
This plot shows the phase noise for various length bit sequences. The sequence consists of a string of ones followed by a string of zeros with a period indicated in the plot. As expected, the fewer transitions the larger the phase noise.
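The conversion from a phase noise curve to an rms jitter figure, as used for the 19-bit case, can be sketched as below. The sketch uses a common approximation, not necessarily the exact procedure followed for the measurements: the single-sideband phase noise L(f) is integrated over the offset band, doubled to account for both sidebands, and the resulting rms phase is divided by 2π times the carrier frequency. The sample profile is a hypothetical flat curve, not measured data, and the 4.5 GHz carrier is an assumed clock frequency.

```python
import math


def rms_jitter_s(offsets_hz, phase_noise_dbc_hz, carrier_hz):
    """Integrate SSB phase noise (trapezoidal rule) and convert to rms time jitter."""
    linear = [10.0 ** (level / 10.0) for level in phase_noise_dbc_hz]
    area = 0.0
    for i in range(1, len(offsets_hz)):
        width = offsets_hz[i] - offsets_hz[i - 1]
        area += 0.5 * (linear[i] + linear[i - 1]) * width
    phase_rms_rad = math.sqrt(2.0 * area)  # factor of 2 covers both sidebands
    return phase_rms_rad / (2.0 * math.pi * carrier_hz)


# Hypothetical flat profile from 1 MHz to 1 GHz; replace with measured points.
offsets = [1e6, 1e7, 1e8, 1e9]
levels = [-120.0, -120.0, -120.0, -120.0]

jitter = rms_jitter_s(offsets, levels, carrier_hz=4.5e9)
print(f"rms jitter = {jitter * 1e12:.2f} ps")
```

A linear-frequency trapezoidal rule is coarse for curves that fall steeply with offset; measured data would normally be sampled densely or integrated on a logarithmic grid.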

The final test of the receiver involved connecting the output of the transmitter back

into the receiver. This utilized the full potential of the built-in testing circuitry. The first

problem encountered was the inability to feed back a differential signal. This was because

two matched lines from the output of the Tx to the input of the Rx could not be guaranteed.

The probes, connectors, and cables introduce too much variation in length to work properly.

Even a few millimeters could offset the differential signals by a considerable amount. It

was concluded that for differential testing, the part would have to be packaged and placed

on a board.

Because differential testing was out of the question, the system was set up for single-

ended testing. This was done by tying one end of the receiver input to a DC reference

voltage half-way between the high and low transmitter signal levels. This technique

destroyed the benefits of a differential signal and would not operate at either 20 or 10 Gb/s.

[Figure 6-18 plot: phase noise (dBc/Hz), -130 to -70, versus frequency (MHz), 0.1 to 100; curves for 156, 100, 76, and 19 bit sequences.]

The feed-through pad showed a highly corrupted signal. The single-ended technique and/or

a bandwidth problem in the differential pad receiver prevented a full-test of the feedback

testing scheme.

6.10. Future Work

6.10.1. Sampling offset correction

One attribute of data arriving in a receiver, typically seen in optical systems, is bits

that are skewed toward one transition. This is usually an effect of the non-linear nature of

the light sensitive diode, but can be a result of the transmitter or from the channel itself. The

ramification is an increase in BER if samples are taken at the exact center of the bit. The

solution is to allow the data sampling points to be offset relative to the data transitions.

6.10.2. 40 Gb/s?

The first step in moving to a 40 Gb/s solution is to utilize a 10 GHz ring oscillator.

Given this possibility, the next problem is in the design of the receiver amplifier. This

amplifier will require at least a 20 GHz bandwidth and must be able to drive a significant

number of loads. It may be necessary to sacrifice phase detection of every transition and

just utilize every fourth edge to reduce the MS-latch loading effects. This solution still

requires four data latches, plus one transition latch which may still be too high. Another

solution would be to use a bang-bang phase detector that requires a clock and its quadrature

at half the baud rate [26], [32]. This solution requires only four MS-latches.

6.10.3. Demultiplexer improvements

A problem found during the testing of the Serdes II chip was in the 4-to-16 demultiplexer described in Section 6.4. on page 138. Due to stringent timing constraints and excessive loading, the set of four 4-bit latches was failing to latch the data. Fig. 6-19 depicts an improved demultiplexer that operates in stages. The first stage latches the four data bits from one of the PLL clock phases. The clock is then divided by two and used to clock the next stage of eight latches. The clock is then divided again and the data is latched into 16 latches. The final stage realigns all the data edges by latching the 16 bits simultaneously.

Figure 6-19 Revised 4-to-16 demultiplexer
In order to reduce the timing requirements on the demultiplexer the data is demultiplexed in stages. Each stage is successively clocked by a clock of half the frequency from the previous stage.
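The staged demultiplexer of Section 6.10.3 can be pictured with a behavioral sketch in which each stage runs at half the rate of the previous one and holds twice as many bits. This models data movement only; the toggle flip-flop clock dividers and the final realignment latch are represented simply by the pairing of consecutive words. All names are hypothetical.

```python
def pair_up(words):
    """Half-rate stage: join consecutive words in pairs, dropping an unpaired leftover."""
    words = list(words)
    return [words[i] + words[i + 1] for i in range(0, len(words) - 1, 2)]


def staged_demux(nibbles):
    """4 -> 8 -> 16 bit demultiplexing, each stage clocked at half the previous rate."""
    stage1 = [tuple(nibble) for nibble in nibbles]  # 4 latches, full-rate clock
    stage2 = pair_up(stage1)                        # 8 latches, clock divided by 2
    stage3 = pair_up(stage2)                        # 16 latches, clock divided by 4
    return stage3


bits = [1 if (3 * i) % 7 < 3 else 0 for i in range(32)]  # arbitrary test pattern
nibbles = [tuple(bits[i:i + 4]) for i in range(0, len(bits), 4)]
output_words = staged_demux(nibbles)
print(len(output_words), "words of", len(output_words[0]), "bits")
```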

Discussion & Conclusion

In conclusion, three 20 Gb/s communication systems were designed and two were

fabricated in IBM’s SiGe 5 HP process. Each design built on test results from the previous

implementations, and the third and final design was intended for future research and

development.

The second iteration was a unified transceiver chip possessing a transmitter and a

receiver. It had wirebond pads for wafer probe testing as well as C4 pads for flip-chip

packaging. Through the C4 pads, 16 bits of parallel data could be supplied to and extracted

from the chip. An internal testing circuit enabled complete testing of the chip without the

need for packaging.

The Feed Forward Interpolated VCO, a four stage ring oscillator that uses novel feed

forwarding techniques, was developed. Its very high frequency nature required the use of

capacitance to slow its frequency down to 5 GHz. Its flexibility makes it an excellent choice

for short-haul communication systems. Phase noise at 1 MHz was measured as -90.5

dBc/Hz which is one of the best numbers quoted for a ring oscillator at this speed. The

associated jitter is quite small and is an interesting function of the control voltage.

The transmitter in the second prototype had a very wide operating range of 14.27 to

21.58 Gb/s. A time domain sampling oscilloscope measured an rms clock jitter value of 4.3

ps or 0.086 UI. Using a spectrum analyzer, however, rms clock jitter from 100 kHz to 100

MHz was measured at 1.4 ps. The eye diagram was very symmetric, indicating that the

symmetric multiplexer and data interleaving scheme operated as expected.

The second receiver did not have an external reference and, therefore, had only the

high speed data stream to lock to. This limited the pull-in range to 16.8 to 18.5 Gb/s. Clock

jitter measured from the oscilloscope had an rms value of 2.0 ps. At very low transition

rates of 78 bits per transition, the receiver was still able to maintain lock. This is credited

to the phase detector which is able to use every transition for phase information.


A third prototype was developed, but not fabricated, using the data acquired from the

first two designs. The transmitter PLL bandwidth was further optimized and a negative

impedance amplifier loop filter was added. A frequency locked loop was added to the

receiver PLL to greatly enhance the pull-in range. The demultiplexer scheme was also

improved to minimize the timing constraints.


References

[1] R. C. Walker, K. Hsieh, T. A. Knotts, and C. Yen, “A 10 Gb/s Si-Bipolar TX/RX Chipset for Computer Data Transmission,” IEEE International Solid-State Circuits Conference, pp. 302-303, 1998.

[2] S. A. Steidl, “A 32-Word by 32-Bit Three-Port Bipolar Register File Implemented Using a SiGe HBT BiCMOS Technology,” Candidacy document, Rensselaer Polytechnic Institute, Department of Electrical Engineering, May 1999.

[3] P. M. Cambell, H. J. Greub, A. Garg, S. A. Steidl, S. Carlough, M. Ernest, R. Philhower, C. Maier, R. P. Kraft, and J. F. McDonald, “A Very-Wide-Bandwidth Digital VCO Using Quadrature Frequency Multiplication and Division Implemented in AlGaAs/GaAs HBTs,” Proc. GaAs IC Symp., pp. 311-314, 1995.

[4] A. W. Buchwald, and K. W. Martin, “High-speed voltage-controlled oscillator with quadrature outputs,” Electronics Letters, vol. 27, no. 4, pp. 309-310, February 1991.

[5] R. Walker, C. Stout, and C-S. Yen, “A 2.488 Gb/s Si-Bipolar Clock and Data Recovery IC with Robust Loss of Signal Detection,” IEEE International Solid-State Circuits Conference, pp. 246-247, 1997.

[6] M. Ernest, T. W. Krawczyk, and J. F. McDonald, “Symmetric Multiplexer,” Invention Disclosure Record, Rensselaer Polytechnic Institute, February 2000.

[7] T. W. Krawczyk, and J. F. McDonald, “The Feed Forward Voltage Controlled Ring Oscillator,” Invention Disclosure Record, Rensselaer Polytechnic Institute, May 2000.

[8] D. C. Ahlgren, G. Freeman, S. Subbanna, R. Groves, D. Greenberg, J. Malinowski, D. Nguyen-Ngoc, S. J. Jeng, K. Stein, K. Schonenberg, D. Kiesling, B. Martin, S. Wu, D. L. Harame, and B. Meyerson, “A SiGe HBT BiCMOS technology for mixed signal RF applications,” Proceedings of the IEEE Bipolar/BiCMOS Circuits and Technology Meeting, Minneapolis, MN, pp. 195-197, September 1997.

[9] K. Washio, E. Ohue, K. Oda, M. Tanabe, H. Shimamoto, and T. Onai, “95 GHz fT Self-Aligned Selective Epitaxial SiGe HBT with SMI Electrodes,” IEEE International Solid-State Circuits Conference, pp. 312-313, 1998.

[10] L. Larson, M. Case, S. Rosenbaum, D. Rensch, P. MacDonald, M. Matloubian, M. Chen, D. Harame, J. Malinowski, B. Meyerson, M. Gilbert, and S. Mass, “Si/SiGe HBT Technology for Low-Cost Monolithic Microwave Integrated Circuits,” IEEE International Solid-State Circuits Conference, pp. 80-81, 1996.

[11] J. R. Long, M. A. Copealand, S. J. Kovacic, D. S. Malhi, and D. L. Harame, “RF Analog and Digital Circuits in SiGe Technology,” IEEE International Solid-State Circuits Conference, pp. 82-83, 1996.

[12] K. Ismail, “Si/SiGe CMOS: Can it extend the lifetime of Si,” IEEE International Solid-State Circuits Conference, pp. 116-117, 1997.

[13] L. Sun, T. Kwasniewski, and K. Iniewski, “A Quadrature Output Controlled Ring Oscillator Based on Three-Stage sub-feedback Loops,” IEEE International Symposium on Circuits and Systems, vol. 2, pp. 176-179, 1999.

[14] R. Walker, C. Stout, and C-S. Yen, “A 2.488 Gb/s Si-Bipolar Clock and Data Recovery IC with Robust Loss of Signal Detection,” IEEE International Solid-State Circuits Conference, pp. 246-247, 1997.

[15] L. Dai, and R. Harjani, “Comparisons and Analysis of Phase Noise in Ring Oscillators,” IEEE International Symposium on Circuits and Systems, pp. 77-80, May 2000.

[16] A. Hajimiri, and Thomas H. Lee, “A General Theory of Phase Noise in Electrical Oscillators,” IEEE Journal of Solid-State Circuits, vol. 33, no. 2, pp. 179-194, February 1998.

[17] J. A. McNeil, “Jitter in Ring Oscillators,” IEEE Journal of Solid-State Circuits, vol. 32, pp. 870-879, June 1997.

[18] A. Hajimiri, S. Limotyrakis, and T. H. Lee, “Jitter and Phase Noise in Ring Oscillators,” IEEE Journal of Solid-State Circuits, vol. 34, no. 6, pp. 790-804, June 1999.

[19] T. H. Lee, and A. Hajimiri, “Oscillator Phase Noise: A Tutorial,” IEEE Journal of Solid-State Circuits, vol. 35, no. 3, pp. 326-335, March 2000.

[20] H. Matsuoka, and T. Tsukahara, “A 5-GHz Frequency-Doubling Quadrature Modulator with a Ring-Type Local Oscillator,” IEEE Journal of Solid-State Circuits, vol. 34, pp. 1345-1348, September 1999.

[21] J. Plouchart, H. Ainspan, M. Soyuer, and A. Ruehli, “A Fully-Monolithic SiGe Differential Voltage-Controlled Oscillator for 5 GHz Wireless Applications,” IEEE Radio Frequency Integrated Circuits Symposium, pp. 57-60, 2000.

[22] M. Soyuer, J. N. Joachim, N. Burghartz, H. A. Ainspan, K. A. Jenkins, P. Xiao, A. R. Shahani, M. S. Dolan, and D. L. Harame, “An 11-GHz 3-V SiGe Voltage Controlled Oscillator with Integrated Resonator,” IEEE Journal of Solid-State Circuits, vol. 32, no. 9, pp. 1451-1454, September 1997.

[23] S. K. Enam and A. A. Abidi, “A 300-MHz Voltage-Controlled Ring Oscillator,” IEEE Journal of Solid-State Circuits, vol. 25, no. 1, pp. 312-315, February 1990.

[24] S. Lee, B. Kim, and K. Lee, “A Novel High-Speed Ring Oscillator for Multiphase Clock Generation Using Negative Skewed Delay Scheme,” IEEE Journal of Solid-State Circuits, vol. 32, no. 2, pp. 1451-1454, February 1997.

[25] D. C. Ahlgren, M. Gilbert, D. Greenberg, S. J. Jeng, J. Malinowski, D. Nguyen-Ngoc, K. Schonenberg, K. Stein, R. Groves, K. Walter, G. Hueckel, D. Colavito, G. Freeman, D. Suderland, D. L. Harame, and B. Meyerson, “Manufacturability demonstration of an integrated SiGe HBT technology for the analog and wireless market place,” IEEE International Electron Devices Meeting Technical Digest, San Francisco, CA, December 1996, pp. 859-862.

[26] J. F. Ewan, A. X. Widmer, M. Soyuer, K. R. Wrenner, B. Parker, and H. A. Ainspan, “Single-Chip 1062 Mbaud CMOS Transceiver for Serial Data Communications,” IEEE International Solid-State Circuits Conference, pp. 32-33, 1995.

[27] D. Friedman, M. Meghelli, B. Parker, H. Ainspan, and M. Soyuer, “Sub-picosecond SiGe BiCMOS Transmit and Receive PLLs for 12.5 Gbaud Serial Data Communication,” Symposium on VLSI Circuits, pp. 132-135, 2000.

[28] R. Farjad-Rad, C. Yang, M. Horowitz, and T. Lee, “A 0.3-µm CMOS 8-Gb/s 4-PAM Serial Link Transceiver,” IEEE Journal of Solid-State Circuits, vol. 35, no. 5, pp. 757-764, May 2000.

[29] H. Knapp, T. F. Mefster, M. Wurzer, D. Zoschg, K. Aufinger, and L. Treitinger, “A 79 GHz Dynamic Frequency Divider in SiGe Bipolar Technology,” IEEE International Solid-State Circuits Conference, pp. 208-209, 2000.

[30] M. Meghelli, B. Parker, H. Ainspan, and M. Soyuer, “SiGe BiCMOS 3.3V Clock and Data Recovery Circuits for 10Gb/s Serial Transmission Systems,” IEEE International Solid-State Circuits Conference, pp. 56-57, 2000.

[31] Y. M. Greshishchev, and P. Schvan, “SiGe Clock and Data Recovery IC with Linear-Type PLL for 10-Gb/s SONET Application,” IEEE Journal of Solid-State Circuits, vol. 35, no. 9, pp. 1353-1359, September 2000.

[32] A. Pottbacker, U. Langmann, and H. Schreiber, “A Si Bipolar Phase and Frequency Detector IC for Clock Extraction up to 8 Gb/s,” IEEE Journal of Solid-State Circuits, vol. 27, no. 12, pp. 1747-1751, December 1992.

[33] S. Shioiri, M. Soda, T. Monikawa, T. Hashimoto, F. Sato, and K. Emura, “A 10 Gb/s SiGe Framer/Demultiplexer for SDH Systems,” IEEE International Solid-State Circuits Conference, pp. 202-203, 1998.

[34] Albert X. Widmer, “Method of Coding to Minimize Delay at a Communication Node,” U.S. Patent 4665517, assigned to International Business Machines, 1987.

[35] M. Fukaishi, S. Nakamura, A. Tajima, Y. Kinoshita, Y. Suemura, H. Suzuki, T. Itani, H. Miyamoto, N. Henmi, T. Yamazaki, and M. Yotsuyanagi, “A 2.125-Gb/s BiCMOS Fiber Channel Transmitter for Serial Data Communications,” IEEE Journal of Solid-State Circuits, vol. 34, no. 9, pp. 1325-1330, September 1999.

[36] Y. M. Greshishchev, and P. Schvan, “A 60-dB Gain, 55-dB Dynamic Range, 10-Gb/s Broad-Band SiGe HBT Limiting Amplifier,” IEEE Journal of Solid-State Circuits, vol. 34, no. 12, pp. 1914-1920, December 1999.

[37] W. Pöhlmann, “A Silicon-Bipolar Amplifier for 10 Gbit/s with 45 dB Gain,” IEEE Journal of Solid-State Circuits, vol. 29, no. 5, pp. 551-556, May 1994.

[38] K. Kawai, and H. Ichino, “A 0.6 W 10 Gb/s SONET/SDH Bit-Error-Monitoring LSI,” IEEE International Solid-State Circuits Conference, pp. 54-55, 2000.

[39] S. Finocchiaro, G. Palmisano, R. Salerno, and C. Sclafani, “Design of Bipolar Ring Oscillators,” IEEE International Symposium on Circuits and Systems, vol. 1, pp. 5-8, 1999.

[40] Y. Chen, S. Koneru, E. Lee, and R. Geiger, “Simulation of Random Jitter in Ring Oscillators with SPICE,” IEEE International Symposium on Circuits and Systems, vol. 2, pp. 1154-1157, 1997.

[41] Dan H. Wolaver, Phase-Locked Loop Circuit Design, Englewood Cliffs, NJ: Prentice Hall, 1991.

[42] T. Kuroda, T. Fujita, Y. Itabashi, S. Kabumoto, M. Noda, and A. Kanuma, “1.65 Gb/s 60 mW 4:1 Multiplexer and 1.8 Gb/s 80 mW 1:4 Demultiplexer ICs Using 2V 3-Level Series-Gated ECL Circuits,” IEEE International Solid-State Circuits Conference, pp. 36-37, 1995.

[43] D. Chen, R. Waldron, “A Single-Chip 266 Mb/s CMOS Transmitter/Receiver for Serial Data Communications,” IEEE International Solid-State Circuits Conference, pp. 100-101, 1993.

[44] M. Soyuer, K. A. Jenkins, J. N. Burghartz, H. A. Ainspan, F. J. Canora, S. Ponnapalli, J. F. Ewen, and W. E. Pence, “A 2.4 GHz Silicon Bipolar Oscillator with Integrated Resonator,” IEEE Journal of Solid-State Circuits, vol. 31, no. 2, pp. 268-270, February 1996.

[45] F. Svelto, S. Deantoni, and R. Castello, “A 1.3 GHz Low-Phase Noise Fully Tunable CMOS LC VCO,” IEEE Journal of Solid-State Circuits, vol. 35, no. 3, pp. 356-361, March 2000.

[46] J. J. Kim, and B. Kim, “A Low-Phase-Noise CMOS LC Oscillator with a Ring Structure,” IEEE International Solid-State Circuits Conference, pp. 430-431, 2000.

[47] C. Wu, and H. Kao, “A 1.8 GHz CMOS Quadrature Voltage-Controlled Oscillator (VCO) Using the Constant-Current LC Ring Oscillator Structure,” IEEE International Symposium on Circuits and Systems, vol. 4, pp. 378-381, 1998.

[48] J. Akagi, Y. Kuriyama, M. Asaka, T. Sugiyama, N. Lizuka, K. Tsuda, and M. Obara, “Five AlGaAs/GaAs HBT ICs for a 20 Gb/s Optical Receiver,” IEEE International Solid-State Circuits Conference, pp. 168-169, 1994.

[49] M. Soda, H. Tezuka, F. Sato, T. Hashimoto, S. Nakamura, T. Tatsumi, T. Suzaki, and T. Tashiro, “Si-Analog ICs for 20 Gb/s Optical Receiver,” IEEE International Solid-State Circuits Conference, pp. 170-171, 1994.

[50] A. Rofougaran, J. Rael, M. Rofougaran, and A. Abidi, “A 900 MHz CMOS LC-Oscillator with Quadrature Outputs,” IEEE International Solid-State Circuits Conference, pp. 392-393, 1996.

[51] B. L. Thompson, and H. Lee, “A BiCMOS Receiver/Transmit PLL Pair for Serial Data Communications,” IEEE Custom Integrated Circuits Conference, pp. 29.6.1-29.6.5, May 1992.

[52] C. R. Hogge, “A Self Correcting Clock Recovery Circuit,” IEEE Journal of Lightwave Technology, vol. LT-3, no. 6, pp. 1312-1314, December 1985.

[53] D. Y. Wu, A. C. Yen, D. Meeker, S. Beccue, K. Pedrotti, J. Penney, A. Price, and K. C. Wang, “Two Phase Detectors for 2.5-10 Gb/s NRZ Data Operation: a Hogge and a Balanced Mixer,” GaAs IC Symp., pp. 266-269, 1996.

Appendix A. IBM SiGe 5 HP

A.1. NPN Vbe characteristics

The Vbe characteristics of the SiGe npn transistor are important for several reasons. First, they indicate the turn-on voltage of the transistor: the voltage below which the transistor is considered off. Second, at a given operating collector current they can be used to find the base-emitter voltage. Third, and perhaps most importantly, the derivative of the collector current, Ic, with respect to the transistor’s Vbe is the transconductance. This parameter is found in Fig. A-2 by taking the slope at half the peak fT current. This current flows through an optimized differential pair when both inputs are biased identically.

Figure A-1 Ic-Vbe characteristics for npn transistor. The above plot shows the collector current at a fixed Vce of 2 V versus Vbe. The analytical approximation is accurate up to the operating point of 0.7 mA/µm.

[Plot: normalized Ic (ln(mA/µm)) versus Vbe (V) from 0.4 V to 1 V; curves: Simulated, Analytical.]

Figure A-2 NPN transconductance. The transconductance is taken at the point where the collector current is half the maximum fT current.

[Plot: normalized Ic (mA/µm) versus Vbe (V) from 0.87 V to 0.93 V; annotated values: 120 Ω-µm, 8.33 mA/V/µm.]

Comparing the simulated transconductance to that found in

$$ g_m = \frac{1}{\gamma}\,\frac{i_e}{v_T}, \qquad r_e = \frac{\gamma\, v_T}{i_e} \tag{A-1} $$

yields a fudge factor, γ, of 1.65.

The analytical plot in Fig. A-1 is found from

$$ I_c = I_s\, e^{V_{be}/V_T} \tag{A-2} $$

where Is is graphically determined to be 30 fA.
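As a minimal numerical sketch of (A-1), the following Python fragment evaluates the normalized transconductance and emitter resistance with γ = 1.65. The thermal voltage of 25.85 mV (about 300 K) and the bias current of 0.35 mA/µm (half of the 0.7 mA/µm operating point quoted for Fig. A-1) are assumptions made for illustration, not values stated with the equation.

```python
# Sketch of (A-1): normalized transconductance and emitter resistance.
GAMMA = 1.65      # fudge factor quoted in the text
V_T = 0.02585     # thermal voltage in volts (assumed, roughly 300 K)


def transconductance(i_e, gamma=GAMMA, v_t=V_T):
    """g_m = (1/gamma) * i_e / v_T, in S/um when i_e is in A/um."""
    return i_e / (gamma * v_t)


def emitter_resistance(i_e, gamma=GAMMA, v_t=V_T):
    """r_e = gamma * v_T / i_e, in ohm-um when i_e is in A/um."""
    return gamma * v_t / i_e


i_bias = 0.35e-3  # A/um, assumed bias point (see lead-in)
gm = transconductance(i_bias)
re = emitter_resistance(i_bias)
print(f"gm = {gm * 1e3:.2f} mA/V/um, re = {re:.0f} ohm-um")
```

Under these assumptions the result falls close to the 120 Ω-µm read from Fig. A-2; a different bias point or temperature shifts it accordingly.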

A.2. NPN Ic versus Vce characteristics

Figure A-3 Ic-Vce characteristics for npn transistor. The above plot shows the collector current response versus collector-emitter voltage for different base currents. Breakdown occurs at a Vce of 3 V.

[Plot: collector current (mA/µm), 0 to 4, versus collector-emitter voltage (V), 0 to 5, for several base currents starting at 0 µA/µm.]

The Ic versus Vce characteristics of the npn transistor reveal important design parameters. The first is a breakdown voltage of 3 V, which is the maximum voltage that can be applied across the collector-emitter junction. Above this voltage the base current loses control over the collector current and large amounts of current begin to flow. The Early voltage, the voltage at which all backwards linear extrapolations of the curves meet, is about 45 V. This parameter is related to the output resistance looking into the collector by

$$ r_o = \frac{V_A}{I_c} \tag{A-3} $$

where Ic is the collector current near the active region. The normalized value of ro is 80 kΩ-µm.
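The relation (A-3) can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and the collector current printed is simply the value implied by the quoted Early voltage and normalized output resistance, not a figure taken from the text.

```python
# Sketch of (A-3): output resistance from the Early voltage.
V_A = 45.0  # Early voltage in volts, as quoted in the text


def output_resistance(i_c, v_a=V_A):
    """r_o = V_A / I_c, in ohm-um when i_c is in A/um."""
    return v_a / i_c


def current_for_output_resistance(r_o, v_a=V_A):
    """Collector current (A/um) at which r_o takes the given value."""
    return v_a / r_o


# Current implied by the normalized r_o of 80 kohm-um.
ic_implied = current_for_output_resistance(80e3)
print(f"implied Ic = {ic_implied * 1e3:.4f} mA/um")
```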

A.3. NPN fT Curves

Figure A-4 fT vs Ic characteristics for npn transistor. The maximum transition frequency for the SiGe npn transistors occurs at approximately 0.8 mA/µm. Above that current the fT drops off rapidly, and that range should be avoided during design.

[Plot: frequency (GHz), 0 to 70, versus normalized collector current (mA/µm), log scale from 0.001 to 10; curves for 1 µm, 2.5 µm, 5 µm, 10 µm, and 20 µm transistors.]

The most important design parameter found in the fT curves in Fig. A-4 is the DC collector current bias point for maximum operating frequency. Although this normalized current increases slightly as larger transistors are used, a value of 0.8 mA/µm is reasonable for all sizes. Also worth noting in this plot is that larger transistors, which are supplied with more power, operate faster. The smallest transistor has a peak fT of approximately 50 GHz and the largest transistor peaks at 62 GHz.

Appendix B. CML Logic Gates

B.1. CML Voltage Swing (non-linearized, digital)

The CML voltage swing is found by analyzing the collector current flow through

each of the two transistors in a differential pair with a DC differential voltage on the inputs.

The voltage swing must be large enough to ensure that the majority of current flows

through only one transistor. Fig. B-1 depicts how the current flow shifts from one transistor

to the other as the differential voltage changes. At about ±200 mV, at least 99% of the

current is flowing through one leg of the CML buffer. This is the assigned minimum

operating voltage swing and a more conservative 250 mV or greater was used throughout

this project.

Figure B-1 Current switching versus differential input voltage. The input to a differential pair controls the switching of current through two branches. A critical current level must be reached to assure that the digital gate has completely switched. For a 99.7% current level through one branch, a minimum of 250 mV must be applied.

[Plot: percentage of total current (log scale, 0.001% to 100%) versus differential voltage (mV) from -300 to 0.]
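The current split described above can be sketched with the ideal exponential law for a bipolar differential pair. As an assumption not stated in the text, the thermal voltage is scaled by the factor γ = 1.65 from Appendix A.1, and a thermal voltage of 25.85 mV is assumed; with these choices the sketch gives about 99% at 200 mV and about 99.7% at 250 mV.

```python
import math

GAMMA = 1.65   # fudge factor from Appendix A.1 (its use here is an assumption)
V_T = 0.02585  # thermal voltage in volts (assumed, roughly 300 K)


def branch_fraction(v_d, gamma=GAMMA, v_t=V_T):
    """Fraction of the tail current in the branch whose input is higher by v_d."""
    return 1.0 / (1.0 + math.exp(-v_d / (gamma * v_t)))


for v_d in (0.0, 0.100, 0.200, 0.250):
    print(f"{v_d * 1e3:5.0f} mV -> {branch_fraction(v_d) * 100:.2f} %")
```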

B.2. CML Signals

CML circuits possess important attributes called signal levels, which are necessary to connect multiple gates together. The need to merge multiple differential pairs arises from the small, but desirable, voltage swing (Appendix B.1.), the large base-to-emitter voltage (Appendix A.1.), and the technique used. Merging pairs together involves stacking them so that current through one is a function of the state of another. In this way, different current paths can be connected to the pull-up resistors, the output. Other techniques exist for combining differential pairs (see Section 5.3.1. on page 75), but they are not by themselves considered CML.

Figure B-2 Simple AND CML Gate. This gate shows how multiple differential pairs can be merged to produce a two level gate.

In Fig. B-2, the differential input a must be of higher potential, specifically one Vbe higher, than input b, to ensure that transistor Q1 will not become saturated. Input a is said to be on level 1 (0 mV, -250 mV) and b is said to be on level 2 (-900 mV, -1150 mV). A supply voltage as low as -3.2 V allows up to three levels of inputs.

Level 1 outputs, x, are found at the bottom of the pull-up or collector resistors at the top of the tree. Level 2, y, and level 3, z, outputs are generated from emitter followers and a diode.

The size of pull-up resistors r1 and r2 is based upon the current source, to produce a nominal voltage swing of at least 250 mV. For 1 µm transistors biased at a current of 0.8 mA, the resistors are set to 400 Ω. In general the normalized resistor value is 400 Ω-µm.
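The level and resistor conventions above can be collected in a short Python sketch. The helper names are hypothetical. The level 3 voltages are an extrapolation that assumes each level sits one 900 mV Vbe drop below the previous one, which the text states only for levels 1 and 2.

```python
V_BE = 0.9        # volts per level shift, from the level 1 and level 2 values
SWING = 0.25      # nominal voltage swing in volts
R_NORM = 400.0    # normalized pull-up resistance in ohm-um


def level_voltages(level, v_be=V_BE, swing=SWING):
    """Return (high, low) signal voltages for a CML level (1, 2, or 3)."""
    high = -(level - 1) * v_be
    return high, high - swing


def pullup_resistor(emitter_length_um, r_norm=R_NORM):
    """Pull-up resistor in ohms for a transistor of the given emitter length."""
    return r_norm / emitter_length_um


def output_swing(current_a, resistor_ohm):
    """Voltage swing produced when the full tail current is steered into one resistor."""
    return current_a * resistor_ohm


for level in (1, 2, 3):
    print(level, level_voltages(level))
print(output_swing(0.8e-3, pullup_resistor(1.0)))  # 1 um device at 0.8 mA
```

For the 1 µm example in the text, 0.8 mA through 400 Ω gives a swing above the 250 mV minimum.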

B.3. Voltage Reference

All CML gates require a current source to fix the current flow through the differential pair switch. The simplest approach, a passive source, places a resistor at the bottom of the tree which has a nearly constant voltage across it and is dependent only on the lowest transistor pair. This technique has high common mode gain on the lowest differential pair and often requires a large resistor.

Figure B-3 Reference Voltage Generator. Active current sources configured in a current mirror require a reference voltage to control the amount of current.

A more common approach is to use an active current source implemented as a current

mirror. Fig. B-3 shows the generating circuit producing a mirror current of 0.75 mA/µm.

This current was chosen based upon the current necessary to achieve the maximum

operating frequency of the transistors. See Appendix A.3.

The emitter degeneration resistor typically has 0.4 V across it and is used to control currents which are smaller or larger than the mirror current. For instance, if a 4 µm transistor circuit requires 3.0 mA, then a 100 Ω emitter resistor will be used.

Transistor Q2 is used for base current compensation and supplies the base current to all connected circuits. It allows a larger number of sources to be used and prevents current degradation when adding sources.

The value of R1 is dependent on the supply voltage of the circuits. Designs with different supplies need only change this resistor to ensure a fixed current throughout all designs.
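One way to read the emitter resistor example is as a scaling rule: the resistor of each current source is chosen so that its voltage drop equals that of the mirror transistor in the reference circuit. The Python sketch below assumes this rule, together with the 400 Ω and 0.75 mA values read from the labels of Fig. B-3; neither the rule nor the function name comes from the text. It reproduces the 100 Ω example at 3.0 mA and the 200 Ω label at 1.5 mA.

```python
I_REF = 0.75e-3  # mirror current of the 1x reference device in amperes (Fig. B-3 label)
R_REF = 400.0    # emitter resistor of the 1x reference device in ohms (Fig. B-3 label)


def emitter_resistor(current_a, i_ref=I_REF, r_ref=R_REF):
    """Emitter resistor giving the same voltage drop as the reference branch.

    Assumed scaling rule: R * I is held equal to R_REF * I_REF.
    """
    return r_ref * i_ref / current_a


for current in (0.75e-3, 1.5e-3, 3.0e-3):
    print(f"{current * 1e3:.2f} mA -> {emitter_resistor(current):.0f} ohm")
```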

B.4. Buffer with emitter follower outputs

A buffer accepts a single input and duplicates it on its output. Its many uses include:

impedance conversion (high input impedance and low output impedance), fixed delay

introduction, and level shifting. Buffers also form the foundation for more complicated

circuits.

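[Figure B-3 schematic labels: Vcc, Vee, Vref, R1, Q1 (1x), Q2, two 2x devices, 200 Ω and 400 Ω emitter resistors, 1.5 mA reference current, 0.75 mA mirror current. R1 = 1.73 kΩ for Vee = -4.5 V; R1 = 0.87 kΩ for Vee = -3.2 V.]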

The circuit in Fig. B-4 can accept input, a, on levels 1, 2, or 3, since it has only one differential pair. Level 1 output, x, is taken from the bottom of the pull-up resistors, and level 2 output, y, is taken from the output of the emitter follower.

Figure B-4 CML Buffer with emitter followers. A basic buffer with level 1 and level 2 outputs. It can accept input at any level.

The emitter follower output provides a much higher driving ability than the level 1 output. This is because the level 1 output is passively pulled up through the resistors and actively pulled down through the differential pair. As more loads are added, the base current for each must be supplied through the passive resistors, which causes a voltage drop and limits the voltage swing. The passive pull-up through the resistors also limits the speed of the gate. The emitter followers, on the other hand, provide a low impedance output through β amplification of current through transistors Q1 and Q2. In this case, the output is actively pulled up through the follower transistor and actively pulled down through the current source.

Appendix C. CML Circuit Details

C.1. Linearizing the differential amplifier

The differential amplifier is very effective in digital circuits because of its high voltage gain. For analog circuits, where a linear response is needed, this gain must be reduced to meet specifications. The preferred method for doing so is to include emitter resistors to augment the emitter resistance, re, already present in the transistor.

Figure C-1 Linearizing the differential amplifier with emitter resistors. The addition of emitter resistors augments the output resistance of the differential pair transistors and decreases the total gain of the circuit.

The emitter resistance is defined as the resistance from the base to the emitter looking into the emitter, and it is the inverse of the transconductance, gm. The normalized value found through simulation in Appendix A.1. is about 120 Ω-µm. The inverse of the sum of this value and the emitter resistor Re yields the gain

$$ A_d \approx \frac{1}{r_e + R_e}, \qquad r_e = \frac{V_T}{I_e} = \frac{1}{g_m} \tag{C-1} $$

of the circuit with output current and input voltage. In order to find the total voltage gain, Ad must be multiplied by the collector resistance Rc.

A plot of currents i0 and i1 versus the differential input voltage between a0 and a1 is shown in Fig. C-2. The plot with 0 Ω-µm represents the nominal transfer function for a digital gate. The gain is high and an input voltage of 100 mV ensures a nearly complete switch of current. For digital circuits, this allows for a high noise margin and fast switching characteristics. For analog circuits, on the other hand, the active, linear region of the curve is very small: ±50 mV. It is clear that the addition of the emitter resistors is crucial in reducing the gain and spreading out the linear region. The choice of resistor will be determined by the output range needed and the gain at an input of 0 V.
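A short Python sketch of (C-1) shows how quickly the gain falls as emitter resistance is added. It uses the normalized re of 120 Ω-µm from Appendix A.1; the collector resistance of 400 Ω-µm in the voltage gain example is borrowed from Appendix B.2 as an assumption, and the function names are hypothetical.

```python
R_E_INTRINSIC = 120.0  # intrinsic emitter resistance r_e in ohm-um (Appendix A.1)


def transconductance_gain(r_emitter, r_e=R_E_INTRINSIC):
    """A_d = 1 / (r_e + R_e), in S/um for resistances in ohm-um."""
    return 1.0 / (r_e + r_emitter)


def voltage_gain(r_emitter, r_collector, r_e=R_E_INTRINSIC):
    """Total voltage gain: A_d multiplied by the collector resistance R_c."""
    return r_collector * transconductance_gain(r_emitter, r_e)


for r_emitter in (0.0, 200.0, 400.0, 600.0, 800.0):
    a_d = transconductance_gain(r_emitter)
    print(f"Re = {r_emitter:5.0f} ohm-um: Ad = {a_d * 1e3:.2f} mA/V/um, "
          f"Av = {voltage_gain(r_emitter, 400.0):.2f}")
```

The resistor values swept here are the same ones plotted in Fig. C-2.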

Figure C-2 Branch current response for various emitter resistors. This plot shows the transfer of current from one branch to the other when the differential inputs are changed. Each pair of curves has a fixed emitter resistor.

[Plot: branch currents i0, i1 (mA/µm) versus differential voltage (V) from -0.40 V to 0.40 V; curves for emitter resistors of 0, 200, 400, 600, and 800 Ω-µm.]

A comparison between (C-1) and the simulated results is plotted in Fig. C-3 and shows a very good match.

Figure C-3 Simulated / Analytical Gain. (C-1) follows the simulated results for the transconductance of a CML buffer with emitter resistors shown here.

[Plot: inverse gain (V/mA-µm), 0 to 1.0, versus normalized Re (Ω-µm) from 0 to 800; secondary axis: gain (mA/V/µm).]

C.2. Current bypassing

In some situations it may be necessary to limit the extent of current switching in a differential amplifier. For example, the FFI VCO requires a minimum current flow through both branches, no matter the input. The solution is to include a bypass resistor which ensures that some constant current flows in addition to the current defined by the differential transistor pair.

Figure C-4 Limiting full current switching with bypass resistors. The addition of bypass resistors allows some current to always flow around the differential pair. This prevents a complete switching of current.

Two behaviors result from the addition of the bypass resistor. First, a full switch of current through the tree is prevented, which is a desired result. Second, there is a relative decrease in the gain of the circuit, because of the decrease in collector current, which negatively affects the transconductance. Each of these effects is modeled in this section and compared to simulation results. In addition, two equations are derived which can be used as design tools when specifications on gain and current range are provided.

The maximum current in a branch is a function of the total current, the bypass and emitter resistors, and the input voltage. We start with the assumption that branch 1 has zero emitter current, i.e. a0 is much higher than a1. The currents through each bypass resistor are the same. It is assumed that there is a differential pair above this one with emitter voltages at the same potential. We define equations

$$ I_o = i_{e1} + 2 i_b \tag{C-2} $$

$$ i_{e1} = \frac{v_o + \frac{v_d}{2} - v_{be}}{R_e} \tag{C-3} $$

$$ i_b = \frac{v_o}{R_b} \tag{C-4} $$

where Io is the total current through the tree, vd is the differential input voltage and vo is the voltage across the bypass resistor. The value for vbe is found in Fig. A-2 on page 157. Solving for the current through branch 0 yields

$$ I_{max} = I_o - i_b = \frac{I_o (R_e + R_b) - \left( v_{be} - \frac{v_d}{2} \right)}{R_b + 2R_e} = i_{d,max} \tag{C-5} $$

$$ I_{max}\big|_{R_e = 0} = I_o - \frac{v_{be} - \frac{v_d}{2}}{R_b}. \tag{C-6} $$

Fig. C-5 shows the analytical and simulated results for the maximum current as a fraction of the total current for emitter resistors of value 0 Ω-µm and 400 Ω-µm, and a differential input of 400 mV.

With large bypass resistor values, the circuit allows almost a full current switch because less current is bypassed around the differential pair. Below about 10 kΩ-µm the reduction becomes much larger, until at about 3 kΩ-µm Rb is too small and no current switching takes place.
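The closed form (C-5) can be cross-checked against a direct solution of (C-2) through (C-4). The Python sketch below does this for one operating point. The total current of 0.8 mA/µm and the vbe of 0.9 V are assumed values chosen for illustration, and the function names are hypothetical.

```python
def max_branch_current(i_o, r_b, r_e, v_be, v_d):
    """Closed form (C-5); reduces to (C-6) when r_e is zero."""
    return (i_o * (r_e + r_b) - (v_be - v_d / 2.0)) / (r_b + 2.0 * r_e)


def max_branch_current_direct(i_o, r_b, r_e, v_be, v_d):
    """Solve (C-2) to (C-4) for v_o, then return I_o - i_b (requires r_e > 0)."""
    # Substituting (C-3) and (C-4) into (C-2) gives one linear equation in v_o.
    v_o = (i_o + (v_be - v_d / 2.0) / r_e) / (1.0 / r_e + 2.0 / r_b)
    i_b = v_o / r_b
    return i_o - i_b


I_O = 0.8e-3   # total current in A/um (assumed)
V_BE = 0.9     # base-emitter voltage in volts (assumed)
V_D = 0.4      # differential input in volts, as used for Fig. C-5

for r_b in (5e3, 10e3, 20e3, 40e3):
    frac = max_branch_current(I_O, r_b, 400.0, V_BE, V_D) / I_O
    print(f"Rb = {r_b / 1e3:4.0f} kohm-um: maximum current fraction = {frac:.3f}")
```

Both routes give the same current, which is a useful sanity check on the signs in (C-3) and (C-5).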

Figure C-5 Current limiting effects of bypass resistor. The bypass resistor prevents current from being completely shut off in a differential branch. The maximum current allowed to flow divided by the total current is called the maximum current fraction.

[Plot: maximum current fraction, 0.5 to 1.0, versus bypass resistor (kΩ-µm) from 0 to 40, for Vd = 400 mV; simulated and analytical curves for Re = 0 Ω-µm and Re = 400 Ω-µm.]

The next step is to examine how the gain is affected by the addition of the bypass resistor. The primary cause of the decrease in the transconductance is the decrease in collector current in the differential pair. Gain is directly related to transconductance and emitter resistance. A second order effect results from an increase of voltage, and current, across the bypass resistor when collector current increases through the emitter resistor.

Solving for the gain can be broken up into separate pieces: how the emitter current changes relative to the input voltage, and how the total current changes relative to the emitter current.

$$ \frac{di}{dv} = \frac{di_e}{dv} \cdot \frac{di}{di_e} \tag{C-7} $$

shows this relationship. The next step is to solve for the bypass current relative to the emitter current

$$ \frac{di_b}{di_e} = \frac{d}{di_e}\left( \frac{v_{be} + i_{e1} R_e}{R_b} \right) = \frac{R_e}{R_b}. \tag{C-8} $$

Since the sum of the bypass current and the emitter current is the total current i, it is possible to find the total current relative to the emitter current

$$ \frac{di}{di_e} = \frac{di_b}{di_e} + \frac{di_e}{di_e} = \frac{R_e}{R_b} + 1. \tag{C-9} $$

Next, the emitter current relative to the other parameters is determined

$$ i_e = \frac{R_b I_o - 2 v_{be}}{2R_e + 2R_b}. \tag{C-10} $$

From (C-1) on page 164 the derivative of emitter current with respect to input voltage is the inverse of the sum of the emitter resistances, and (A-1) on page 157 yields the transconductance. Using (C-7), (C-9), and

$$ \frac{di_e}{dv} = \frac{1}{r_e + R_e} = \frac{1}{\gamma v_T \dfrac{2R_e + 2R_b}{R_b I_o - 2 v_{be}} + R_e} \tag{C-11} $$

and simplifying the equation yields the desired result

$$ \frac{di}{dv} = \frac{di_d}{dv_d} = \frac{1}{\dfrac{2 \gamma v_T R_b}{R_b I_o - 2 v_{be}} + R_e \,\|\, R_b} \tag{C-12} $$

where id and vd are the differential current and differential voltage, respectively. Results from this analysis compared to simulated results are shown in Fig. C-6.

The top plot in Fig. C-6 slopes upward because increasing Rb increases the transconductance. The lower plot shows a very flat response because the gain, in this case, is fixed by the emitter resistor and is not affected by the collector current (see Appendix C.1. on page 164).

Figure C-6 Current gain effects of bypass resistor. The bypass resistor lowers the current through the differential pair, which in turn decreases the transconductance, subsequently decreasing the gain.

[Plot: gain (mA/V/µm), 0 to 9, versus bypass resistor (kΩ-µm) from 0 to 40; simulated and analytical curves for Re = 0 Ω-µm and Re = 400 Ω-µm.]

Fig. C-7 is a surface plot showing the relationship between current gain and emitter and bypass resistors. This can be useful when designing a linearized differential amplifier with bypass resistors.

Figure C-7 Designing for gain with emitter and bypass resistors. This plot is useful for designing with bypass resistors when gain is specified.

[Surface plot: gain (mA/V/µm) in bands from 0-1 to 8-9 versus bypass resistor (kΩ-µm) from 0 to 40 and emitter resistor (Ω-µm) from 0 to 800.]
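The gain expression (C-12) can be evaluated directly and compared with the product of its factors from (C-7), (C-9), (C-10), and (C-11). The Python sketch below does both. The total current of 0.8 mA/µm, the vbe of 0.9 V, and the thermal voltage of 25.85 mV are assumed values, and the function names are hypothetical.

```python
GAMMA = 1.65   # fudge factor from Appendix A.1
V_T = 0.02585  # thermal voltage in volts (assumed)


def emitter_current(i_o, r_b, r_e, v_be):
    """Balanced emitter current per branch, (C-10)."""
    return (r_b * i_o - 2.0 * v_be) / (2.0 * r_e + 2.0 * r_b)


def gain_closed_form(i_o, r_b, r_e, v_be):
    """Differential transconductance from (C-12)."""
    parallel = r_e * r_b / (r_e + r_b)  # Re || Rb
    return 1.0 / (2.0 * GAMMA * V_T * r_b / (r_b * i_o - 2.0 * v_be) + parallel)


def gain_from_factors(i_o, r_b, r_e, v_be):
    """Same gain assembled as (C-11) times (C-9), following (C-7)."""
    i_e = emitter_current(i_o, r_b, r_e, v_be)
    die_dv = 1.0 / (GAMMA * V_T / i_e + r_e)   # (C-11)
    di_die = r_e / r_b + 1.0                   # (C-9)
    return die_dv * di_die


I_O, V_BE = 0.8e-3, 0.9  # assumed operating point
for r_e in (0.0, 400.0):
    for r_b in (5e3, 10e3, 40e3):
        g = gain_closed_form(I_O, r_b, r_e, V_BE)
        print(f"Re = {r_e:3.0f}, Rb = {r_b / 1e3:2.0f}k: {g * 1e3:.2f} mA/V/um")
```

The expressions are only meaningful while Rb·Io exceeds 2·vbe, which with these assumed values means Rb above roughly 2.25 kΩ-µm.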

C.3. Increasing CML delay

It is sometimes necessary to increase the delay of a CML gate to meet certain timing

requirements. Such a need is found in a ring oscillator that must be centered at a frequency

that is lower than the free running frequency. The addition of a capacitor across the level 1

outputs degrades the rise time, and thus, increases the gate delay. This solution is easy to

implement and simple to model.

Figure C-8 Collector Capacitor. A collector capacitor can be used to increase the delay through a CML gate by increasing the rise time.

Modeling the new gate delay first involves determining the gate delay without the capacitor. This nominal delay is represented by To, and is approximately equal to 12.5 ps. The extra delay is modeled as an RC charging circuit with a time constant of 2RcCc. The factor of 2 arises from the equivalent circuit shown in Fig. C-8, where two series capacitors have a value of twice the original. An additional factor of ln(2) multiplies the time constant to account for the point at which the output is considered switched. This point is

$$ v_o = T_o + I_o R_c\, e^{-t/(R_c C_c)} \tag{C-13} $$

which is approximately when the differential voltage is 0 V. The total delay is equal to

$$ T = T_o + \ln(2)\,(2 R_c C_c). \tag{C-14} $$

Figure C-9 Delay Model with Collector Capacitor. The delay of a CML gate versus level 1 capacitance is derived in this section and is consistent with simulated results.

[Plot: gate delay (ps), 0 to 350, versus capacitance (fF/µm), 0 to 600; curves: Analytical, Simulated.]
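The delay model (C-14) is simple enough to use as a design tool in both directions. The Python sketch below computes the delay for a given collector capacitance and the capacitance needed for a target delay. The collector resistance of 400 Ω-µm is taken from Appendix B.2 and is assumed to apply here; the function names are hypothetical.

```python
import math

T_O = 12.5e-12  # nominal gate delay in seconds, from the text
R_C = 400.0     # collector resistance in ohm-um (assumed, from Appendix B.2)


def gate_delay(c_c, r_c=R_C, t_o=T_O):
    """Total delay from (C-14): T = To + ln(2) * (2 * Rc * Cc); c_c in F/um."""
    return t_o + math.log(2.0) * 2.0 * r_c * c_c


def capacitance_for_delay(t_target, r_c=R_C, t_o=T_O):
    """Collector capacitance (F/um) that gives the requested total delay."""
    return (t_target - t_o) / (2.0 * math.log(2.0) * r_c)


for c_ff in (0, 100, 300, 600):
    print(f"Cc = {c_ff:3d} fF/um -> T = {gate_delay(c_ff * 1e-15) * 1e12:.1f} ps")
```

With these values the delay at 600 fF/µm comes out near the top of the delay axis of Fig. C-9.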

Appendix D. Sizing Transistors to Minimize VCO Delay

The design of digital logic gates in SiGe technology always includes a consideration of transistor size. Sizes range from an emitter length of 1 µm to 20 µm and, if multiple-fingered emitters are used, effective lengths of up to 40 µm. Usually, the larger transistors have smaller delay, but consume proportionally higher current. A trade-off decision among power, layout space, and delay specifications needs to be made.

Logic gates can be extremely varied and may include such functions as multiplexed XOR and five input AND/OR cells. Delays through each of these will depend on the number of inputs and outputs, the input and output levels, and various other factors. An in-depth analysis of all these factors would be very complicated, and the results difficult to utilize. A more general solution, and the one followed in this appendix, is to consider simple buffers with emitter followers driving other buffers. Although not a completely accurate representation of most logic gates, the analysis conclusions are very useful in the design of all gates. If a buffer is driving multiple receivers, this condition is reduced to a case with only one receiver whose size is equal to the sum of the receivers. For instance, if a driver has four 1 µm loads, they can be treated as one 4 µm load.

Also worth noting is that the following analysis is extremely useful in the optimization of ring oscillators. These circuits incorporate a ring of two or more buffers that oscillate because of an odd number of inversions, and are very sensitive to gate delays. If a buffer has a delay of 25 ps, then a 1-2 ps difference in delay can have a 4% or greater impact on the final oscillation frequency. Consideration of the type of loads that will be driven by the VCO is also important when choosing device sizes. For instance, if the VCO has buffers with 1 µm devices, then a 1 µm load on each stage will introduce a proportionally huge loading effect on the system.
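The sensitivity quoted above follows from the usual ring oscillator relation f = 1/(2·N·td), where N is the number of stages and td the delay per stage. This relation and the four-stage ring used in the Python sketch below are assumptions for illustration; they are not taken from this appendix.

```python
def ring_frequency(n_stages, stage_delay_s):
    """Oscillation frequency of an N-stage ring: f = 1 / (2 * N * t_d)."""
    return 1.0 / (2.0 * n_stages * stage_delay_s)


def relative_frequency_change(t_nominal, delta_t):
    """Fractional change in frequency when each stage delay changes by delta_t."""
    return t_nominal / (t_nominal + delta_t) - 1.0


N_STAGES = 4  # hypothetical ring length
for delta_ps in (-2.0, -1.0, 1.0, 2.0):
    change = relative_frequency_change(25e-12, delta_ps * 1e-12)
    f_new = ring_frequency(N_STAGES, 25e-12 + delta_ps * 1e-12)
    print(f"{delta_ps:+.0f} ps -> {change * 100:+.1f} %, f = {f_new / 1e9:.2f} GHz")
```

The fractional change does not depend on N, so a shift of 1 ps on a 25 ps stage moves the frequency by roughly 4% for a ring of any length.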

The assumption in this analysis is that the receiver circuit is fixed and design work will be done on the driver. The data presented here, however, can be useful for the design of the receiver as well.

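Figure D-1 Delay from emitter follower to differential amplifier. In general, the larger the emitter follower, the more capable it is of driving larger differential amplifiers. A rule of thumb in designing an emitter follower to minimize delay without using considerable power is to use 2 µm devices plus 1 µm per 5 µm of load.

[Surface plot: delay (ps) in bands from 10-12 to 28-30 versus emitter follower size (µm) from 1 to 10 and receiver size (µm); design points marked.]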

Fig. D-1 shows the effect on the delay of using different sized emitter followers to drive various receiver loads. The larger the emitter follower, the smaller the delay, since the higher powered follower has a lower output resistance. This, coupled with the receiver input base capacitance, produces a smaller delay. The figure also shows the acceleration in delay as the receiver size remains fixed and the emitter follower shrinks. The acceleration occurs because delay is proportional to the output resistance of the follower, which is inversely proportional to its size.

Also shown on this plot are design points which establish a good rule of thumb for designing emitter followers based on receiver loading for less critical gates. Obviously, the largest emitter followers used will yield the smallest delay, but there is a point where larger devices do not yield substantial improvement. The design rule is to use followers of at least 2 µm and add an additional 1 µm per 5 µm of load. Following this rule yields very small delays without huge power consumption.

Figure D-2 Delay from differential amp to emitter follower. Designing CML logic gates often requires designing an emitter follower stage. The choice of follower is based on many factors, including the specific differential amplifier driving the followers. In general, the larger the follower, compared to the amplifier, the larger the delay through the gate.

[Surface plot: delay (ps) in bands from 7-8 to 15-16 versus emitter follower size (µm) from 1 to 10 and amplifier size (µm) from 1 to 10; design points marked.]

After choosing an emitter follower, the next step is to design the differential amplifier that represents the core of the driver. Fig. D-2 shows the delay from the amplifier to the emitter follower, given different sizes of each. Here the effect is opposite from the effect demonstrated in the previous section; a larger follower size now increases the delay. This is because the followers are now acting as loads on the amplifier and the larger transistors add base capacitance. The ideal situation would be to have the smallest emitter followers possible, but this is not an option after considering loading effects. A good rule is to use an amplifier that is at least half the size of the emitter followers. This yields good delay and driving properties.
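The two rules of thumb given so far in this appendix can be written as a small sizing helper. The Python sketch below is one possible reading: it treats the follower rule as a continuous function of load (2 µm plus 1 µm per 5 µm of load) and the amplifier rule as a lower bound of half the follower size. The function names are hypothetical.

```python
def follower_size_um(load_um):
    """Emitter follower size: 2 um plus 1 um per 5 um of receiver load."""
    return 2.0 + load_um / 5.0


def min_amplifier_size_um(follower_um):
    """Smallest driver amplifier: at least half the emitter follower size."""
    return follower_um / 2.0


for load in (1.0, 5.0, 10.0, 20.0):
    follower = follower_size_um(load)
    print(f"load {load:4.1f} um -> follower {follower:.1f} um, "
          f"amplifier >= {min_amplifier_size_um(follower):.1f} um")
```

These rules are meant for less critical gates; for delay-critical stages such as ring oscillators the optimized sizes read from the figures of this appendix take precedence.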

From Fig. D-1 and Fig. D-2, it is clear that a trade-off exists when designing an emitter follower to be placed between two differential amplifiers. An increase in follower size allows for a better ability to drive loads; however, this increase inhibits the ability of the first amplifier to drive the follower. A closer look at this situation yields Fig. D-3, which shows the optimum follower size to use, given a driver and receiver amplifier size. For instance, in a ring oscillator with 2 µm buffers each driving a 1 µm load, the optimal follower to use is about 6 µm in size. From Fig. D-4 we find that the delay through the gate will be about 23 ps.

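Figure D-3 Size of emitter follower between driver and receiver. When a gate needs to drive another gate on level 2 or lower, or when the receiver is a large load, emitter followers are used. The optimal transistor size to minimize delay through the driver and receiver gates is a function of the transistor sizes in the driver and the receiver.

[Surface plot: emitter follower size (µm) in bands from 2-4 to 18-20 versus receiver size (µm) and driver size (µm), each from 1 to 10; ring VCO and feed forward VCO design points marked.]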

Ring oscillators typically have a buffer of size x driving the next buffer, and a load.

Minimizing and balancing the external loading on each buffer forces each stage to have 1

µm buffers hanging on it. For standard ring VCOs, an emitter follower design line exists.

This is shown on Fig. D-3 and Fig. D-4. For the feed forward VCO, each stage of size x

must drive two inputs of size x, yielding a different design curve.

The final step is to justify the use of the emitter follower. Since it adds delay to the buffer-follower-buffer system, it may be better (less delay) to remove the follower completely. Fig. D-5 shows the difference in delay between a system with and without an emitter follower. In almost all instances it is beneficial to include the follower unless the receiver is much smaller than the driver.

Figure D-4 Delay when using optimized emitter follower. The plot above shows the minimum delay achievable between two differential amplifiers when using an optimized emitter follower.

[Surface plot: delay (ps) in bands from 20-21 to 34-35 versus receiver size (µm) and driver size (µm), each from 1 to 10; ring VCO and feed forward VCO design points marked.]

Figure D-5 Delay difference between circuit with follower and circuit without. An emitter follower between differential amplifiers introduces additional delay, but in most cases reduces the overall delay of the system. Only in cases with large drivers and smaller receivers does the emitter follower increase the delay.

[Surface plot: delay difference in bands from -2.0-0.0 to 8.0-10.0 versus receiver size (µm) and driver size (µm), each from 1 to 10.]

Appendix E. SpectreHDL models

E.1. FFI VCO

// Spectre AHDL for FFI VCO 4u, ahdl//// This cell emulates the functioning of the FFI VCO.// It has 4 sine wave outputs each offset from each other// by 45 degrees. Additional outputs give the instantaneous // frequency and the phase relative to a fixed frequency// source//// Thomas Krawczyk 7/00//#define PI 3.1415926535module b_ffi5 ( w20, w21, x20, x21, y20, y21, z20, z21, Vref, s30, s31) (fc,offset,divider,mfreq)

node [V,I] w20; node [V,I] w21;
node [V,I] x20; node [V,I] x21;
node [V,I] y20; node [V,I] y21;
node [V,I] z20; node [V,I] z21;
node [V,I] s30; node [V,I] s31;
node [V,I] phase; node [V,I] freq;
node [V,I] Vref;

// Center frequency with 0 control voltage
parameter real fc = 5.96G ;
// DC voltage offset on terminal outputs
parameter real offset = -1.1 ;
// In PLL incorporate 1/8, 1/16 divider into model
parameter real divider = 1 from (0.25:64);
// Frequency with which to compare and determine phase offset
parameter real mfreq = 5 GHz;
{
    table VCOdata;
    real control_voltage, f;
    real s[11], factor[11];

    initial {
        // Mapping data between input control voltage and output frequency collected
        // from simulation. Must be positive so a 450m offset is introduced.
        s[0] = 0.500; factor[0] = 0.733;
        s[1] = 0.600; factor[1] = 0.733;
        s[2] = 0.700; factor[2] = 0.747;
        s[3] = 0.800; factor[3] = 0.805;
        s[4] = 0.850; factor[4] = 0.849;
        s[5] = 0.900; factor[5] = 0.896;
        s[6] = 0.950; factor[6] = 0.950;
        s[7] = 1.000; factor[7] = 1.000;
        s[8] = 1.050; factor[8] = 1.046;
        s[9] = 1.100; factor[9] = 1.091;
        s[10]= 1.150; factor[10]= 1.134;
        s[11]= 1.200; factor[11]= 1.168;
        s[12]= 1.300; factor[12]= 1.218;
        s[13]= 1.400; factor[13]= 1.230;
        s[14]= 1.500; factor[14]= 1.230;
        VCOdata = $build_table(2, factor, s, 11);
    }

    analog {
        control_voltage = V(s31,s30) + 450m;
        // Find the frequency multiplier from the control voltage
        f = $interpolate(VCOdata, control_voltage);
        // Find the phase of the w20 phase
        ph = 2*PI*integ(fc*f/divider,0);
        // Find the phase of the signal whose frequency is being used for phase difference
        mph= 2*PI*integ(mfreq,0);
        // Generate the signals for each phase output
        V(w20) <- offset + sin(2*PI* integ(fc*f/divider,0) );
        V(w21) <- offset - sin(2*PI* integ(fc*f/divider,0) );
        V(x20) <- offset + sin(2*PI* integ(fc*f/divider,0) +1*PI/4 );
        V(x21) <- offset - sin(2*PI* integ(fc*f/divider,0) +1*PI/4 );
        V(y20) <- offset + sin(2*PI* integ(fc*f/divider,0) +2*PI/4 );
        V(y21) <- offset - sin(2*PI* integ(fc*f/divider,0) +2*PI/4 );
        V(z20) <- offset + sin(2*PI* integ(fc*f/divider,0) +3*PI/4 );
        V(z21) <- offset - sin(2*PI* integ(fc*f/divider,0) +3*PI/4 );
        // Return the phase difference in degrees
        V(phase) <- (ph-mph)/PI*180;
        // Return the exact frequency in GHz
        V(freq) <- fc*f/divider/1G;
    }
}
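The table lookup and phase integration in the VCO model can be tried outside the simulator. The following Python sketch is illustrative only: it copies the first eleven table entries (the size passed to $build_table) and the 450 mV shift from the listing, but the piecewise-linear interpolation, the clamping at the table ends, the rectangular integration, and the names multiplier and vco_phase are assumptions and not part of the AHDL model.

```python
import bisect
import math

# First eleven (control voltage, multiplier) pairs from the listing.
S = [0.500, 0.600, 0.700, 0.800, 0.850, 0.900,
     0.950, 1.000, 1.050, 1.100, 1.150]
FACTOR = [0.733, 0.733, 0.747, 0.805, 0.849, 0.896,
          0.950, 1.000, 1.046, 1.091, 1.134]

FC = 5.96e9      # center frequency in Hz, from the listing
V_SHIFT = 0.450  # shift that keeps the table argument positive


def multiplier(v_ctrl):
    """Frequency multiplier for a differential control voltage.

    Assumption: linear interpolation, clamped at both table ends.
    """
    v = v_ctrl + V_SHIFT
    if v <= S[0]:
        return FACTOR[0]
    if v >= S[-1]:
        return FACTOR[-1]
    i = bisect.bisect_right(S, v) - 1
    t = (v - S[i]) / (S[i + 1] - S[i])
    return FACTOR[i] + t * (FACTOR[i + 1] - FACTOR[i])


def vco_phase(v_ctrl_samples, dt, divider=1.0):
    """Accumulate 2*pi*f*dt per sample (rectangular rule); radians."""
    phase = 0.0
    phases = []
    for v in v_ctrl_samples:
        phase += 2.0 * math.pi * FC * multiplier(v) / divider * dt
        phases.append(phase)
    return phases
```

With a constant control voltage of 0.55 V the shifted value is 1.0, the multiplier is 1.000, and the phase advances by 2*pi*5.96 radians per nanosecond.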

E.2. 3-State PD

// Spectre AHDL for SERDES3, PD_3state, ahdl
//
// This module emulates the 3-state Phase Detector.
// It looks for rising transitions of the vi and vo inputs
// and forces the output to a +1 or -1 state depending on
// which input went high. When both eventually go high the
// output is reset. The slip outputs, although not implemented,
// are intended to give a pulse when the detector exceeds its max value.
//
// Thomas Krawczyk 9/27/00
//

module PD_3state ( vd0, vd1, vi_slip10, vi_slip11, vo_slip10, vo_slip11, Vref1, Vref2, vi20, vi21, vo20, vo21) ()
node [V,I] vd0;
node [V,I] vd1;
node [V,I] vi_slip10;
node [V,I] vi_slip11;
node [V,I] vo_slip10;
node [V,I] vo_slip11;
node [V,I] Vref1; // Can ignore
node [V,I] Vref2; // Can ignore
node [V,I] vi20;
node [V,I] vi21;
node [V,I] vo20;
node [V,I] vo21;
{
    real vo_center = -1.07; // Center output voltage
    real vo_swing = 144m;   // Swing either high or low
    real i_rise = -1;       // 0 = low 1 = transition 2 = high
    real o_rise = -1;
    real out0, out1;

    analog {
        // Make sure we get a time point at the input crossings.
        $threshold( V(vi20)-V(vi21), 1 );
        $threshold( V(vo20)-V(vo21), 1 );

        if( V(vi20) > V(vi21)) {
            if( i_rise < 2 ) i_rise++;
        } else
            i_rise = 0;

        if( V(vo20) > V(vo21)) {
            if( o_rise < 2 ) o_rise++;
        } else
            o_rise = 0;

        // input vi positive transition?
        if( i_rise == 1 && o_rise == 0 ) {
            out0 = vo_center + vo_swing;
            out1 = vo_center - vo_swing;
        }
        // input vo positive transition?
        if( i_rise == 0 && o_rise == 1 ) {
            out0 = vo_center - vo_swing;
            out1 = vo_center + vo_swing;
        }
        // Both transitions detected
        // reset output back to nominal values
        if( i_rise >= 1 && o_rise >= 1 )
            out0 = out1 = vo_center;
        if( i_rise == -1 && o_rise == -1 )
            out0 = out1 = vo_center;

        // Give the output signals a rise time and 3 gate delays
        V(vd0) <- $transition( out0, 60p, 20p, 20p );
        V(vd1) <- $transition( out1, 60p, 20p, 20p );
        // Frequency slip detectors are not implemented
        V(vi_slip10) <- -1.5;
        V(vi_slip11) <- -1.5;
        V(vo_slip10) <- -1.5;
        V(vo_slip11) <- -1.5;
    }
}
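The per-time-point decision logic of the 3-state phase detector can be exercised on its own. The Python sketch below is a hedged restatement of that logic, not the simulator model: the inputs are assumed to be already reduced to booleans (differential input above or below zero), the counters start at 0 instead of the -1 used in the listing, and the names advance, pd_step and output_voltages are hypothetical. Output rise time and delay are not modeled.

```python
VO_CENTER = -1.07  # center output voltage, from the listing
VO_SWING = 0.144   # swing either side of center, from the listing


def advance(counter, is_high):
    """0 = low, 1 = first sample after a rising edge, 2 = settled high."""
    if not is_high:
        return 0
    return min(counter + 1, 2)


def pd_step(state, vi_high, vo_high):
    """One evaluation of the detector.

    state is (i_rise, o_rise, out) with out in {+1, 0, -1}.
    Returns the updated state tuple.
    """
    i_rise, o_rise, out = state
    i_rise = advance(i_rise, vi_high)
    o_rise = advance(o_rise, vo_high)
    if i_rise == 1 and o_rise == 0:
        out = +1   # vi rose first
    if i_rise == 0 and o_rise == 1:
        out = -1   # vo rose first
    if i_rise >= 1 and o_rise >= 1:
        out = 0    # both inputs have risen: reset
    return (i_rise, o_rise, out)


def output_voltages(out):
    """Map the logical state to the (vd0, vd1) levels."""
    return (VO_CENTER + out * VO_SWING, VO_CENTER - out * VO_SWING)
```

Starting from (0, 0, 0), a rising vi followed two time points later by a rising vo drives the state to +1 and then back to 0, which is the behaviour described in the module header.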

E.3. Transition Detector PD

// Spectre AHDL for SERDES3, RxEdgeExtraction, ahdl
// This is a model for the Transition Phase Detector circuit.
// Clock inputs are w2 - z2.
// Data inputs are dw1 - dz1.
// Sampled outputs are da2 - dd2.
// Fast and slow commands to the VCO are f20 and s21.
//
// Each region is 25 ps wide.
//
//    \2|1/
//   3 \|/ 0
//   ---+---
//   4 /|\ 7
//    /5|6\

module RxEdgeExtraction ( da20, da21, db20, db21, dc20, dc21, dd20, dd21, f20, s21, dw10, dw11, dx10, dx11, dy10, dy11, dz10, dz11, w20, w21, x20, x21, y20, y21, z20, z21, region) ()
node [V,I] da20; node [V,I] da21;
node [V,I] db20; node [V,I] db21;
node [V,I] dc20; node [V,I] dc21;
node [V,I] dd20; node [V,I] dd21;
node [V,I] f20; node [V,I] s21;
node [V,I] dw10; node [V,I] dw11;
node [V,I] dx10; node [V,I] dx11;
node [V,I] dy10; node [V,I] dy11;
node [V,I] dz10; node [V,I] dz11;
node [V,I] w20; node [V,I] w21;
node [V,I] x20; node [V,I] x21;
node [V,I] y20; node [V,I] y21;
node [V,I] z20; node [V,I] z21;
node [V,I] region; // AHDL output of the current sampling region
{
    integer reg = 0;         // 1-8 (0-45 = 0)
    integer out[8];          // output array of detected transitions
                             // per region to be summed at end
    integer sum;             // sum of output array
    integer i;               // index for summing loop
    integer da, db, dc, dd;  // Sampled outputs (0,1) map to (-1, 1)
    integer data_val;        // Last data value
    real out_center = -1;    // center of fast/slow output
    real out_diff = 4m;      // fast/slow differential output / edge
    real data_center = -1.1; // Center of sampled data output
    real data_amp = 150m;    // Amplitude of sampled data output

    analog {
        if( V(w20) > V(w21) && reg == 7 ) {
            if( V(dw10) > V(dw11) ) da = 1; else da = -1;
            reg = 0; out[reg] = 0;
        }
        if( V(x20) > V(x21) && reg == 0 ) {
            reg = 1; out[reg] = 0;
        }
        if( V(y20) > V(y21) && reg == 1 ) {
            if( V(dw10) > V(dw11) ) db = 1; else db = -1;
            reg = 2; out[reg] = 0;
        }
        if( V(z20) > V(z21) && reg == 2 ) {
            reg = 3; out[reg] = 0;
        }
        if( V(w20) < V(w21) && reg == 3 ) {
            if( V(dw10) > V(dw11) ) dc = 1; else dc = -1;
            reg = 4; out[reg] = 0;
        }
        if( V(x20) < V(x21) && reg == 4 ) {
            reg = 5; out[reg] = 0;
        }
        if( V(y20) < V(y21) && reg == 5 ) {
            if( V(dw10) > V(dw11) ) dd = 1; else dd = -1;
            reg = 6; out[reg] = 0;
        }
        if( V(z20) < V(z21) && reg == 6 ) {
            reg = 7; out[reg] = 0;
        }

        // Look for transitions and insert
        // 1 into output array of current region
        if( (V(dw10) > V(dw11)) && data_val == 0 ) {
            out[reg] = 1; data_val = 1;
        }
        if( (V(dw10) < V(dw11)) && data_val == 1 ) {
            out[reg] = 1; data_val = 0;
        }

        // Sum the fast/slow regions
        sum = -out[0]+out[1]-out[2]+out[3]-out[4]+out[5]-out[6]+out[7];

        V(da20) <- data_center + da*data_amp;
        V(da21) <- data_center - da*data_amp;
        V(db20) <- data_center + db*data_amp;
        V(db21) <- data_center - db*data_amp;
        V(dc20) <- data_center + dc*data_amp;
        V(dc21) <- data_center - dc*data_amp;
        V(dd20) <- data_center + dd*data_amp;
        V(dd21) <- data_center - dd*data_amp;

        V(f20) <- $transition(out_center + out_diff/2*sum, 50p, 20p, 20p);
        V(s21) <- $transition(out_center - out_diff/2*sum, 50p, 20p, 20p);
        V(region) <- reg;
    }
}
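The fast/slow decision reduces to flagging which of the eight 45-degree regions contained a data transition and summing the flags with alternating signs. The Python sketch below illustrates only that arithmetic. The sign pattern and output levels are taken from the listing; expressing a transition by its clock phase in degrees, mapping phase to region by integer division, and the function names are assumptions made for the example.

```python
# Weights of out[0]..out[7] in the listing's sum expression.
SIGNS = [-1, +1, -1, +1, -1, +1, -1, +1]
OUT_CENTER = -1.0  # center of fast/slow output, from the listing
OUT_DIFF = 0.004   # differential output per edge, from the listing


def region_of(phase_deg):
    """Region 0..7 for a clock phase in degrees (assumed mapping)."""
    return int((phase_deg % 360) // 45)


def fast_slow_sum(transition_phases_deg):
    """Signed sum over the regions in which a transition was seen."""
    flags = [0] * 8
    for phase in transition_phases_deg:
        flags[region_of(phase)] = 1
    return sum(sign * flag for sign, flag in zip(SIGNS, flags))


def fast_slow_outputs(total):
    """Voltages on (f20, s21) for a given sum, before any edge shaping."""
    return (OUT_CENTER + OUT_DIFF / 2 * total,
            OUT_CENTER - OUT_DIFF / 2 * total)
```

Transitions in an even-numbered region and in the adjacent odd-numbered region cancel, so fast_slow_sum([10, 50]) is 0, while a single transition at 10 degrees gives -1.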

E.4. Histogram generator

// Spectre AHDL for SERDES3, histogram, ahdl
// This cell allows the plotting of a histogram of voltages.
// It samples the "vin" signal and places it in one of "bins" bins.
// The "sweep" output signal sweeps across all bins while the "plot"
// output shows the current value of that bin.
// To create the histogram simply set "sweep" as the x axis and
// "plot" as the y axis.
//
// Thomas Krawczyk 9/26/00
//
module histogram ( plot, sweep, vin, mean, rms) ( bins, low_v, high_v, begin )
node [V,I] plot;
node [V,I] sweep;
node [V,I] vin;
node [V,I] mean;
node [V,I] rms;
parameter real bins = 16 from (1:1025);
parameter real low_v = 0;
parameter real high_v = 1;
parameter real begin = 1n from (0:inf);
{
    integer bin[1024];
    integer index;
    integer s=0;      // Current sweep index
    integer count=0;  // Total samples
    real range;       // Difference between low_v and high_v
    real mu, sigma;   // Mean and standard deviation
    real sum, sq_sum; // The sum and the sum of squared samples

    initial {
        range = high_v-low_v;
    }

    analog {
        if( $time() > begin ) {
            count++;
            sum += V(vin);
            sq_sum += V(vin)*V(vin);
            mu = sum/(1.0*count);
            sigma = sqrt(( sq_sum - 2*mu*sum + count*mu*mu )/(1.0*count));
            index = (V(vin)-low_v)/range * bins;
            if( index >= 0 && index < bins ) bin[index]++;
        }
        s++;
        if (s == bins) s=0;
        V(mean) <- mu;
        V(rms) <- sigma;
        V(sweep) <- low_v + s/(1.0*bins)*range;
        V(plot) <- bin[s];
    }
}
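The running statistics in the histogram cell need only three accumulators: the sample count, the sum, and the sum of squares. The Python sketch below mirrors that bookkeeping and the bin-index calculation. The class name RunningHistogram is hypothetical, and the use of floor for the bin index is an assumption about how the real-to-integer assignment in the listing behaves; the sweep and plot outputs are not modeled.

```python
import math


class RunningHistogram:
    """Accumulates samples into bins and tracks mean and deviation."""

    def __init__(self, bins=16, low_v=0.0, high_v=1.0):
        self.bins = bins
        self.low_v = low_v
        self.span = high_v - low_v
        self.bin = [0] * bins
        self.count = 0
        self.total = 0.0
        self.sq_total = 0.0

    def add(self, v):
        """Record one sample; out-of-range samples still affect the stats."""
        self.count += 1
        self.total += v
        self.sq_total += v * v
        index = math.floor((v - self.low_v) / self.span * self.bins)
        if 0 <= index < self.bins:
            self.bin[index] += 1

    @property
    def mean(self):
        return self.total / self.count

    @property
    def sigma(self):
        # Same expression as the listing: sum((x - mu)^2) expanded.
        mu = self.mean
        var = (self.sq_total - 2 * mu * self.total
               + self.count * mu * mu) / self.count
        return math.sqrt(max(var, 0.0))  # guard against rounding below 0
```

For example, adding 0.1, 0.3, 0.6 and 0.9 to a four-bin histogram over 0 to 1 places one sample in each bin and gives a mean of 0.475.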

E.5. Jittered data source

// Spectre AHDL for PeteExp, datasource, ahdl

#define PI 3.1415926535
#define getbitnum(t) floor(t*Bps)

module datasource ( d0, d1, sweep, Jout ) (Offset, Vmag, Bps, Sigma)
node [V,I] d0;
node [V,I] d1;
node [V,I] sweep;
node [V,I] Jout;
parameter real Offset=-1.50e-1;
parameter real Vmag=-1.50e-1;
parameter real Bps=2.0e10 from (0:inf);
parameter real Sigma=1.0e-11;
{
    // Local Variables
    integer bitnumber,newbitnumber,cbnum,cbval;
    real jitter;
    real c_0,c_1,c_2,d_1,d_2,d_3,T,X,p;

    initial {
        bitnumber=0;
        newbitnumber=0;
        jitter=0.0;
        c_0 = 2.515517 ;
        c_1 = 0.802853 ;
        c_2 = 0.010328 ;
        d_1 = 1.432788 ;
        d_2 = 0.189269 ;
        d_3 = 0.001308 ;
    }

    analog {
        newbitnumber=getbitnum($time());
        // time*bps, but want the fractions
        V(sweep) <- ($time()*Bps - newbitnumber);
        if (newbitnumber!=bitnumber) {
            bitnumber=newbitnumber;
            // Create jitterval for this new bit
            jitter=$random();
            if (jitter<=0.5) p=jitter; else p=1.0-jitter;
            T = sqrt( ln(1.0/(p*p)) );
            X = T-(c_0 + c_1*T + c_2*(T*T))/(1 + d_1*T + d_2*(T*T) + d_3*(T*T*T));
            if (jitter>0.5)
                jitter=-1.0*X*Sigma;
            else
                jitter=X*Sigma;
            $break_point((1.0+newbitnumber)/Bps+jitter);
        }
        V(Jout) <- jitter;
        // Get the possibly different current bit number
        cbnum=floor(($time()+jitter)*Bps);
        // Convert bit number to bit value
        cbval=cbnum % 2;
        V(d0) <- $slew(Offset-Vmag*(2*cbval-1),3.0e10,-3.0e10);
        V(d1) <- $slew(Offset+Vmag*(2*cbval-1),3.0e10,-3.0e10);
    }
}
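The constants c_0 to c_2 and d_1 to d_3 are those of the rational approximation to the inverse normal tail probability given as formula 26.2.23 in Abramowitz and Stegun, which is how the model turns a uniform random number into Gaussian jitter. The Python sketch below reproduces that mapping so its accuracy can be checked independently. The function names are hypothetical, the sign convention follows the listing, and the input is assumed to lie strictly between 0 and 1.

```python
import math

C = (2.515517, 0.802853, 0.010328)  # numerator coefficients, from the listing
D = (1.432788, 0.189269, 0.001308)  # denominator coefficients, from the listing


def upper_tail_quantile(p):
    """Approximate x with P(Z > x) = p for 0 < p <= 0.5."""
    t = math.sqrt(math.log(1.0 / (p * p)))
    num = C[0] + C[1] * t + C[2] * t * t
    den = 1.0 + D[0] * t + D[1] * t * t + D[2] * t * t * t
    return t - num / den


def jitter_from_uniform(u, sigma=1.0e-11):
    """Gaussian jitter in seconds from a uniform sample u in (0, 1).

    Sign convention as in the listing: u <= 0.5 gives a positive
    value, u > 0.5 a negative one.
    """
    if u <= 0.5:
        return upper_tail_quantile(u) * sigma
    return -upper_tail_quantile(1.0 - u) * sigma
```

As a check, upper_tail_quantile(0.025) comes out near 1.960, within the stated accuracy of the approximation (absolute error below 4.5e-4).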


Appendix F. Toplevel Chip Schematics

F.1. Serdes I Transmitter


F.2. Serdes I Receiver


F.3. Serdes II Transceiver
