Low-Power, Gigahertz Clock Generation and
Distribution using Injection-Locked Oscillators
by
Lin Zhang
Submitted in Partial Fulfillment
of the
Requirements for the Degree
Doctor of Philosophy
Supervised by
Professor Hui Wu
Department of Electrical and Computer Engineering
Arts, Sciences and Engineering
Edmund A. Hajim School of Engineering and Applied Sciences
University of Rochester
Rochester, New York
2010
Curriculum Vitae
Lin Zhang was born in Luzhou, Sichuan Province, China on Feb. 26, 1982. He
attended Tsinghua University, Beijing, China from 2000 to 2004, and graduated with
a Bachelor of Science degree in 2004. He came to the University of Rochester in the
Fall of 2004 and began graduate studies in Electrical Engineering. He pursued his
research in RF and analog integrated circuits under the direction of Professor Hui
Wu and received the Master of Science degree from the University of Rochester in 2006.
He received the Zeng Xianzhang Scholarship at Tsinghua University in 2001 and the Frank
J. Horton Research Fellowship at the University of Rochester in 2006.
Acknowledgment
I would like to thank a number of people and organizations that have helped me
through my research at the University of Rochester and in my life to date. First and
foremost, I am heartily thankful to my advisor, Prof. Hui Wu, whose encouragement,
guidance and support from the initial to the final level enabled me to develop an
understanding of the subject and made this thesis possible. His knowledge and technical
excellence are beyond any doubt. Even more important to me were his patience and
professionalism in teaching me every technical detail of Ph.D. research.
I feel very honored and privileged to have worked with him.
I would like to extend my sincere gratitude to Prof. Michael Huang. Without his
help and guidance on the ILC project, this thesis would not have been possible. I
would also like to thank Prof. Eby Friedman for serving on my academic committee,
inspiring my research direction, and organizing the excellent IC design research
meeting, which greatly improved my presentation skills. I would also like to thank Prof.
Sandhya Dwarkadas, Prof. Wendi Heinzelman, Prof. Robert Waag, Prof. Philippe
Fauchet and Prof. Thomas B. Jones for their valuable help with my classes, lab work
and oral exams.
I am deeply indebted to all of my groupmates in Prof. Wu’s Laics group, Dr.
Yunliang Zhu, Berkehan Ciftcioglu, Jianyun Hu, David Karasiewicz, Shang Wang,
Jian Zhang and Jie Xu. Without their help, discussions and friendship, my research
work would have been much more difficult than it was. I would also like to thank
Alok Garg and Aaron Carpenter in Prof. Huang’s group, for their cooperation on the
ILC project. I would like to thank all the friends I met in Rochester, Fu Bo, Yu Chao,
Li Xin, Yu Qiaoyan, Zhang Xiaohua, Sun Qiang... Their friendship has made my
life colorful, even in Rochester’s long white winter.
I would also like to acknowledge those I worked with during my internship at Rambus
in the summer of 2008: my manager Nhat Nguyen, my mentor Yohan Frans,
and my colleagues Ting Wu, Brian Leibowitz, Marko Aleksic and Fred Lee.
I would like to thank the Laboratory for Laser Energetics for its support through
the Frank J. Horton Research Fellowship. I would like to thank Bijoy Chatterjee, Ahmad
Bahai, Peter Holloway, Mounir Bohsali, Johnny Yu, Anish Shah, Virginia Abellera,
Peter Misich, and Jun Wan of National Semiconductor for their help and support in
chip fabrication.
Finally, I would like to thank my wife, my parents and my sister for their uncon-
ditional love, support and encouragement in my life. No matter where I was, they
were always with me, so this thesis is dedicated to my family.
Abstract
The generation and distribution of high speed and high quality clock signals have
become increasingly important in high performance microprocessors, wireline
communications and wireless communications. In the multi-gigahertz frequency range,
conventional clocking techniques have encountered several design challenges in terms
of power consumption, skew and jitter. Injection-locking is a promising technique
to address these design challenges for gigahertz clocking. This dissertation presents
our studies of gigahertz, high performance, low power clock generation and distri-
bution using injection-locked frequency dividers (ILFDs), injection-locked frequency
multipliers (ILFMs) and injection-locked clock distribution networks (ILCs). Chip
prototypes in 0.18µm standard digital CMOS technologies are demonstrated for the
following gigahertz clocking circuits.
For gigahertz clock generation, we introduced a phase tuning scheme for an ILFD-
based dual-phase signal generator. The phase tuning capability in this scheme comes
from the tunable phase transfer characteristics of injection-locked frequency dividers.
Implemented with a frequency-tunable double-balanced divide-by-two injection-locked
frequency divider, the dual-phase signal generator prototype achieves a 100° differential
phase tuning range around quadrature at a generated signal frequency of 5 GHz.
For gigahertz frequency division, we introduced a divide-by-odd-number injection-
locked frequency divider to address the division ratio limitation of conventional injection-
locked frequency dividers. With differential injection and harmonic filtering, this new
ILFD topology maintains the fully differential nature of the output signal, while at
the same time achieving effective mixing between the injected odd harmonics and
output oscillation. The circuit prototype of this topology achieves a 5% locking range
without frequency tuning at input frequencies of 16-18 GHz.
For gigahertz frequency multiplication, we introduced an injection-locked oscil-
lator to work as a high-gain, high-Q harmonic filter for conventional harmonic-
generation-and-filtering frequency multipliers. This new approach achieves signif-
icantly better undesired harmonic suppression for frequency multipliers built with
lossy digital CMOS processes. Frequency tunability of injection-locked oscillators also
enables multi-mode operation for such injection-locked frequency multipliers. The
circuit prototype of such a frequency multiplier achieves dual-mode multiply-by-2 and
multiply-by-3 operation, with undesired harmonic suppression better than 30 dB in
both modes.
For gigahertz clock distribution, we proposed an injection-locked clocking scheme
using injection-locked oscillators (ILOs) as the local clock regenerators. Because of
an ILO’s capability to be locked by a small input signal, this new approach eliminates
a large number of clock buffers in global clock distribution. This not only reduces
the power consumption, but also reduces the skew and jitter which come from these
clock buffers. The phase tunability of ILOs can also be utilized to achieve the deskew
function between different clock domains. Three circuit prototypes of ILCs working
at several gigahertz have been built. They demonstrated better power and jitter
performance together with the built-in deskew capability of ILCs.
Contents
Curriculum Vitae ii
Acknowledgment iii
Abstract v
List of Tables x
List of Figures xi
1 Introduction 1
1.1 Clocking in Communications and Computing Systems . . . . . . . . . 2
1.2 Challenges for High Performance Clocking . . . . . . . . . . . . . . . 9
1.2.1 Skew . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2.2 Jitter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.2.3 Phase Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.2.4 Power Consumption . . . . . . . . . . . . . . . . . . . . . . . 20
1.3 Dissertation Organization . . . . . . . . . . . . . . . . . . . . . . . . 21
2 High Performance Clocking 23
2.1 Clock Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.1.1 Oscillators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.1.2 Phase-Locked Loop (PLL) . . . . . . . . . . . . . . . . . . . . 28
2.1.3 Injection-Locked Oscillators (ILOs) . . . . . . . . . . . . . . . 31
2.1.4 High Speed Frequency Dividers and Multipliers . . . . . . . . 34
2.2 Clock Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.2.1 Conventional Monolithic Clock Distributions . . . . . . . . . . 46
2.2.2 Emerging Gigahertz Clock Distribution Schemes . . . . . . . . 48
2.3 Contributions of This Dissertation . . . . . . . . . . . . . . . . . . . . 51
3 Injection-Locked Oscillators for Clock Generation 54
3.1 Analysis of Injection-Locked Oscillators . . . . . . . . . . . . . . . . . 54
3.1.1 "Harmonic Balance" Analysis of Oscillators . . . . . . . . . . 56
3.1.2 "Harmonic Balance" Analysis of Injection-Locked Oscillators . 58
3.1.3 Common-Mode and Differential Injection . . . . . . . . . . . . 59
3.1.4 Differential Injection for Odd-Harmonic and Fundamental Injection Locking . . . . 64
3.2 Injection-Locked Frequency Dividers (ILFDs) . . . . . . . . . . . . . 66
3.2.1 Divide-by-Two ILFD for Dual-phase Signal Generation . . . . 70
3.2.2 Divide-by-Odd-Number ILFD . . . . . . . . . . . . . . . . . . 80
3.3 Injection-Locked Frequency Multipliers (ILFMs) . . . . . . . . . . . . 89
3.3.1 Dual Modulus ILFM with Good Harmonic Suppression . . . . 89
4 Injection-Locked Clock Distribution 101
4.1 Injection-Locked Clocking . . . . . . . . . . . . . . . . . . . . . . . . 101
4.1.1 Power Savings . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
4.1.2 Skew Reduction and Deskew Capability . . . . . . . . . . . . 104
4.1.3 Jitter Reduction and Suppression . . . . . . . . . . . . . . . . 106
4.1.4 Potential Applications . . . . . . . . . . . . . . . . . . . . . . 107
4.2 Architecture Level Evaluation of ILC Power Impact . . . . . . . . . . 108
4.3 ILC Circuit Prototypes . . . . . . . . . . . . . . . . . . . . . . . . . . 115
4.3.1 Prototype I: ILC with Divide-by-two ILOs . . . . . . . . . . . 115
4.3.2 Prototype II: ILC with Non-division ILOs . . . . . . . . . . . 120
4.3.3 Prototype III: ILC with Active Deskewing . . . . . . . . . . . 129
5 Future Work and Conclusions 139
5.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
5.1.1 Injection-Locked Clock and Data Recovery . . . . . . . . . . . 139
5.1.2 Injection-Locked Free-Space Optoelectronic Oscillators . . . . 141
5.2 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
Bibliography 146
List of Tables
3.1 Performance Comparison with Other Works . . . . . . . . . . . . . . 100
List of Figures
1.1 Speed of wireline communication standards increases with time. For
long distance wireline communication standards, synchronous optical
networking (SONET) and synchronous digital hierarchy (SDH), the
speed has increased from 51.84Mb/s in 1988 to the current large scale
deployment of 10Gb/s and 40Gb/s systems. The speed of the short-distance
wireline communication standard Ethernet has increased from only 3Mb/s
in 1975 to the current speed of 10Gb/s. . . . . . . . . . . . . . . . . 3
1.2 Commercial wireless systems are expanding from sub- and low-gigahertz
range to multi-gigahertz, even millimeter-wave range in order to sup-
port higher data rates. . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 The role of clock signal in a typical wireline receiver. The highlighted
portion in (a) is the clocking circuit. The clock signal is first frequency
synchronized with that of the incoming data stream by a reference
clock fref , which is shared by both transmitter and receiver. The
frequency synchronized clock is then phase synchronized with the data
stream, and used to sample the data. (b) The example of good and
bad sampling clocks. The good sampling clock, clock A has a sampling
transition edge at the center of the data stream eye diagram. The bad
sampling clock, clock B has a sampling edge at the edge of the data
stream eye diagram. A sampling edge away from the eye diagram
center increases the chance of error of the system. . . . . . . . . . . . 5
1.4 The role of clock signal in wireless communications systems. (a) shows
a typical radio transceiver for wireless communications. The high-
lighted portion is the clock generator for wireless systems, the fre-
quency synthesizer. The output of frequency synthesizer up converts
the transmitted signal from low frequency to radio frequency, and down
converts the received signal from radio frequency to low frequency.
(b) If there is any frequency error for the synthesizer output, the up-
converted signal may overlap with adjacent channels in radio frequency
and cause errors for both the transmission channel and adjacent channels. 6
1.5 On-chip and off-chip clock frequencies, IO bandwidth and number of
cores for the processors in the near future predicted by International
Technology Roadmap for Semiconductors (ITRS) 2007. According to
the prediction, the on-chip clock frequency will reach near 9 GHz in
2015. The off-chip clock frequency will reach 30 GHz and IO bandwidth
will reach 80 Tb/s. Number of cores in 2015 will be around 10. . . . . 8
1.6 Clock generation and distribution in a processor chip. The clock
generation PLL generates a gigahertz, frequency-tunable clock signal for
the entire VLSI chip. After the clock generator, a clock distribution
network delivers the clock signal to every logic gate across the chip. . 9
1.7 Effect of skew on (a) long path error and (b) short path error. . . . . 11
1.8 Illustration of clock jitter as clock period variations. The period of
a clock with jitter follows a Gaussian distribution with mean value equal
to the ideal clock period Tavg. . . . . . . . . . . . . . . . . . . . . . 12
1.9 (a) Typical phase noise profile of an oscillator versus offset from carrier.
The corner frequencies $\Delta\omega_{1/f^3}$ and $\Delta\omega_{1/f^2}$ represent the boundaries
between the 30dB/dec, 20dB/dec and flat regions of the phase noise profile.
(b) Illustration of the process of reciprocal mixing. . . . . . . . . . . 16
1.10 (a) A typical LC oscillator, and (b) the model for phase noise analysis.
The active circuit part provides a negative resistance to compensate
the loss of the LC tank. . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.11 (a) The linearized noise model for a typical type II PLL, and (b) a
typical PLL phase noise profile. . . . . . . . . . . . . . . . . . . . . . 18
2.1 Direct frequency generation using the mix-and-divide principle. It re-
quires excessive filtering. . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.2 Typical integrated oscillators. (a) a ring oscillator and (b) an LC
oscillator. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3 Barkhausen oscillation criteria for LC oscillators. (a) an LC differential
oscillator composed of an LC tank, represented by H(jω), and
an active circuit, represented by f(Vo); (b) feedback loop model for the
LC oscillator; (c) amplitude and phase response of the LC tank H(jω). 27
2.4 A basic phase-locked loop, where PFD is the phase and frequency
detector, CP is charge pump, LPF is low pass filter and VCO is voltage
controlled oscillator. . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.5 A typical PLL-based Integer-N frequency synthesizer. With the divide-by-N
frequency divider in the loop, the output clock frequency fout
equals N × fin. The modulus selection can change the
value of the division ratio N. Thus the output frequency can be
changed in steps of the reference frequency fin. . . . . . . . . . . . 30
2.6 (a) Beat and injection locking phenomenon when an oscillator is driven
by a single-frequency input signal. (b) locking range. . . . . . . . . . 32
2.7 (a) A generic model of an injection-locked oscillator (ILO). (b) a divide-
by-two ILO based on a common differential LC oscillator. The input
signal is injected into the oscillator core through the tail transistor
Mtail. This topology exhibits good injection locking efficiency because
of the built-in single-balanced mixer structure. . . . . . . . . . . . . 33
2.8 Digital divide-by-two circuit and the implementation of the latch. . . 35
2.9 Dynamic CMOS dividers using (a) inverters (b) true single-phase clock
(TSPC) logic. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.10 Analog frequency dividers: (a) Miller frequency divider and (b) para-
metric frequency divider. . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.11 Injection-locked frequency divider (ILFD) implementations: (a) ring
oscillator ILFD, (b) Colpitts LC oscillator ILFD, (c) direct injection
into the tank of a differential LC oscillator, (d) injection through
the tail of the differential LC oscillator. . . . . . . . . . . . . . . 38
2.12 Frequency synthesis with LO and frequency multiplier in an RF trans-
mitter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.13 Frequency multipliers by (a) harmonic generation and filtering, (b)
regenerative harmonic doubling and (c) injection-locking. . . . . . . . 42
2.14 Injection-locked clocking with active deskew based on the ILO delay
tunings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.1 Loop analysis of injection-locked oscillators. (a) loop model for an ILO,
where the LC tank is represented by H(jω), and the active circuit is
represented by f(Vi, Vo); (b) phasor representation of the phase shift
introduced by the active circuit; (c) amplitude and phase response of the LC
tank H(jω), showing the new oscillation frequency at ωi, instead of ω0. 55
3.2 One port model for a resonator-based oscillator with the active circuit
represented by a linear admittance. . . . . . . . . . . . . . . . . . . . 57
3.3 One port model for an injection-locked oscillator with the active circuit
described in time domain. . . . . . . . . . . . . . . . . . . . . . . . . 58
3.4 An injection-locked oscillator based on a cross-coupled LC differential
oscillator. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.5 Locking range simulation of a divide-by-two ILFD compared with the
harmonic perturbation and hard switching models. . . . . . . . . . . 63
3.6 Realizations of divide-by-three ILFD by differential cascode injection. 65
3.7 Phase tuning characteristics for a divide-by-two ILO in Fig. 2.7-b.
η ≡ Iinj/Ibias is the injection ratio, ω0 is the free-running oscillation
frequency, ∆ω ≡ ω − ω0 is the frequency shift, and Q is the LC tank
quality factor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.8 (a) Schematic, (b) equivalent circuit model and (c) behavior model of
a divide-by-two ILFD based on differential LC oscillator. The non-
linearity in the behavior model comes from the switching of the cross
pair. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.9 Dual phase generation by injection-locking two ILFDs. The injected
differential signals result in quadrature phase difference at the ILFD
outputs when ω01 = ω02. The output phases φ1 and φ2 are explicitly
expressed as the sum of the quadrature phases and the phase shift
parts ϕ1 and ϕ2 so that Eqn. 3.28 can be directly applied. . . . . . . 72
3.10 Phase tuning of two ILFDs: (a) quadrature; (b)(c) single-ended tuning;
(d) differential tuning. . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.11 Schematic of the prototype double-balanced ILFD for tunable dual-
phase signal generation. A first stage ILFD works as an active balun
to convert a single-ended input to differential signals, which are then
fed into the input of the double-balanced ILFD stage. . . . . . . . . . 74
3.12 Chip micrograph of the prototype ILFD. The chip occupies an area of
1.0 × 1.1 mm². . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.13 Frequency tuning range of the free-running ILFD core. . . . . . . . . 76
3.14 Locking range and bounds in the middle of the tuning range. Note
that these are the input signal frequencies, which are 4 times that of
the outputs. A maximum of 17% locking range was achieved . . . . . 77
3.15 Phase tuning: (a) keep Vt1 constant and tune Vt2; (b) keep Vt2 constant
and tune Vt1; (c) differential tuning of Vt1 and Vt2 at different injection
frequencies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
3.16 Phase noise within the locking range, compared with that of input
signal. Measured at 5.3 GHz output. 12-dB phase noise reduction
comes from the divide by four operation. . . . . . . . . . . . . . . . 79
3.17 Circuit evolution of ILFD from divide-by-two to divide-by-three. The
most important change is from common mode injection and differential
mixing to differential injection and single-ended mixing. . . . . . . . . 80
3.18 Circuit model for the divide-by-three ILFD, with differential injection
at the sources of the cross pair. . . . . . . . . . . . . . . . . . . . . . 81
3.19 Loop model for the divide-by-three ILFD. . . . . . . . . . . . . . . . 82
3.20 Circuit implementation of the divide-by-three ILFD. An input balun
is used for single-ended to differential conversion. . . . . . . . . . 84
3.21 Output spectrum of the divide-by-three ILFD with 23 dB and 21 dB
of second and third harmonic suppression, respectively. The zoomed-in view
of the fundamental output shows a clean spectrum. . . . . . . . . . . 85
3.22 Locking range vs. injection power for the divide-by-three ILFD. A
maximum of 1-GHz locking range was achieved. . . . . . . . . . . . . 86
3.23 Locking range vs. injection voltage. The injection voltage is calculated
from the incident power reading and the S11 at the input port, with cable
and connector loss calibrated out. . . . . . . . . . . . . . . . . . . 86
3.24 Extended working frequency range which combines the frequency tun-
ing range and the locking range. . . . . . . . . . . . . . . . . . . . . 87
3.25 Phase noise performance vs. injection power. The 9-dB phase noise
reduction results from the divide-by-three operation. . . . . . . . . 87
3.26 Chip micrograph of the divide-by-three ILFD, with a chip size of
0.9mm × 0.9mm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.27 Schematic of the dual-modulus injection-locked frequency multiplier. . 91
3.28 (a) Harmonic current generation vs. input voltage for a zero biased
NMOS transistor harmonic generator. (b) Harmonic current ratio to
fundamental vs. input voltage for the same harmonic generator . . . 92
3.29 Model of the filter between harmonic generator and the ILO. The trans-
former model has neglected the parasitic inductance. . . . . . . . . . 93
3.30 Die photo of the dual-modulus frequency multiplier. Osc is the oscil-
lator core, T1 is the transformer and M1 is the harmonic generator. . 94
3.31 Locking ranges of doubler and tripler modes vs. input levels, which
determine the output frequency ranges of the frequency multiplier. . . 95
3.32 Harmonic suppressions of doubler and tripler modes vs. input levels. . 96
3.33 Output spectra of doubler (a) and tripler mode (b) showing the har-
monic suppressions at 5% locking range input levels. . . . . . . . . . . 97
3.34 Power and locking range trade-offs for doubler and tripler modes. . . 98
3.35 Output phase noises of doubler and tripler with comparison to free
running conditions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
3.36 Simulated transient of modulus change from tripler to doubler for the
dual-modulus ILFM, which shows a transition time in the ns range. A limiting
amplifier is added at the output to balance the amplitudes of the two
modes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
4.1 Injection-locked clocking (ILC). . . . . . . . . . . . . . . . . . . . . . 101
4.2 Voltage gain of an inverter and an injection-locked oscillator at different
input signal levels. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
4.3 Power savings of ILC relative to conventional clock distribution. . . . 104
4.4 Skew introduced by resonant frequency error vs. quality factor of res-
onator in a resonant based clock distribution. . . . . . . . . . . . . . 105
4.5 Illustration of ILC jitter suppression in comparison with conventional
clocking. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
4.6 Illustration of the three different configurations (a-c) of global clock dis-
tribution, and a possible floorplan (d) for the ILC-based global clock
distribution in Alpha 21264. Each configuration is designated accord-
ing to its clocking network: XGM, IGM, and IM′. . . . . . . . . . . . 110
4.7 Circuit-level jitter simulation setup. . . . . . . . . . . . . . . . . . . . 111
4.8 Breakdown of processor power consumption with different clock distri-
bution methods. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
4.9 Schematic of (a) the test chip and (b) a divide-by-two ILO used. . . 116
4.10 Chip micrograph of the test chip. The whole chip size is 1.5mm ×
1.3mm, and each ILO occupies 0.25mm × 0.22mm. The H-tree sections
measure 500 µm, 280 µm, and 290 µm, respectively, from root to leaves. 116
4.11 Spectrum of the generated local clock signal from ILO1, identical to
that from other ILOs on-chip. . . . . . . . . . . . . . . . . . . . . . . 117
4.12 Locking range of ILO1, identical to that of other ILOs on-chip. . . . 118
4.13 Phase noise of reference clock and 4 output clocks at different positions
on chip. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
4.14 Deskew capability of ILC. (a) deskewing when tuning ILO1 and/or
ILO2; (b) deskewing when tuning ILO1 and ILO2 differentially. The
skew is measured between the two output clock signals of ILO1 and
ILO2. Note that there is some imbalance between ILO1 and ILO2
caused by mismatch in the clock distribution tree and measurement
system. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
4.15 Two possible circuit implementations of a non-division injection-locked
oscillator. (a) requires a differential input while (b) can take either a
single-ended or a differential input. . . . . . . . . . . . . . . . . . 121
4.16 Non-division ILO analysis: (a) circuit model; (b) loop behavior model.
Similar to the divide-by-3 topology, differential injection is required for
non-division operation. . . . . . . . . . . . . . . . . . . . . . . . . . 121
4.17 Phase shifting characteristics of non-division ILO at different injection
ratios when Q = 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
4.18 Non-division ILO with binary weighted switched capacitor tuning. . 124
4.19 (a) Schematic and (b) chip micrograph of the test chip. . . . . . . . 125
4.20 Measured spectrum for output clock. . . . . . . . . . . . . . . . . . . 126
4.21 Measured locking range of an ILO in ILC. . . . . . . . . . . . . . . . 127
4.22 5-bit digital phase shift tunings for two ILOs’ outputs. . . . . . . . . 127
4.23 Phase noise comparison of the input clock from signal source and out-
put clock from the ILO. . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.24 Jitter measurement of the signal source and ILC output at 4 GHz.
From extrapolation, the cycle-to-cycle jitter of signal source and ILC
output are 0.11 ps and 0.14 ps, respectively. . . . . . . . . . . . . . . 130
4.25 ILC with active deskewing. . . . . . . . . . . . . . . . . . . . . . . . . 130
4.26 An injection-locked clocking system with ILO-based active deskew.
Four ILOs are driven by the input clock through an H-tree. Each
ILO is buffered by Buf1 to drive 2 pF of on-chip load capacitor (CL),
which also converts the ILO differential output to a single-ended signal.
Output buffers Buf2 drive the test ports (TPx). . . . . . . . . . . . . 132
4.27 (a) Deskew logic algorithm, and (b) an example of the deskew sequence
which shows the design for ringing prevention. . . . . . . . . . . . . . 133
4.28 Test chip die photo. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
4.29 Measured locking range for the ILC network. . . . . . . . . . . . . . 135
4.30 (a) Measured free-running frequency tuning, and (b) delay tuning un-
der locked state by 5-bit switched capacitor array. . . . . . . . . . . 136
4.31 Deskew dynamics of the deskew loop. . . . . . . . . . . . . . . . . . . 137
4.32 (a) Phase noise of the ILC output in comparison with input clock
and free-running ILO. (b) Cycle-to-cycle jitter test for ILC output and
input clock. The degradation is only 0.04 ps. . . . . . . . . . . . . . . 138
5.1 (a) Conventional dual-loop CDR, with loop I for frequency acquisi-
tion and loop II for phase acquisition. (b) Injection-locked CDR with
frequency acquisition achieved by injection locking. . . . . . . . . . . 140
5.2 Optoelectronic oscillator with optical fiber as the resonator. . . . . . 142
5.3 Injection-locked optoelectronic oscillator (OEO) with free space res-
onator. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
Chapter 1
Introduction
The development of information technology has greatly increased our capabilities
of producing, manipulating, storing, communicating, and disseminating information.
Communications and computers are the two major foundations of information
technology. In recent years, many high speed communication technologies, wired and
wireless, have been introduced to provide fast data transfer over different distances
and through different transmission media. At the same time, advances in microelectronic processing
techniques and ever-increasing market demand have enabled continuous increases in
computer performance. For both communication systems and computers, there is
one important common signal, the clock signal. The speed of the clock signal is usu-
ally the most important indicator of system performance for communication systems
and computers. The development of communications and computer technologies has
pushed the clock frequency of both systems into the multi-gigahertz range. At such
high speeds, new challenges and considerations are emerging for conventional clock
design methods.
1.1 Clocking in Communications and Computing
Systems
Communication systems are the foundation of information transfer. Depending
on the transmission medium, communication systems are generally categorized
into wired and wireless communications. Wired communication, also called wireline
communication, is now the dominant method for long distance telecommunications
and provides the backbone of the information highway. At the same time, wireline
communication is also the main technology for fast local area networks (LANs), which
connect fixed servers and workstations inside office and residential buildings.
Currently, the dominant long distance wireline communication standards are
synchronous optical networking (SONET) and synchronous digital hierarchy (SDH) [1].
They are standardized multiplexing protocols that transfer multiple digital bit streams
over optical fiber using lasers or light-emitting diodes (LEDs). Designed for high
speed data transfer from the beginning, SONET/SDH has also evolved with time to
support higher data rates with newer generations. Since the standardization of these
two technologies in 1988, the transfer speed of SONET/SDH has increased from the
original optical carrier 1 (OC-1) at 51.84Mb/s to the currently widely deployed
optical carrier 192 (OC-192) and optical carrier 768 (OC-768) at nominal speeds of
10Gb/s and 40Gb/s, as shown in Fig. 1.1. A higher speed version, optical carrier 3072
(OC-3072), supporting a nominal speed of 160Gb/s, is also planned for the near future.
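
The optical carrier rates quoted above follow from the SONET convention that an OC-n link runs at n times the OC-1 line rate of 51.84Mb/s, so the 10Gb/s, 40Gb/s and 160Gb/s figures are rounded values. The short sketch below only illustrates that arithmetic; the function and constant names are hypothetical and are not taken from the dissertation.

```python
# Illustrative arithmetic only: OC-n line rate = n x OC-1 line rate.
# The names below are assumptions made for this sketch.
OC1_RATE_MBPS = 51.84  # OC-1 line rate in Mb/s, as quoted in the text


def oc_rate_mbps(n: int) -> float:
    """Return the line rate of an OC-n link in Mb/s."""
    if n < 1:
        raise ValueError("OC level must be a positive integer")
    return n * OC1_RATE_MBPS


if __name__ == "__main__":
    for level in (1, 192, 768, 3072):
        rate = oc_rate_mbps(level)
        print(f"OC-{level}: {rate:.2f} Mb/s ({rate / 1000:.2f} Gb/s)")
```

For example, OC-192 works out to 9953.28Mb/s, which is the rate usually rounded to 10Gb/s.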
For local area networks (LANs), Ethernet is the most widely used wireline
communication standard [2]. Its speed has also increased with time (Fig. 1.1). The first
Ethernet, introduced by Xerox PARC in 1973-1975, ran at only 3Mb/s. 10base-X Ethernet,
which started around 1980, raised the supported data rate to 10Mb/s. Fast Ethernet,
introduced in 1995, achieves a data rate of 100Mb/s. Right now, Gigabit Ethernet
(GbE), supporting a data rate of 1Gb/s, has become the de facto protocol for data
Figure 1.1: Speed of wireline communication standards increases with time. For long distance wireline communication standards, synchronous optical networking (SONET) and synchronous digital hierarchy (SDH), the speed has increased from 51.84Mb/s in 1988 to the current large scale deployment of 10Gb/s and 40Gb/s systems. Speed for short distance wireline communication standard Ethernet increased from only 3Mb/s in 1975 to current speed of 10Gb/s.
transmission for LANs. Most personal computers (PCs) now ship with a GbE
networking port as standard. At the same time, faster versions of Ethernet are also
commercially available. In 2007, more than 1 million 10-Gb/s Ethernet (10-GbE)
ports were shipped. 10-GbE supports a data rate of 10Gb/s, is becoming widely
used in the fiber-optic core of telecommunication networks, and is even available as
a backplane for ultrafast commercial equipment enclosures. For the future, even faster
40- and 100-Gb/s Ethernet standards are being developed by the IEEE 802.3
Higher Speed Study Group (HSSG) under the IEEE 802.3ba designation.
The speed for wireline communications has been increasing rapidly in recent years.
Wireless communications, at the same time, are also moving from the crowded sub- and
Figure 1.2: Commercial wireless systems are expanding from sub- and low-gigahertz range to multi-gigahertz, even millimeter-wave range in order to support higher data rates.
low-gigahertz range to the multi-gigahertz, or even millimeter-wave, range for higher data
rates. Cellular phone systems, whose frequencies lie in the crowded sub-2GHz range,
used to be the only major commercial wireless application, as shown in Fig. 1.2.
With the introduction of wireless local area networks (WLANs) and wireless personal
area networks (WPANs), wireless communications quickly occupied the multi-gigahertz
frequency range [3, 4]. Recent developments in wireless digital network interfaces for
streaming high definition video over the air have further pushed commercial
wireless communications to the millimeter-wave, or tens-of-gigahertz, range [5].
The clock signal is critical to both wireline and wireless communication systems. In
wireline communications, the receiver relies on the clock signal to determine when
to sample the data it receives from the transmitter [6], as demonstrated in Fig. 1.3.
A good sampling clock in a wireline receiver should have its sampling transition edge
Figure 1.3: The role of clock signal in a typical wireline receiver. The highlighted portion in (a) is the clocking circuit. The clock signal is first frequency synchronized with that of the incoming data stream by a reference clock fref, which is shared by both transmitter and receiver. The frequency synchronized clock is then phase synchronized with the data stream, and used to sample the data. (b) The example of good and bad sampling clocks. The good sampling clock, clock A, has a sampling transition edge at the center of the data stream eye diagram. The bad sampling clock, clock B, has a sampling edge at the edge of the data stream eye diagram. A sampling edge away from the eye diagram center increases the chance of error of the system.
right in the center of the data stream eye diagram. If this clock condition is not
satisfied, the error rate of the communication system will increase. In order to control
the error rate, both the frequency and phase stability of the clock signal in wireline
communication systems should be tightly controlled.
In wireless communications, the clock signal, or RF carrier, determines the RF
transmission frequency. Due to interference concerns, accurate RF carrier frequencies
are required for different wireless communications terminals to work concurrently
in the same location [7]. Fig. 1.4 (a) shows the role of the RF clock signal in a
typical radio transceiver. It is a bridge between the low-frequency and radio-frequency
signals. If this clock signal is not accurate enough, the up-converted signal
may overlap with adjacent channels or other adjacent interferers and degrade the
performance of the wireless communications system, as illustrated in Fig. 1.4 (b).
Figure 1.4: The role of clock signal in wireless communications systems. (a) shows a typical radio transceiver for wireless communications. The highlighted portion is the clock generator for wireless systems, the frequency synthesizer. The output of frequency synthesizer up converts the transmitted signal from low frequency to radio frequency, and down converts the received signal from radio frequency to low frequency. (b) If there is any frequency error for the synthesizer output, the up-converted signal may overlap with adjacent channels in radio frequency and cause errors for both the transmission channel and adjacent channels.
High performance computing systems are the engine of information technology.
Historically, the performance advances of computing systems were enabled by two
coexisting forces. One is the advance of microelectronics fabrication technology,
and the other is the fast-increasing market demand for computing power. Until the
beginning of the 21st century, faster clock speed, larger logic integration scale and
smaller feature size of the CMOS process were the natural ways to increase
processor performance. Intel’s 4004 processor, shipped in 1971, ran at
a maximum speed of only 740 kHz. The clock speed of its 80286, shipped in the early
1980s, increased to 8 MHz. In 1993, Intel introduced the fifth generation of x86
microarchitecture processors, the Pentium, running at 66 MHz. The Pentium 4, introduced
in 2000, ran at speeds exceeding 1 GHz, reaching 1.3 GHz. Even though the heat
dissipation bottleneck has slowed the increase in processor clock speed,
by 2007 the clock speed of Intel’s processors had still reached 3.8GHz. Considering the
influence of the power dissipation plateau, ITRS 2007 predicted that the processor
clock speed would keep increasing by a factor of 1.25 per technology node, reaching
nearly 9 GHz in 2015 [8]. At the same time, the off-chip clock for processors would
increase faster and reach 30 GHz in 2015. The IO bandwidth of the processor and the
number of cores would also increase at the same time, as shown in Fig. 1.5 [8]. Based
on data from processors released between 2007 and 2010, even though the heat
dissipation challenge and the multi-core technique have slowed the increase in main
processor clock speed predicted by ITRS 2007, the off-chip clock speed for processors
is increasing steadily along the predicted trend.
Commercial developments in new computer interconnect technologies also
bear out the ITRS predictions. For communication between the processor and
peripheral components, PCI Express, with clock frequencies up to 4GHz, is quickly
replacing the conventional PCI bus, as shown in Fig. 1.5 [9]. Serial ATA (SATA), with
clock speeds up to 6GHz, is replacing ATA for the computer hard drive data path
[10]. Memory interface technologies from Rambus and other companies are targeting
1TB/s at clock frequencies of more than 10 GHz [11].
In computing systems, clocking is one of the most critical functions because it
is involved in almost all aspects of data processing and communications. For syn-
chronous digital logic, which is the most popular logic form for data processing, the
clock defines the start and end time for the movement of logic signals between differ-
ent registers in the data path [12, 13]. One clock cycle is the given time for all the
logic circuitry to complete their actions and get ready for the next step.
The typical structure of high speed clocking in a VLSI system is shown in Fig. 1.6
[14]. Usually an off chip crystal oscillator works as the reference for an on chip
clock generation phase-locked loop (PLL) [6]. The clock generation PLL generates
a gigahertz frequency tunable clock signal for the entire VLSI chip. After the clock
generator, a clock distribution network delivers the clock signal to every logic gate
Figure 1.5: On-chip and off-chip clock frequencies, IO bandwidth and number of cores for the processors in the near future predicted by International Technology Roadmap for Semiconductors (ITRS) 2007. According to the prediction, the on-chip clock frequency will reach near 9 GHz in 2015. The off-chip clock frequency will reach 30 GHz and IO bandwidth will reach 80 Tb/s. Number of cores in 2015 will be around 10.
Figure 1.6: Clock generation and distribution in a processor chip. The clock generation PLL generates gigahertz, frequency tunable clock signal for the entire VLSI chip. After the clock generator, a clock distribution network delivers the clock signal to every logic gate across the chip.
across the chip. The design target for clock distribution is to control the spatial and
temporal variation of the clock transition edges.
1.2 Challenges for High Performance Clocking
The challenges for high speed clocking come from the high susceptibility to clock
timing errors. The same amount of absolute timing error corresponds to a larger per-
centage of cycle time for a higher speed clock. Less usable cycle time translates to less
immunity to noise and errors in the data processing and communication processes. In
order to maintain acceptable error rates, absolute timing error therefore needs to be
scaled down with the clock cycle time. For wireless communication systems, a higher
carrier frequency is harder to generate while maintaining good frequency
stability [15]. An inaccurate carrier frequency broadens the signal bandwidth and generates
interference between different wireless systems, and between different channels within the same
standard [7].
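As a rough illustration of the scaling argument above, the following Python sketch expresses a fixed absolute timing error as a fraction of the clock period. The 10 ps error and the three clock rates are hypothetical values chosen only for illustration, and the function name is likewise an assumption rather than anything defined in this work.

```python
def timing_error_fraction(abs_error_s: float, clock_hz: float) -> float:
    """Return an absolute timing error as a fraction of one clock period."""
    period_s = 1.0 / clock_hz
    return abs_error_s / period_s


if __name__ == "__main__":
    error_s = 10e-12  # hypothetical 10 ps absolute timing error
    for clock_hz in (100e6, 1e9, 10e9):  # illustrative clock rates
        fraction = timing_error_fraction(error_s, clock_hz)
        print(f"{clock_hz / 1e9:6.1f} GHz: {100.0 * fraction:5.1f}% of the cycle")
```

Under these assumed numbers, the same error that is negligible at 100 MHz takes up a tenth of the cycle at 10 GHz, which is why the absolute error budget has to shrink with the cycle time.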
In different systems, there are different ways of describing the clock timing errors.
For computing and wireline communication systems, the clock timing errors are
usually characterized by skew and jitter [12]. Both skew and jitter are time-domain
descriptions of the clock timing error, with the former describing the spatial variation
of clock transition edges, and the latter describing the temporal variation of the clock
transition edge. For wireless communication systems, the clock timing errors are usually
characterized by the phase noise of the local oscillation signal [15, 7]. It is a frequency-
domain description of the phase variation of the local oscillation, or clock, signal.
1.2.1 Skew
Skew is defined as the difference in clock arrival time at different positions [14].
For synchronous logic, different positions means different flip-flops. Clock skew in
synchronous logic is generally caused by mismatches in the clock distribution network.
These mismatches can come from the lengths and electrical properties of the clock
interconnect, the sizes and device parameters of the clock buffers, and the clock load itself.
The effect of clock skew on the performance of synchronous logic systems can be
demonstrated by the following two examples. Synchronous logic is usually
composed of two sequential registers with combinational logic inserted
in between. Suppose we have such a structure as shown in Fig. 1.7 (a), where Ci
and Cj are the clock signals for the near-end register Ri and the far-end register Rj.
When clock Ci arrives at the near-end register, it launches the signal from
Ri into the combinational logic after a delay Tcq. Signals take different paths through the
combinational logic before they reach the far-end register. We assume the longest path
has a delay of Tijmax. Upon arriving at Rj, the signal must remain stable for the setup
time, Tsetup, required by the register. Usually the three delay elements, Tcq + Tijmax +
Tsetup, in such a data path make up the minimum allowable clock cycle time Tcycle.
However, if there is a clock
Figure 1.7: Effect of skew on (a) long path error and (b) short path error.
skew such that Cj arrives earlier than Ci, we will have a problem if we still run the
logic at the same clock frequency. Because of the early skew, Tskew,
at Rj, clock Cj arrives before the signal at the Rj input has been stable for the
setup time. This may cause a functional error in the logic and is usually referred to as
a setup violation. Because this type of error is usually caused by the longest delay
path in the logic, it is also called a long path error. One ready remedy for long path
error is to increase the clock cycle time, so that
$$T_{cycle} \geq T_{cq} + T_{ijmax} + T_{setup} + T_{skew} \tag{1.1}$$
can be satisfied. Clearly, the existence of skew increases the minimum allowable
clock cycle time, and thus limits the maximum speed of the logic.
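The setup constraint of Eqn. 1.1 can be sketched numerically as follows. This is a minimal illustration, not a timing tool: the function names and all delay values are hypothetical and were picked only to make the arithmetic easy to follow.

```python
def min_cycle_time(t_cq: float, t_ij_max: float, t_setup: float,
                   t_skew: float = 0.0) -> float:
    """Minimum clock period that satisfies Eqn. 1.1 (all times in seconds)."""
    return t_cq + t_ij_max + t_setup + t_skew


def max_clock_frequency(t_cq: float, t_ij_max: float, t_setup: float,
                        t_skew: float = 0.0) -> float:
    """Highest clock frequency (Hz) allowed by the setup constraint."""
    return 1.0 / min_cycle_time(t_cq, t_ij_max, t_setup, t_skew)


if __name__ == "__main__":
    # Hypothetical delays: 50 ps clock-to-Q, 400 ps longest path, 50 ps setup.
    no_skew = max_clock_frequency(50e-12, 400e-12, 50e-12)
    with_skew = max_clock_frequency(50e-12, 400e-12, 50e-12, t_skew=50e-12)
    print(f"no skew:    {no_skew / 1e9:.2f} GHz")
    print(f"50 ps skew: {with_skew / 1e9:.2f} GHz")
```

With these assumed values, 50 ps of early skew stretches the minimum period from 500 ps to 550 ps, so the skew directly costs clock frequency.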
Another effect of skew on synchronous logic is shown in Fig. 1.7 (b), where the
clock, Cj, at the far-end register, Rj, arrives later than the near-end one, Ci, with a skew
of Tskew. This time the signal goes through the shortest path in the combinational logic
and has the smallest delay, Tijmin. The signal arrives at the input of Rj after a total delay
of Tcq + Tijmin. This delay is smaller than the sum of the clock skew Tskew and the hold
Figure 1.8: Illustration of clock jitter as the clock period variations. The period of clock with jitter is a Gaussian distribution with the mean value equal to the ideal clock period Tavg.
time Thold required by the register Rj for the data of the previous cycle. The early
arrival of data of this cycle may corrupt the data of the previous one, causing a logic
error. This error is usually referred to as a hold violation, or short path error, because
it is usually caused by the shortest path in the combinational logic. A hold violation
cannot be corrected by increasing the clock cycle time. Instead, delay elements need
to be added in the shortest logic path. However, this must be done carefully
so that the longest path delay is not increased.
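The hold condition described above, in which the data delay Tcq + Tijmin must not be smaller than Thold + Tskew, can be checked with a short sketch. The function names and delay values below are hypothetical, and the padding estimate is only the minimum extra delay implied by that inequality.

```python
def hold_slack(t_cq: float, t_ij_min: float, t_hold: float,
               t_skew: float) -> float:
    """Hold slack in seconds; a negative value indicates a hold violation."""
    return (t_cq + t_ij_min) - (t_hold + t_skew)


def required_padding(t_cq: float, t_ij_min: float, t_hold: float,
                     t_skew: float) -> float:
    """Minimum delay to add to the shortest path so that the slack is not negative."""
    return max(0.0, -hold_slack(t_cq, t_ij_min, t_hold, t_skew))


if __name__ == "__main__":
    # Hypothetical values: 50 ps clock-to-Q, 20 ps shortest path,
    # 30 ps hold time, 60 ps late skew at the far-end register.
    print(hold_slack(50e-12, 20e-12, 30e-12, 60e-12))        # negative: violation
    print(required_padding(50e-12, 20e-12, 30e-12, 60e-12))  # delay to insert
```

Note that the clock period does not appear anywhere in this check, which matches the observation that slowing the clock cannot repair a hold violation.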
1.2.2 Jitter
Jitter is defined in ITU (International Telecommunication Union) specifications as
“short term variations of a digital signal’s significant instants from their ideal positions
in time.” If we observe a signal with jitter on an oscilloscope, the jitter shows up as
blurred rising and falling edges (Fig. 1.8). Jitter adds to the clock uncertainty on
top of skew. It further reduces the timing margin in a clock cycle and hence reduces
the maximum speed of a system.
A practical clock signal can be written as
$$V_{clock}(t) = A(t)\, f[\omega_0 t + \phi(t)] \tag{1.2}$$
where the function f is periodic in 2π and A(t) and φ(t) model fluctuations in
amplitude and phase. The amplitude fluctuations can be significantly attenuated by
the amplitude limiting mechanism, which is readily present in clock generation and
distribution systems. The phase fluctuation φ(t) thus becomes the main cause of clock
timing error. With the existence of φ(t), the clock period varies from cycle
to cycle, usually following a Gaussian distribution, as shown in Fig. 1.8. We refer to
Tn as the period of cycle n, and the mean value of the Tn distribution is defined as
the average period of the clock, which is denoted as Tavg. The standard deviation of
this clock period distribution is defined as cycle-to-cycle jitter σc. The equation for
σc can be expressed as
$$\sigma_c = \sqrt{\lim_{N\to\infty}\left(\frac{1}{N}\sum_{n=1}^{N}\left(T_n - T_{avg}\right)^2\right)}. \tag{1.3}$$
The cycle-to-cycle jitter describes the magnitude of the clock period fluctuations
in the short term, but contains no information on the long-term jitter accumulation. The
long-term jitter accumulation can be characterized by absolute jitter, or long-term
jitter, σabs(t). It is defined as the sum of the period variations during an observation
time t.
$$\sigma_{abs}(t = N T_{avg}) = \sum_{n=1}^{N}\left(T_n - T_{avg}\right) \tag{1.4}$$
The long-term jitter is a function of the observation time relative to the
triggering event. For free-running oscillators, it grows without bound with observation
time, since the period variation of one cycle carries no information about the variation
of previous cycles. This measure is more often used with PLLs, because a PLL has a
phase reference source that resets the jitter, which a free-running oscillator does not
have. From the operation of synchronous digital logic systems, the jitter
that matters in the clock signal is the period variation between two sequential cycles,
which is the cycle-to-cycle jitter σc.
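The two jitter measures of Eqns. 1.3 and 1.4 can be estimated from a finite record of clock periods, as in the following sketch. The 1 GHz nominal clock, the 1 ps rms deviation and the use of independent Gaussian samples are assumptions made purely for illustration; the function names are hypothetical as well.

```python
import math
import random


def cycle_to_cycle_jitter(periods):
    """Finite-N estimate of Eqn. 1.3: rms deviation of the periods from their mean."""
    t_avg = sum(periods) / len(periods)
    return math.sqrt(sum((t - t_avg) ** 2 for t in periods) / len(periods))


def absolute_jitter(periods, t_avg):
    """Running sum of period deviations from t_avg (Eqn. 1.4), one value per cycle."""
    total = 0.0
    accumulated = []
    for t in periods:
        total += t - t_avg
        accumulated.append(total)
    return accumulated


if __name__ == "__main__":
    random.seed(0)
    t_nominal = 1e-9  # hypothetical 1 GHz clock
    sigma = 1e-12     # hypothetical 1 ps rms period deviation
    periods = [random.gauss(t_nominal, sigma) for _ in range(10000)]
    print("cycle-to-cycle jitter:", cycle_to_cycle_jitter(periods))
    print("absolute jitter after N cycles:", absolute_jitter(periods, t_nominal)[-1])
```

Because each period deviation in this sketch is independent of the others, the running sum behaves like a random walk, which is the unbounded accumulation attributed above to free-running oscillators.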
For baseband digital communication, jitter is defined more broadly:
the random part of the clock time variation is defined as random jitter, and the
deterministic part is defined as deterministic jitter. Deterministic jitter is mainly
caused by duty cycle distortion and data dependent jitter. Because deterministic jitter
is not random, it is usually described by its peak-to-peak magnitude. Useful jitter
characteristics for baseband digital communication receivers include jitter generation,
jitter transfer and jitter tolerance. Jitter generation is the process whereby jitter
appears at the output in the absence of applied input jitter. It is a measure of
the jitter that the receiver itself produces. Jitter transfer is characterized by
the relationship between the applied input jitter and the resulting output jitter as a
function of frequency. For receivers in which a linear process describes the transfer of
jitter from the input to output port, the jitter transfer function is the ratio of output
jitter spectrum to the applied input jitter spectrum. Jitter tolerance is defined as the
peak-to-peak amplitude of sinusoidal jitter applied on the input signal that causes a
1-dB power penalty. This is a stress test intended to ensure that the receiver works
properly under required jitter conditions.
1.2.3 Phase Noise
For wireless communication systems, it is more convenient to study the jitter
in the frequency domain, where it is called phase noise [15, 7]. Phase noise is usually
characterized in terms of the single sideband noise spectral density. It has units of
decibels below the carrier per hertz (dBc/Hz) and is defined as
$$L(\Delta\omega) = 10\log\left[\frac{P_{sideband}(\omega_0 + \Delta\omega, 1\,\mathrm{Hz})}{P_{carrier}}\right] \tag{1.5}$$
where Psideband(ω0 + ∆ω, 1Hz) represents the single sideband power at a frequency
offset of ∆ω from the carrier with a measurement bandwidth of 1 Hz. A typical phase
noise profile for an oscillator is shown in Fig. 1.9 (a). The phase noise usually drops
as the offset frequency from the carrier increases. According to the slope of this
drop, the phase noise profile can be divided into three regions. The region closest to
the carrier has a slope of 30dB/dec, the middle region has a slope of 20dB/dec, and at
large offset frequencies the profile becomes flat.
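As a small worked example of Eqn. 1.5, the sketch below converts a sideband power measured in a 1 Hz bandwidth and a carrier power into dBc/Hz. The function name and the power levels are hypothetical and serve only to show the unit conversion.

```python
import math


def phase_noise_dbc_per_hz(p_sideband_w: float, p_carrier_w: float) -> float:
    """Eqn. 1.5: single sideband power in 1 Hz relative to the carrier, in dBc/Hz."""
    if p_sideband_w <= 0.0 or p_carrier_w <= 0.0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(p_sideband_w / p_carrier_w)


if __name__ == "__main__":
    # Hypothetical measurement: 1 mW carrier, 0.1 pW of sideband power in 1 Hz.
    print(f"{phase_noise_dbc_per_hz(1e-13, 1e-3):.1f} dBc/Hz")
```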
The phase noise is especially important for the LO signal in wireless systems. This
is because when an LO down converts the desired signal, adjacent interferers also get
convolved with the LO signal and down-converted near the desired signal. The phase
noise sideband of the LO will be transferred to both the down-converted desired signal
and interferers. This may result in overlapping spectra between the desired signal and
interferers, with the tails of interferers corrupting the desired information, as shown
in Fig. 1.9 (b). This effect is called reciprocal mixing.
There are many sources in an oscillator that contribute to the generation of phase
noise. The conversion mechanisms from different sources to phase noise are usually
different and complicated. In order to gain an intuitive understanding of these pro-
cesses, we will assume a linear time-invariant model for a typical LC oscillator, and
study the conversion process of device thermal noise in the LC tank to the oscillator
output phase noise (Fig. 1.10) [16]. The thermal noise of a device in the tank can
be represented by a white noise current source across the tank with a mean-square
spectral density of
$$\frac{\overline{i_n^2}}{\Delta f} = \frac{4kT}{R}. \tag{1.6}$$
Figure 1.9: (a) Typical phase noise profile of an oscillator versus offset from carrier. The corner frequencies $\Delta\omega_{1/f^3}$ and $\Delta\omega_{1/f^2}$ represent the boundaries between 30dB/dec, 20dB/dec and flat regions of the phase noise profile. (b) Illustration of the process of reciprocal mixing.
This noise current will only see the impedance of a perfectly lossless LC network
because the loss of the tank is canceled by the active energy restoration element. For
a relatively small offset frequency ∆ω from the center frequency ω0, the impedance
of a lossless LC network may be approximated by
$$Z(\omega_0 + \Delta\omega) \approx -j\,\frac{\omega_0 L}{2(\Delta\omega/\omega_0)}. \tag{1.7}$$
Expressing ω0L in terms of the tank quality factor Q and the effective parallel
resistance R as
$$\omega_0 L = \frac{R}{Q}, \tag{1.8}$$
we can rewrite the tank impedance as
$$Z(\omega_0 + \Delta\omega) \approx -j\,\frac{R\,\omega_0}{2Q\,\Delta\omega}. \tag{1.9}$$
Figure 1.10: (a) A typical LC oscillator, and (b) the model for phase noise analysis. The active circuit part provides a negative resistance to compensate the loss of the LC tank.
Multiplying the squared magnitude of this tank impedance by the spectral density of
the mean-square noise current, we obtain the spectral density of the mean-square
noise voltage:
$$\frac{\overline{v_n^2}}{\Delta f} = \frac{\overline{i_n^2}}{\Delta f}\,|Z|^2 = 4kTR\left(\frac{\omega_0}{2Q\,\Delta\omega}\right)^2. \tag{1.10}$$
The power spectral density of the output noise is frequency-dependent, and is
proportional to the inverse square of the offset frequency. This is because of the
filtering function of the LC tank. When flicker noise at small offset frequencies is
taken into account, a 1/∆ω³ region appears on top of Eqn. 1.10, because
flicker noise has a power spectral density inversely proportional to frequency. As
mentioned earlier, the above approach assumes a linear time-invariant model
for the oscillator, which is not accurate enough for practical oscillators. Taking into
account the nonlinear and time-variant nature of practical oscillators, [16]
gave a complete description of the phase noise generation process, which involves the
conversion from device noise to the phase fluctuation φ(t) in Eqn. 1.2, and a phase
modulation from φ(t) to the oscillator output phase noise.
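The inverse-square dependence in Eqn. 1.10 can be verified with a few lines of Python. This sketch only evaluates the linear time-invariant result; the tank resistance, quality factor, carrier frequency and temperature are hypothetical, and the ratio ω0/∆ω is computed as the equivalent ratio of frequencies in hertz.

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K


def noise_voltage_density(r_ohm: float, q: float, f0_hz: float,
                          offset_hz: float, temp_k: float = 300.0) -> float:
    """Eqn. 1.10: mean-square noise voltage density in V^2/Hz at a given offset."""
    return 4.0 * K_BOLTZMANN * temp_k * r_ohm * (f0_hz / (2.0 * q * offset_hz)) ** 2


if __name__ == "__main__":
    # Hypothetical tank: R = 500 ohm, Q = 10, 5 GHz carrier.
    near = noise_voltage_density(500.0, 10.0, 5e9, 1e6)
    far = noise_voltage_density(500.0, 10.0, 5e9, 1e7)
    print(f"drop per decade of offset: {10.0 * math.log10(near / far):.1f} dB")
```

Moving the offset out by one decade lowers the density by a factor of 100, which is the 20dB/dec region of the profile in Fig. 1.9 (a).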
While jitter can be viewed as a statistical behavior for the random process of
Figure 1.11: (a) The linearized noise model for a typical type II PLL, and (b) a typical PLL phase noise profile.
phase fluctuation φ(t), the phase noise is the frequency domain representation for the
same process. These two can be related by [17]
$$\sigma_{abs}^2 = 4\int_{-\infty}^{+\infty} S_\phi(f)\,\sin^2(\pi f t)\,df. \tag{1.11}$$
Note that the jitter in this relationship is absolute jitter, or long-term jitter.
It is worth mentioning here the effect of frequency division and multiplication on
phase noise. Since frequency and phase are related by a linear operation, dividing or
multiplying the frequency is identical to dividing or multiplying the phase by the
same factor. For the clock signal in Eqn. 1.2, a frequency division by M will generate
$$V_{clock/M}(t) = A(t)\, f\left[\frac{\omega_0}{M}t + \frac{\phi(t)}{M}\right]. \tag{1.12}$$
The magnitude of the phase fluctuation is also divided by M, and from the narrowband
phase modulation approximation, the phase noise power is divided by M². If we
assume the carrier amplitude to be the same, a frequency division by M will generate
a phase noise reduction of 20 log M dB. On the other hand, a frequency multiplication
by M will increase the phase noise by 20 log M dB.
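The 20 log M rule above is easy to apply numerically. The following sketch assumes an ideal, noiseless divider or multiplier and an unchanged carrier amplitude, as in the argument above; the function names and the starting phase noise level are hypothetical.

```python
import math


def phase_noise_after_division(l_dbc_hz: float, m: float) -> float:
    """Phase noise (dBc/Hz) after an ideal divide-by-M: reduced by 20*log10(M)."""
    if m <= 0.0:
        raise ValueError("m must be positive")
    return l_dbc_hz - 20.0 * math.log10(m)


def phase_noise_after_multiplication(l_dbc_hz: float, m: float) -> float:
    """Phase noise (dBc/Hz) after an ideal multiply-by-M: raised by 20*log10(M)."""
    if m <= 0.0:
        raise ValueError("m must be positive")
    return l_dbc_hz + 20.0 * math.log10(m)


if __name__ == "__main__":
    start = -100.0  # hypothetical phase noise at some offset, in dBc/Hz
    print(f"divide by 2:   {phase_noise_after_division(start, 2):.2f} dBc/Hz")
    print(f"multiply by 3: {phase_noise_after_multiplication(start, 3):.2f} dBc/Hz")
```

A practical divider or multiplier adds noise of its own, so these figures are best read as the limit set by the frequency scaling alone.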
Phase noise of a phase-locked loop (PLL) is also of great interest for the study of
high speed clocking. For a typical type II PLL, the linearized noise model is shown
in Fig. 1.11(a). This noise model indicates that the PLL output phase noise Φout
has contributions from both the input phase noise Φin and the phase noise generated by
the VCO, ΦVCO. The noise transfer functions of the loop from the input and from the VCO
are shown below:
$$\frac{\Phi_{out}(s)}{\Phi_{in}(s)} = \frac{\omega_n^2 + 2\zeta\omega_n s}{s^2 + 2\zeta\omega_n s + \omega_n^2}, \tag{1.13}$$
$$\frac{\Phi_{out}(s)}{\Phi_{VCO}(s)} = \frac{s^2}{s^2 + 2\zeta\omega_n s + \omega_n^2}, \tag{1.14}$$
where ωn is the natural frequency of the loop and ζ is the damping factor. From the
noise transfer functions, we can see that noise from the input is low-pass filtered by the
loop, while noise generated by the VCO is high-pass filtered by the loop. The typical phase
noise profile for such a PLL is shown in Fig. 1.11(b), where the output phase noise
Lout(∆ω) is the superposition of the low-pass filtered portion of the input noise Lin(∆ω)
and the high-pass filtered portion of the VCO noise LVCO(∆ω).
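The low-pass and high-pass behavior of Eqns. 1.13 and 1.14 can be checked by evaluating both transfer functions at s = j2πf. The sketch below is only an illustration of those two expressions: the 1 MHz natural frequency and the damping factor of 0.707 are hypothetical loop parameters, and the function names are assumptions.

```python
import math


def input_noise_transfer(f_hz: float, fn_hz: float, zeta: float) -> complex:
    """Eqn. 1.13 evaluated at s = j*2*pi*f (reference-to-output phase noise)."""
    s = 1j * 2.0 * math.pi * f_hz
    wn = 2.0 * math.pi * fn_hz
    return (wn ** 2 + 2.0 * zeta * wn * s) / (s ** 2 + 2.0 * zeta * wn * s + wn ** 2)


def vco_noise_transfer(f_hz: float, fn_hz: float, zeta: float) -> complex:
    """Eqn. 1.14 evaluated at s = j*2*pi*f (VCO-to-output phase noise)."""
    s = 1j * 2.0 * math.pi * f_hz
    wn = 2.0 * math.pi * fn_hz
    return s ** 2 / (s ** 2 + 2.0 * zeta * wn * s + wn ** 2)


if __name__ == "__main__":
    fn_hz, zeta = 1e6, 0.707  # hypothetical loop parameters
    for f_hz in (1e3, 1e5, 1e6, 1e7, 1e9):
        h_in = 20.0 * math.log10(abs(input_noise_transfer(f_hz, fn_hz, zeta)))
        h_vco = 20.0 * math.log10(abs(vco_noise_transfer(f_hz, fn_hz, zeta)))
        print(f"{f_hz:10.0f} Hz  input: {h_in:8.2f} dB  VCO: {h_vco:8.2f} dB")
```

Since the two numerators add up to the common denominator, the two transfer functions sum to one at every frequency, so the loop hands control of the output noise from the reference to the VCO as the offset passes the loop bandwidth.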
Skew, jitter and phase noise characterize the major timing uncertainties for
high-speed clocking systems. Many techniques at both the circuit and system levels have
been developed to control these timing uncertainties. At the same time, power consumption
is becoming an increasingly important consideration when dealing with such problems,
as many applications are moving to battery-powered portable devices. The following
parts of this dissertation will introduce some new techniques, based on injection-locked
oscillators, to handle the problems of skew, jitter and phase noise for high speed
clocking applications with good power efficiency.
1.2.4 Power Consumption
Energy efficiency is a major design consideration for battery-powered portable
communication devices because it directly determines their battery life.
Inside the transceiver of these devices, the frequency synthesizer
consumes a considerable portion of the total radio power. As the carrier frequency
moves higher, the power consumption of the frequency synthesis circuit
rises further. Reducing power consumption is thus a priority alongside
performance in designing high frequency synthesizers.
Inside a frequency synthesizer, the frequency divider of the phase-locked loop
(PLL) is one of the most power-hungry blocks. For conventional digital implementations
of frequency dividers, the power consumption increases linearly with frequency.
This increase in power becomes unacceptable for systems working in the millimeter-wave
range. New frequency dividers with lower power consumption in the multi-gigahertz and
millimeter-wave ranges are highly desirable for wireless systems working in these
frequency ranges.
Power dissipation in computing systems is a major limitation at many levels. This
limitation is especially prominent for high performance processor chips. The
International Technology Roadmap for Semiconductors (ITRS) states that the amount of
heat that can be removed from a chip in a cost-effective way is about to reach a
plateau, saturating at about 200 W.
There are many sources of power consumption in high performance processors.
The clock distribution circuit is one of the major contributors to the total power
consumption. In large scale processors, the total power dissipation in the clock
distribution can be significant. It may be as much as 30-40% of the total power
consumption of the whole system. The dominant component of clock power consumption is due
to the dynamic switching:
$$P = C V_{DD}^2 f_{clk}, \tag{1.15}$$
where C is the loading capacitance of the whole clock network, VDD is the supply
voltage and fclk is the clock frequency. The capacitance of the clock load can be very
high in large scale digital systems, possibly in the nanofarad range. The capacitance
of the clock arises from many sources. First, the interconnect capacitance of the
metal lines is a major source of capacitance. Second, there are large buffers used
in the clock distribution network that give rise to large fanout and self-capacitance
terms. Third, there are capacitances associated with the inputs of the flip-flops
driven by the clock. When designing the clock distribution network, it is important
to minimize the capacitance of the clocking network in order to reduce the switching
power consumption.
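To give a feel for the magnitude implied by Eqn. 1.15, the sketch below evaluates the switching power for a hypothetical clock network. The 1 nF load, 1.0 V supply and 3 GHz clock are illustrative assumptions consistent with the nanofarad range mentioned above, not measurements of any particular chip.

```python
def clock_switching_power(c_farad: float, vdd_volt: float, f_clk_hz: float) -> float:
    """Eqn. 1.15: dynamic switching power P = C * VDD^2 * fclk, in watts."""
    return c_farad * vdd_volt ** 2 * f_clk_hz


if __name__ == "__main__":
    # Hypothetical clock network: 1 nF total load, 1.0 V supply, 3 GHz clock.
    full_load = clock_switching_power(1e-9, 1.0, 3e9)
    half_load = clock_switching_power(0.5e-9, 1.0, 3e9)
    print(f"1.0 nF load: {full_load:.2f} W")
    print(f"0.5 nF load: {half_load:.2f} W")
```

Under these assumptions the power scales linearly with the clock capacitance, which is why reducing the load of the clock network translates directly into power savings.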
1.3 Dissertation Organization
The rest of the dissertation is organized as follows. Chapter 2
introduces background on conventional clock generation and distribution
methods. Some recent proposals for addressing some of the high-speed clock design
challenges are also discussed. Chapter 3 gives a detailed introduction
to the injection locking phenomenon, and presents three circuit innovations in
injection-locked frequency converters for communication systems. These three
circuits are a tunable dual-phase signal generator based on a double-balanced
divide-by-2 injection-locked frequency divider, a divide-by-3 injection-locked frequency
divider based on differential injection and harmonic engineering, and a multiply-by-2/3
dual-mode injection-locked frequency multiplier capable of good harmonic
suppression in digital CMOS processes. After that, Chapter 4 presents our new
injection-locked oscillator design for clock distribution, and injection-locked
clocking, a new clock distribution method for multi-gigahertz clock
distribution in the multi-core era. Chapter 5 concludes this dissertation and points to
some future research directions.
Chapter 2
High Performance Clocking
2.1 Clock Generation
Clock generation, or frequency synthesis in wireless communications terms, is the
process of generating one or many clock frequencies from one or a few reference
sources. The clock generator, or frequency synthesizer, is an important building block in
both processors and communications systems.
In early wireless systems, a frequency synthesizer was a crystal-controlled
oscillator with a bank of crystals switched in manually. The frequency accuracy and
stability of this kind of clock generator were mainly determined by the accuracy
and stability of the crystal.
Direct frequency synthesis, or incoherent synthesis, is the second generation of
frequency synthesis for wireless systems. This approach uses relatively few crystals
to generate many frequencies by means of frequency mixing, division and
multiplication. Fig. 2.1 is an example of direct frequency generation by frequency mixing
and division, where a frequency at 23f0 is generated from an f0 source.
Figure 2.1: Direct frequency generation using the mix-and-divide principle. It requires excessive filtering.
Clock generation by direct frequency synthesis can provide fast frequency
switching, almost arbitrarily fine frequency resolution, low phase noise, and the
highest-frequency operation of any of the methods. However, direct frequency synthesis
requires considerably more hardware (oscillators, mixers, and bandpass filters). These
hardware requirements make direct synthesizers larger and more expensive.
Another disadvantage of the direct synthesis technique is that unwanted
frequencies can appear at the output. The wider the frequency range, the more likely it is
that spurious components will appear in the output.
In modern communications systems, a frequency synthesizer generates an output
frequency given by fout = f0 + kfch, where f0 is the lower end of the range, k is an
integer varying from 0 to the maximum number of channels and fch is the frequency
step, or channel spacing. In the receive band of IS-54 cellular systems, for example,
f0 = 869MHz, k = 0, ..., 833, and fch = 30kHz. The very high accuracy required in
the specifications of f0 and fch often requires the use of a phase-locked loop (PLL).
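The channel plan fout = f0 + k fch can be written out directly using the IS-54 receive-band numbers quoted above. The function and constant names in this sketch are hypothetical; integer hertz are used so that the channel frequencies come out exact.

```python
F0_HZ = 869_000_000  # lower end of the IS-54 receive band, as quoted in the text
F_CH_HZ = 30_000     # channel spacing, as quoted in the text
K_MAX = 833          # largest channel index quoted in the text


def channel_frequency_hz(k: int) -> int:
    """Return fout = f0 + k * fch in hertz for channel index k."""
    if not 0 <= k <= K_MAX:
        raise ValueError(f"channel index must be between 0 and {K_MAX}")
    return F0_HZ + k * F_CH_HZ


if __name__ == "__main__":
    for k in (0, 1, K_MAX):
        print(f"k = {k:3d}: {channel_frequency_hz(k) / 1e6:.2f} MHz")
```

Adjacent channels are only 30 kHz apart on a carrier near 900 MHz, a spacing of roughly 35 parts per million, which illustrates why the synthesizer output has to be so accurate.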
In early processors, where the clock speed was low, a simple form of clock generation
was just a ring oscillator. A ring oscillator has an odd number of inverter stages and
produces an oscillating signal at each node. The period of oscillation is determined by
the delay of each stage and the number of stages. This type of clock generation circuit
is only suitable for low speed clock generation, because the generated clock signal
is quite process dependent and unstable. In order to achieve better clock frequency
Figure 2.2: Typical integrated oscillators. (a) a ring oscillator and (b) an LC oscillator.
stability, again, on-chip phase-locked loops (PLLs) with crystal oscillator references
are used for processor clock generation.
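The dependence of the ring oscillator period on stage delay and stage count can be made concrete with the usual first-order relation f = 1/(2 N td), where N is the number of inverter stages and td is the delay per stage. That relation is a standard textbook approximation and is an assumption here, since the text above does not state it explicitly; the function name and the 20 ps stage delay are hypothetical as well.

```python
def ring_oscillator_frequency_hz(num_stages: int, stage_delay_s: float) -> float:
    """First-order estimate f = 1 / (2 * N * td) for an N-stage inverter ring."""
    if num_stages < 3 or num_stages % 2 == 0:
        raise ValueError("an inverter ring needs an odd number of stages (3 or more)")
    # Each edge passes through all N stages twice per period.
    period_s = 2.0 * num_stages * stage_delay_s
    return 1.0 / period_s


if __name__ == "__main__":
    # Hypothetical 20 ps stage delay.
    for n in (3, 5, 7):
        print(f"{n} stages: {ring_oscillator_frequency_hz(n, 20e-12) / 1e9:.2f} GHz")
```

Because the stage delay varies with process, supply voltage and temperature, the frequency from this estimate shifts with them, which is the instability noted above.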
2.1.1 Oscillators
An oscillator is one of the most basic and essential components in a clocking
system. It converts dc power to RF power and produces a steady-state sinusoidal signal.
This steady-state periodic signal is usually frequency divided, multiplied or mixed
with another periodic signal to generate the clock signal, or RF carrier, for a
communication or computer system. The frequency stability, or spectral purity, of the
generated periodic signal is usually the most important performance figure for an
oscillator. Other factors such as power consumption and cost are also considered in many
applications when choosing an oscillator topology. These factors usually trade off
against frequency stability.
According to frequency determination mechanisms, oscillators can be categorized
into ring oscillators and resonator-based oscillators. A ring oscillator (Fig. 2.2-a) has
its oscillation frequency determined by the delay of a positive feedback loop. Usu-
ally several identical delay elements form a ring structure to construct an oscillator,
which gives the name of this type of oscillator. At the frequency of oscillation, the
ring structure has a steady-state closed loop gain of one and loop phase shift of 2π,
which forms a positive feedback and enables the oscillation signal to add in phase
as it propagates along the ring structure. Ring oscillators usually have smaller area
and larger tuning range compared with resonator-based oscillators. However, they
usually have inferior frequency stability and consume more power. In applications
where performance requirements are relaxed and cost is a major consideration,
ring oscillators are a very good candidate for generating the clock source.
A resonator-based oscillator has its oscillation frequency determined by the res-
onator. For integrated circuits, such resonators are usually built with LC (inductor
and capacitor) tanks. So they are also called LC oscillators (Fig. 2.2-b). An LC
differential oscillator can be analyzed by Barkhausen’s criteria in a feedback loop as
shown in Fig. 2.3. In Fig. 2.3-a, an LC differential oscillator is decomposed into
two parts, the LC tank H(jω) and the active circuit f(V0). These two parts form a
feedback system as demonstrated in Fig. 2.3-b. According to Barkhausen’s oscillation
criteria, at steady state oscillation, the following relationships should hold:
\[ |f(V_0)\,H(j\omega_0)| = 1 \tag{2.1} \]
\[ \angle\,[f(V_0)\,H(j\omega_0)] = 2\pi \tag{2.2} \]
where ω0 is the frequency of oscillation. Because the active circuit in Fig. 2.3-a
introduces zero phase shift, the oscillation frequency of this oscillator can only be at
the resonant frequency of the LC tank. This is because, as shown in Fig. 2.3-c, the
phase shift of the LC tank is zero only at its resonant frequency, which equals $1/\sqrt{LC}$.
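To illustrate the phase condition, the sketch below models the tank as a parallel RLC network and checks that the phase of its impedance crosses zero at ω0 = 1/√(LC). The parallel RLC model and the component values are assumptions made for illustration; they are not taken from the text.

```python
import cmath
import math


def resonant_frequency_rad(l_h: float, c_f: float) -> float:
    """Resonant angular frequency omega0 = 1 / sqrt(L * C)."""
    return 1.0 / math.sqrt(l_h * c_f)


def tank_impedance(omega: float, r_ohm: float, l_h: float, c_f: float) -> complex:
    """Impedance of a parallel RLC tank at angular frequency omega."""
    admittance = 1.0 / r_ohm + 1j * omega * c_f + 1.0 / (1j * omega * l_h)
    return 1.0 / admittance


if __name__ == "__main__":
    R, L, C = 500.0, 1e-9, 1e-12  # hypothetical tank: 500 ohm, 1 nH, 1 pF
    w0 = resonant_frequency_rad(L, C)
    print(f"f0 = {w0 / (2 * math.pi) / 1e9:.3f} GHz")
    for scale in (0.9, 1.0, 1.1):
        z = tank_impedance(scale * w0, R, L, C)
        print(f"{scale:.1f} * w0: |Z| = {abs(z):.1f} ohm, "
              f"phase = {math.degrees(cmath.phase(z)):.1f} deg")
```

Below resonance the tank looks inductive (positive phase) and above resonance capacitive (negative phase), so with a zero-phase active circuit the loop phase condition can only be met at ω0.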
In such LC oscillators, active circuits only provide the negative resistance nec-
essary to cancel the loss of the LC tank. Because the energy in LC oscillators is
Figure 2.3: Barkhausen oscillation criteria for LC oscillators. (a) an LC differential oscillator composed by an LC tank, represented by H(jω), and active circuit, represented by f(Vo); (b) feedback loop model for the LC oscillator; (c) amplitude and phase response of the LC tank H(jω).
recycled between the inductance and capacitance of the resonant tank, LC oscillators
usually demonstrate smaller power consumption than ring oscillators. Because
of the bandpass filtering function of the LC tank, LC oscillators can also have much better
spectral purity than ring oscillators. The main trade-off for these benefits is the larger
occupied area of LC oscillators, which comes from the passive inductor of the
LC tank. LC oscillators find their applications in high-performance systems where a
clock source with high frequency stability is required.
2.1.2 Phase-Locked Loop (PLL)
A phase-locked loop (PLL) is a feedback system that operates on the excess phase
of nominally periodic signals. Shown in Fig. 2.4 is a sample charge pump PLL,
consisting of a phase and frequency detector (PFD), a charge pump (CP), a low pass
filter (LPF) and a voltage-controlled oscillator (VCO). The phase difference between
the input x(t) and the output y(t) is sensed by the PFD and converted to a voltage
signal by the charge pump. The low pass filter removes the high-frequency components
in the charge pump output and feeds the dc component to the control port of the VCO,
which changes the oscillation frequency of the VCO and corrects the frequency and phase error
between input x(t) and output y(t).
In the locked condition, all the signals in the loop have reached a steady state and
the PLL operates as follows. The phase detector and charge pump produce an output
whose dc value is proportional to the phase difference between the reference clock and
the feedback clock from the voltage-controlled oscillator (VCO). The low pass filter
(LPF) suppresses high-frequency components in the PD/CP output, allowing the dc
value to control the VCO frequency. The VCO then oscillates at a frequency equal
to the input frequency.
From the basic operation of a PLL, we find that in the locked condition, the input
and output frequencies are exactly equal, regardless of the magnitude of the loop gain.
Figure 2.4: A basic phase-locked loop, where PFD is the phase and frequency detector, CP is charge pump, LPF is low pass filter and VCO is voltage controlled oscillator.
This frequency synchronization function is an extremely important property because
frequency synthesizers are intolerant of even small differences between the input and
output frequencies.
Further quantitative analysis on the loop transfer function of the PLL also reveals
that for a charge pump PLL in locked state, the phase error between the input and
output clocks is also eliminated if there is no mismatch for the PFD and charge
pump circuits. This phase synchronization function of the charge pump PLL finds many
applications, including wireline communication receivers and deskew circuits for
clock distribution.
If we insert a frequency divider in the feedback path from the VCO clock output,
the frequency synchronization property forces the divided clock frequency to equal
the input reference frequency. This means a PLL can be used for frequency multiplication.
This frequency multiplication function is the basis of PLL-based frequency
synthesis. PLL-based frequency synthesis is a coherent frequency synthesis method,
because it generates one or several clock frequencies from a single reference source.
This single reference source is usually an external crystal oscillator. A typical PLL-
based integer-N frequency synthesizer is shown in Fig. 2.5, where a voltage-controlled
oscillator (VCO) is corrected periodically by the phase comparison results between
the crystal reference and the frequency divided output [15]. The frequency divider is
built with variable division ratio, controlled by the modulus selection signal. Such a
Figure 2.5: A typical PLL-based integer-N frequency synthesizer. With the divide-by-N frequency divider in the loop, the output clock frequency fout equals N × fin. The modulus selection is capable of changing the value of the division ratio N. Thus the output frequency can be changed with a step size of the reference frequency fin.
topology produces fout = Nfin, where N varies in unity steps from ML to MH. If
Nfin is to be equal to f0 + kfch, then for the first channel (k = 0), we have MLfin = f0.
Furthermore, for the second channel, (ML + 1)fin = f0 + fch, implying that fch = fin.
Thus, fout = MLfin + kfin. The simplicity of the integer-N architecture has made it a
popular choice for wireless frequency synthesis and processor clock generation for many
years. However, because the input reference frequency must be equal to the channel
spacing in integer-N frequency synthesizer architecture, it suffers several drawbacks.
These drawbacks include the reference spurs and the loop bandwidth limitations. In
order to overcome these drawbacks, the fractional-N architecture was introduced
in recent years. Fractional-N PLLs allow the channel spacing to be much smaller than
the input reference frequency, thus removing the bandwidth limitation of integer-N PLLs.
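The integer-N relationships (fin = fch and MLfin = f0) can be sketched in a few lines of Python. The band edge and channel spacing used below are hypothetical round numbers chosen so that f0 is an integer multiple of fch, and the helper names are illustrative only.

```python
def integer_n_plan(f0_hz: int, fch_hz: int, num_channels: int):
    """Return (f_in, M_L, M_H) for an integer-N synthesizer covering the channels.

    Integer-N operation forces f_in = f_ch, so f0 must be an integer
    multiple of the channel spacing.
    """
    if num_channels < 1:
        raise ValueError("need at least one channel")
    if f0_hz % fch_hz != 0:
        raise ValueError("f0 is not an integer multiple of f_ch")
    m_low = f0_hz // fch_hz
    m_high = m_low + num_channels - 1
    return fch_hz, m_low, m_high


def output_frequency_hz(f_in_hz: int, n: int) -> int:
    """Locked integer-N PLL output: f_out = N * f_in."""
    return n * f_in_hz


if __name__ == "__main__":
    # Hypothetical band: 900 MHz lower edge, 200 kHz spacing, 125 channels.
    f_in, m_low, m_high = integer_n_plan(900_000_000, 200_000, 125)
    print("reference:", f_in, "Hz, N from", m_low, "to", m_high)
    print("second channel:", output_frequency_hz(f_in, m_low + 1), "Hz")
```

The large division ratio that results (thousands for a sub-MHz channel spacing) is what ties the reference frequency, and with it the usable loop bandwidth, to the channel spacing.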
Phase-locked loops (PLLs) are widely used in clocking circuits because of their versatile
functions, as discussed above. However, the circuit realization of PLLs is not as simple
as it seems from diagrams. Every building block in the PLL involves many design
considerations and challenges. The loop as a whole also requires careful design. For the
phase and frequency detector (PFD), the dead zone and the mismatch between the
UP and DOWN paths are the major design concerns. For the charge pump, the charge
sharing problem and the mismatch between UP and DOWN current sources require a
lot of design effort to remove. Design issues of voltage-controlled oscillators (VCOs)
include phase noise, power consumption, and frequency tuning range. Frequency dividers
in the PLL feedback path usually consume a large portion of the total power
budget. Large division ratios are usually realized by cascading several dividers.
The first frequency divider is usually called the prescaler and draws most of the
attention. This is because the prescaler needs to handle the highest speed among all
the frequency dividers, and at the same time, tends to consume most of the power.
The loop design includes the design of several important loop parameters, including
the loop bandwidth, the damping factor, the phase margin and the lock time. Because
of all these design challenges, the implementation of a PLL is usually time and
power consuming. For some critical applications, the use of a PLL is justifiable. For
some other applications, a simpler circuit than a PLL is desired in order to reduce the
design effort and power consumption.
2.1.3 Injection-Locked Oscillators (ILOs)
Injection locking [18, 19] is a special type of forced oscillation in nonlinear dynamic
systems (also known as synchronization). Suppose a signal of frequency ωi is injected
into an oscillator (Fig. 2.6-a), which has a self-oscillation (free-running) frequency
ω0. When ωi is quite different from ω0, “beats” of the two frequencies are observed.
As ωi approaches ω0, the beat frequency (|ωi − ω0|) decreases. When ωi enters some
neighborhood very close to ω0, the beats suddenly disappear, and the oscillator starts
to oscillate at ωi. The frequency range in which injection locking happens is called
the locking range (Fig. 2.6-b). The locking range determines the operation bandwidth
of an ILO, and needs to be maximized. Generally speaking, it is proportional to the
injection strength. Injection locking also happens when ωi is close to a harmonic
or subharmonic of ω0, i.e., nω0 or ω0/n. The former case can be used for frequency
Figure 2.6: (a) Beat and injection locking phenomenon when an oscillator is driven by a single-frequency input signal. (b) Locking range.
division, and the latter for frequency multiplication.
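A common first-order description of the beat and locking behavior is Adler's equation, dθ/dt = Δω − ωL sin θ, with a one-sided locking range ωL ≈ (ω0/2Q)(Iinj/Iosc) for weak fundamental-frequency injection into an LC oscillator. This model is brought in here as an assumption to illustrate the text; the oscillator values below are hypothetical. Outside the locking range the same model predicts an average beat frequency of √(Δω² − ωL²), which approaches |ωi − ω0| for large detuning.

```python
import math


def lock_range_hz(f0_hz: float, q: float, injection_ratio: float) -> float:
    """One-sided locking range f_L ~ (f0 / 2Q) * (I_inj / I_osc), weak injection."""
    return f0_hz / (2.0 * q) * injection_ratio


def beat_frequency_hz(f_inj_hz: float, f0_hz: float, q: float,
                      injection_ratio: float) -> float:
    """Average beat frequency; 0.0 means the oscillator is injection locked."""
    detuning = abs(f_inj_hz - f0_hz)
    f_lock = lock_range_hz(f0_hz, q, injection_ratio)
    if detuning <= f_lock:
        return 0.0
    return math.sqrt(detuning ** 2 - f_lock ** 2)


if __name__ == "__main__":
    f0, q, ratio = 5.0e9, 10.0, 0.1  # hypothetical oscillator and injection level
    print("one-sided locking range [MHz]:", lock_range_hz(f0, q, ratio) / 1e6)
    for f_inj in (5.00e9, 5.02e9, 5.03e9, 5.10e9):
        beat = beat_frequency_hz(f_inj, f0, q, ratio)
        print(f"{f_inj / 1e9:.2f} GHz -> beat {beat / 1e6:.2f} MHz")
```

In this approximation the locking range scales linearly with the injection ratio, consistent with the statement that it is proportional to the injection strength.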
An injection-locked oscillator (ILO) can be considered as a simple first-order PLL
(Fig. 2.7-a), in which nonlinearity of the oscillator core functions as a phase detector.
For example, in a typical divide-by-two ILO (Fig. 2.7-b) [20], the oscillator core
(consisting of M1, M2 and Mtail) also serves as a single-balanced mixer for phase
detection. Because of the simple structure, ILOs consume much less power than a
full-fledged PLL, and can operate at frequencies as high as tens of gigahertz [21].
The fact that the built-in “phase detectors” are mixer-based explains why ILOs can
operate at the harmonic and subharmonic frequencies of the input signal. Harmonic
and subharmonic frequency injection locking make ILOs ideal for frequency division
and multiplication purposes.
Because of its compact structure and easy implementation, the divide-by-two injection-locked
frequency divider (ILFD) shown in Fig. 2.7-b has been the most reported ILFD [22, 23, 24].
[22] first analyzed the phase-limited and amplitude-limited locking range, and the noise transfer
characteristics of such a differential divide-by-two ILFD. The circuit implementation
achieved 12.3% locking range at 3 GHz with power consumption of 1.2 mW.
Figure 2.7: (a) A generic model of an injection-locked oscillator (ILO). (b) a divide-by-two ILO based on a common differential LC oscillator. The input signal is injected into the oscillator core through the tail transistor Mtail. This topology exhibits good injection locking efficiency because of the built-in single-balanced mixer structure.
A frequency synthesizer using such a divide-by-two ILFD as the PLL prescaler was
reported in [20]. [23] proposed a low-power quadrature generation circuit that injects
a pair of differential signals into two identical injection-locked frequency dividers. A
hard-switching model was used for the analysis of the divide-by-two ILFD, which
gives a simple expression for the locking range and output amplitude. [24] introduced
a unified model to analyze general injection-locked frequency dividers. The proposed
method uses a two-dimensional Taylor expansion to treat the nonlinearity of the ILFD
with respect to the injection and oscillation signals. Specifically for the divide-by-two
ILFD, a piecewise nonlinearity model was used for the cross-pair in the ILFD,
which is more accurate than the hard-switching model in [23]. [25] introduced an
easy-to-understand graphical method to illustrate the operation of injection-locked oscillators.
However, this method is not valid for certain topologies with nonlinear mixing
between the injection and oscillation signals.
Ring oscillator based injection-locked oscillators are also reported in literature.
Compared with LC-based ILOs, ring-based ILOs usually generate stronger harmonic
components, burn more power and have inferior phase noise.
[26] introduced two ILFDs based on ring oscillators for divide-by-three and
divide-by-five operations. Both ring-ILFDs use a single injection applied to a common
ground node of all the delay cells. Such a single injection scheme is not effective
for multi-stage ring oscillators because multiple nodes with different phases tend to
cancel the effect of the single phase injection signal. Because of this disadvantage, [27]
proposed a multiple-input scheme for injection-locked ring oscillators. This scheme
can achieve the best locking range and uniformity of output phase spacing between
different nodes. The disadvantage of this approach, however, is the need for multiple
inputs with equally spaced phases, which are generally hard to generate.
2.1.4 High Speed Frequency Dividers and Multipliers
Clock dividers, or frequency dividers are essential building blocks in frequency
synthesis, quadrature signal generation, MUX/DEMUX, and radar systems. As dis-
cussed in the phase-locked loop section, the frequency divider is used in the feedback
path so that the PLL can achieve frequency multiplication. The first frequency divider
after the VCO in the PLL is called the prescaler. It operates at the highest speed
and consumes most of the power. In the design of high-speed frequency dividers,
speed, power dissipation and phase noise are the most important specifications. As
the clock speed of communication and computing systems moves into the multi-gigahertz
range, frequency dividers in these systems also run at such high speeds. Such
high-speed frequency dividers face even more stringent power, speed and phase
noise trade-offs.
Currently, static or dynamic “digital” dividers [28, 29, 30] are most common in
RF systems because it is widely believed that they have simpler structure, larger
bandwidth, and better robustness over process variations than conventional analog
Figure 2.8: Digital divide-by-two circuit and the implementation of the latch.
dividers. Digital frequency dividers can be implemented by two latches connected in a
negative feedback loop, as shown in Fig. 2.8. The implementation of latches depends
on the available type of transistors, but a current-steering topology consisting of
a differential pair and a regenerative pair achieves high speed in both bipolar and
CMOS technologies, as shown in the expanded view in Fig. 2.8. Such a latch is a
static latch. High-speed divide-by-two digital divider circuits can also incorporate
dynamic latches. Fig. 2.9 shows two examples of such dynamic dividers. In Fig. 2.9
(a), the first two CMOS inverters operate as dynamic latches controlled by CK and
its complement, and the third inverter provides the overall inversion required in the negative
feedback loop. In Fig. 2.9 (b), the divider is based on true single-phase clock (TSPC)
scheme, achieving a high speed. The drawback of both these dynamic dividers is the
lack of precise complementary or quadrature outputs. For both static and dynamic
digital dividers, as the operation frequencies increase, the trade-off between the speed
and power dissipation becomes more critical, especially in mobile applications. Due
to large power dissipation, high speed digital dividers can also introduce considerable
noise degradation.
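The divide-by-two action of two latches in a negative feedback loop can be checked with a small behavioral model. The sketch below is a logic-level abstraction (ideal level-sensitive latches, no delays, hypothetical function names), not a circuit-level model of the current-steering latch in Fig. 2.8.

```python
def divide_by_two(clock_levels):
    """Simulate a master-slave latch pair with inverted feedback.

    clock_levels: iterable of 0/1 clock samples.
    Returns the slave latch output sampled at each clock level.
    """
    master_q = 0
    slave_q = 0
    out = []
    for ck in clock_levels:
        if ck:
            # Master is transparent while CK is high; its input is the
            # inverted slave output (the negative feedback).
            master_q = 1 - slave_q
        else:
            # Slave is transparent while CK is low and copies the master.
            slave_q = master_q
        out.append(slave_q)
    return out


def count_rising_edges(samples):
    """Count 0 -> 1 transitions in a list of logic samples."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a == 0 and b == 1)


if __name__ == "__main__":
    clock = [0, 1] * 8
    output = divide_by_two(clock)
    print("clock :", clock)
    print("output:", output)
    print("rising edges:", count_rising_edges(clock), "->", count_rising_edges(output))
```

Because only one latch is transparent at a time, the inversion propagates around the loop once per clock period, so the output completes one cycle for every two input cycles.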
Another type of frequency divider is the regenerative frequency divider [31, 32].
Shown in Fig. 2.10 (a) is a high-speed divide-by-two method originally proposed by
Figure 2.9: Dynamic CMOS dividers using (a) inverters (b) true single-phase clock (TSPC) logic.
Figure 2.10: Analog frequency dividers: (a) Miller frequency divider and (b) parametric frequency divider.
Miller [31], and is therefore also called the Miller divider. Employing a mixer and a low-pass filter
in a feedback loop, the circuit operates as follows. Upon multiplication of the input
and output signals, the mixer generates components at fin + fout and fin − fout. If
the former is suppressed by the LPF but the latter is not, then fin − fout = fout, and
hence fout = fin/2. Miller dividers can operate at speeds exceeding half of the
fT of their constituent devices. Their drawbacks, however, are substantial phase noise,
design complexity and power consumption.
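The frequency relation of the Miller divider can be written out explicitly. Assuming an ideal multiplying mixer and sinusoidal input and output signals with amplitudes A and B (symbols introduced here only for illustration), the product-to-sum identity gives the two mixing products, and the loop condition fixes the output frequency:

```latex
\begin{align*}
A\cos(2\pi f_{in} t)\cdot B\cos(2\pi f_{out} t)
  &= \tfrac{AB}{2}\cos\!\big(2\pi (f_{in}-f_{out})\,t\big)
   + \tfrac{AB}{2}\cos\!\big(2\pi (f_{in}+f_{out})\,t\big), \\
f_{in} - f_{out} = f_{out}
  &\;\Longrightarrow\; f_{out} = \frac{f_{in}}{2}.
\end{align*}
```

With fout = fin/2 the sum component sits at 3fin/2, three times the output frequency, which is the term the low-pass filter has to reject.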
Parametric frequency dividers [33, 34, 35] are another type of analog frequency
dividers, as shown in Fig. 2.10 (b). The frequency division principle of a parametric
frequency divider relies on exciting a varactor at frequency fin and realizing a negative
resistance that sustains a loop gain of unity at fin/2. High quality factor varactors
and inductors are key elements in parametric frequency dividers. Since in CMOS
processes, high quality factor passive devices are not available, parametric frequency
dividers are not suitable for CMOS integration.
An injection-locked oscillator (ILO) injection-locked at its harmonics can be used
as a frequency divider, namely, an injection-locked frequency divider (ILFD) [20].
According to the type of oscillator used to build the frequency divider, ILFDs
can be categorized into different topologies. Fig. 2.11 shows several common ILFD
topologies seen in the literature. Fig. 2.11a shows an ILFD implemented by a single-ended
ring oscillator with the superharmonic injection current injected into the common
source of the CMOS inverters [36]. An ILFD can also be built with a ring oscillator with
differential ring stages. Generally speaking, ring-oscillator-based ILFDs have a large
locking range and can support both even harmonic and odd harmonic injection locking
with proper injection topology. However, as ring oscillators do not provide filtering
like in a resonant oscillator, they tend to have large unwanted harmonic components,
particularly at the injected signal frequency. Their phase noise performance is also
inferior to resonant-based ILOs.
A resonant-based ILFD has inherent advantages in both speed and power dissipation.
Such an ILFD is fundamentally a resonant oscillator at the subharmonic
frequency of the input signal, which effectively lowers the speed requirement for the
process technology by n-fold. In a resonant circuit, only a fraction of the stored
energy is dissipated in every cycle, and this fraction is determined by the quality factor Q of the
resonator. This means that a resonant-based ILFD can have lower power consumption
than both a digital divider and even a ring-oscillator-based ILFD. At the same
Figure 2.11: Injection-locked frequency divider (ILFD) implementations: (a) ring oscillator ILFD, (b) Colpitts LC oscillator ILFD, (c) direct injection into the tank of a differential LC oscillator, (d) injection through the tail of a differential LC oscillator.
time, a resonant-based ILFD also has the advantages of simpler circuit structure than
regenerative frequency dividers and better tolerance for low-Q devices compared with
parametric frequency dividers. Because of these advantages, resonant oscillator based
ILFDs are gaining popularity in recent high frequency CMOS designs [20, 37, 38].
Resonant-based ILFDs, or LC-oscillator-based ILFDs, also have single-ended
and differential versions. An example of a single-ended LC ILFD is shown in Fig. 2.11b,
which is based on a Colpitts oscillator with the superharmonic injection applied to
the gate of the active transistor. This superharmonic injection mixes with the fundamental
oscillation through the nonlinearity of the transistor. Single-ended LC ILFDs do
not provide differential output, and are susceptible to common-mode noise. This is why
differential LC ILFDs, built with differential LC oscillators, as shown in Fig. 2.11c
and Fig. 2.11d, are favored over their single-ended counterparts [20, 23]. Depending on
where the injection signal is applied, LC differential ILFDs can be further divided into
the direct injection topology and the injection-through-the-tail topology. The direct injection
topology, as shown in Fig. 2.11c, has an injection device connected across the LC
tank, and the superharmonic signal is directly injected into the tank through the injection
device. The injection device is usually a single transistor with source/drain terminals
connected to the two differential oscillation nodes. The superharmonic injection applied
to the gate of this transistor mixes with the fundamental oscillation and generates
an injection current at the fundamental frequency, which locks the output frequency of the
oscillator. The other injection topology, which injects the superharmonic through the
tail transistor, as shown in Fig. 2.11d, has no extra devices compared with a stan-
dard LC differential oscillator. The superharmonic injection voltage applied to the
gate of the tail transistor is converted to current by the tail transconductance and
fed into the common source of the cross pair. This injection current is steered by
the fundamental oscillation voltage like in a single balanced mixer. Thus the mixing
between the injection and the oscillation is also similar to the situation in the single
balanced mixer, and the generated fundamental current locks the output frequency of
the oscillator. The direct injection topology and the injection-through-the-tail
topology thus differ in their mixing mechanisms.
The LC differential ILFD topologies discussed above are all for even-number harmonic
locking, i.e., divide-by-two or divide-by-four. Odd-number division ratios are
not supported by these topologies. In this dissertation, we propose a new LC differential
ILFD topology for divide-by-3 harmonic locking [39], which is the first
divide-by-odd-number ILFD based on a fully differential LC oscillator. The key difference
between the proposed ILFD topology and conventional LC differential ILFDs is
the injection method. From the previous discussion, we find that all the injections
in conventional topologies are applied in a common-mode fashion. In our proposed
topology, however, the superharmonic input is injected differentially into the oscillator. More
details on the new divide-by-odd-number ILFD are given in chapter 3.
Frequency multiplication is an important function in frequency synthesis, clock
distribution, and a wide range of RF and microwave applications. The combination
of a VCO with a frequency multiplier can enable the VCO to work at lower frequency
[40, 41, 42], which can be designed with better spectral purity. This arrangement can
also avoid a VCO pulling problem and allow a lower frequency synthesizer division
ratio, as shown in Fig. 2.12. In a clock distribution system [43], using a frequency
multiplier as the local clock generator can enable the global clock to run at a lower
speed, which can both lower the power consumption and reduce the design difficulty
of the distribution network. A variable-modulus frequency multiplier can be a low-cost
solution for a multi-band system to switch the frequency between different bands. For
example, the 2.3 GHz and 3.5 GHz bands of WiMAX are roughly twice and
three times the fundamental frequency of 1.15 GHz, making it possible to generate
the frequencies of the two bands with a 1.15 GHz VCO and a dual-modulus frequency
doubler/tripler.
Figure 2.12: Frequency synthesis with LO and frequency multiplier in an RF transmitter.
In frequency multiplier design, the main performance metrics are phase noise, out-
put power, undesired harmonics suppression, and power consumption. Phase noise
is of great concern as it directly influences the timing accuracy. An ideal frequency
multiplier introduces no additional phase noise, and the output only shows a
phase noise degradation of 20 log10(N) dB compared with the input because of the frequency
multiplication, where N is the multiplication ratio. For a practical frequency
multiplier, the design goal is to minimize the additional phase noise introduced by the
multiplier itself. Output power is also important, as the frequency multiplier needs
to drive its load at a certain power level. In a frequency synthesizer, the
load is the mixer of the RF transceiver, and in a clocking network, the load is
the local clock distribution. Undesired harmonics at the output of the frequency
multiplier will generate interference in other bands through mixing and modulation,
and thus need to be suppressed as well. Power consumption directly relates to the
battery life of a battery-powered device and needs to be minimized.
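The 20 log(N) phase noise penalty of an ideal multiplier is easy to evaluate numerically. The sketch below assumes the logarithm is base 10 and the result is in dB; the input phase noise level is a hypothetical number, and the function names are illustrative.

```python
import math


def ideal_output_phase_noise_dbc(input_dbc_per_hz: float, ratio: int) -> float:
    """Output phase noise of a noiseless xN multiplier: L_in + 20*log10(N)."""
    if ratio < 1:
        raise ValueError("multiplication ratio must be >= 1")
    return input_dbc_per_hz + 20.0 * math.log10(ratio)


def excess_phase_noise_db(measured_out_dbc: float, input_dbc_per_hz: float,
                          ratio: int) -> float:
    """Noise added by a practical multiplier beyond the ideal 20*log10(N) floor."""
    return measured_out_dbc - ideal_output_phase_noise_dbc(input_dbc_per_hz, ratio)


if __name__ == "__main__":
    l_in = -120.0  # hypothetical input phase noise in dBc/Hz at some offset
    for n in (2, 3):
        print(f"x{n}: ideal output = {ideal_output_phase_noise_dbc(l_in, n):.2f} dBc/Hz")
```

A doubler therefore costs about 6 dB and a tripler about 9.5 dB even when the multiplier itself is noiseless, which sets the floor against which a practical design is judged.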
A common implementation of a frequency multiplier is to generate harmonics of an
Figure 2.13: Frequency multipliers by (a) harmonic generation and filtering, (b) regenerative harmonic doubling and (c) injection-locking.
input signal using a nonlinear device, and then choose the desired harmonic com-
ponent by a filter network, as shown in Fig. 2.13(a). The nonlinear device can be
a diode or a transistor biased at a small conduction angle [44, 45, 46, 47], and the
filter network is usually built by passive LC circuits. Such a harmonic generation and
filtering approach is effective if the filter network has a large quality factor, which
is available for high-resistivity-substrate processes like GaAs and SiGe. However, as
cost becomes the main driver for a system-on-a-chip (SoC) solution for wireless systems,
RF circuits are migrating to digital CMOS technologies for single-chip radio
solutions. This creates a challenge for the conventional frequency multiplier design
as a high-Q filter is very difficult to construct using the lossy on-chip inductors in
a digital CMOS process. Therefore, integrated CMOS frequency multipliers usually
have poor harmonics suppression [46].
A phase-locked loop with a frequency divider in the feedback path can also work as a
frequency multiplier. The PLL-based frequency multiplication, however, suffers from
drawbacks like circuit complexity and power consumption.
The common mode node in a differential oscillator can be another way to extract
the doubled frequency component, as shown in Fig. 2.13(b). Such a frequency dou-
bling approach is called push-pull frequency doubling as in [38, 48], or regenerative
frequency doubling as in [49, 50]. During operation, the differential pair will each
conduct a current pulse for a fraction of period, and together generate the second
order harmonic current, which multiplies with the impedance of the common source
node at the second harmonic frequency to generate the desired output voltage.
Higher-order harmonic currents are also generated; however, the capacitive nature
of the common source node impedance filters out most of the harmonic power at
higher orders. If we consider the cross pair as a generalized harmonic generator
and the RC network at the common source node as the filter network, the regenerative
frequency doubler falls into the category of the harmonic generation and filtering
approach. The regenerative frequency doubler has a fundamental limitation:
the voltage swing at the common source node is limited. Further amplification and
buffering are needed before the frequency doubler can drive any meaningful load. In
[38, 48], the doubled frequency is extracted directly from the VCOs which are sup-
posed to be working inside the frequency synthesis PLL. For [49, 50], the doubled
frequency is extracted from oscillators which are injection locked by external input.
For the former case, the phase noise performance of the doubled frequency is deter-
mined by the frequency synthesis PLL, while for the latter case, the phase noise is
determined by the injection source at low offset frequency.
Apart from being utilized in regenerative frequency doublers as in [49, 50], injection
locking can also be utilized as a direct frequency multiplication technique and is
capable of introducing valuable extra harmonic suppression. This is because an oscillator
can be locked to an input which is much smaller in amplitude. This in effect
is a large amplification for the desired harmonic, while for the other
harmonics there is no such amplification effect. A conventional frequency multiplier
followed by an injection-locked oscillator is thus a promising candidate for frequency
multipliers with high harmonic suppression in lossy digital CMOS processes. In such
an injection-locked frequency multiplier, the harmonic suppression can be expressed
as
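\[ HS_{m,n,inj} = \left|\frac{I_{m\omega_0}}{I_{n\omega_0}}\right| \left|\frac{Z_{osc}(m\omega_0)}{Z_{osc}(n\omega_0)}\right| \frac{1}{\eta} \tag{2.3} \]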
where η is the injection ratio, which is defined as the amplitude ratio between the
injection signal and the output. The filter network this time is the oscillator itself,
so the subscript osc is added for distinction. The extra term $1/\eta$ in Eqn. 2.3 can
introduce a substantial increase in the harmonic suppression, as the injection ratio can
have a value much smaller than one.
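The effect of the 1/η term can be seen by evaluating Eqn. 2.3 numerically. The sketch below expresses the result in decibels using 20 log10 (an assumption, since the quantities are amplitude ratios), and all current, impedance and injection-ratio values are hypothetical.

```python
import math


def harmonic_suppression_db(i_m: complex, i_n: complex,
                            z_m: complex, z_n: complex, eta: float) -> float:
    """Eqn. 2.3 in decibels: 20*log10(|I_m / I_n| * |Z_m / Z_n| / eta).

    i_m, i_n: harmonic currents at m*w0 (desired) and n*w0 (undesired).
    z_m, z_n: oscillator tank impedance at those two frequencies.
    eta: injection ratio (injection amplitude / output amplitude).
    """
    if eta <= 0:
        raise ValueError("injection ratio must be positive")
    ratio = abs(i_m / i_n) * abs(z_m / z_n) / eta
    return 20.0 * math.log10(ratio)


if __name__ == "__main__":
    # Hypothetical case: the desired harmonic current is half the undesired
    # one, the tank favors the desired harmonic by 10x, and eta = 0.1.
    with_locking = harmonic_suppression_db(0.5, 1.0, 10.0, 1.0, 0.1)
    filter_only = harmonic_suppression_db(0.5, 1.0, 10.0, 1.0, 1.0)
    print(f"with injection locking: {with_locking:.2f} dB")
    print(f"filtering alone (eta = 1): {filter_only:.2f} dB")
```

In this example an injection ratio of 0.1 contributes an extra 20 dB of suppression on top of what the current ratio and tank filtering provide.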
In [51, 52], this injection-locked frequency multiplication idea is applied to
frequency triplers, as shown in Fig. 2.13(c). In these implementations, an
input differential pair works as the harmonic generator, which generates a strong third-order
harmonic current by the current steering effect of the differential pair. This
third-harmonic-rich current then injection locks the oscillator core, whose free-running
frequency is near the third harmonic of the injection. One problem with this
topology is that the fundamental current generated by the input differential pair is even
stronger than the desired third harmonic. The only mechanism for suppressing this
undesired fundamental is the bandpass filtering of the LC tank. [52] does not report
this harmonic suppression performance. However, given the low quality factor of
the oscillator tank reported in [52], the suppression of this undesired fundamental
component would not be satisfactory.
In this dissertation, we propose a new injection-locked frequency multiplier topology
to address the problem by two-stage filtering in a compact structure. It also
supports dual-modulus operation in both doubler and tripler modes. Fabricated in a
0.18 µm digital CMOS process with lossy substrate, this new topology achieves very
good suppression of undesired harmonics in both doubler and tripler modes [53].
Details of the new injection-locked frequency multiplier topology will be discussed in
chapter 3.
2.2 Clock Distribution
Clock distribution will increasingly be one of the most challenging tasks in micro-
processors and other high-speed VLSIs. The 2007 ITRS roadmap projects that the
on-chip clock speed will continue to rise to near 9 GHz in 2015 [54]. Even though
the device feature size will shrink, the chip size will remain constant (about 16.7 mm
from edge to edge [54]) as more functions are added. If current clocking schemes
continue to be used, it is expected that skew and jitter will consume an increasingly
large portion of each clock cycle, and hence the time available for the critical path will
eventually fall below the technology-allowed minimum delay beyond the 32 nm
node in 2013. This would largely defeat the purpose of any further clock speed increase.
In the meantime, the power consumption in clock distribution networks has
also become a serious problem. Currently, about 40% of total power consumption
of a high-performance microprocessor is used by the clocking circuitry [55]. As both
clock speed and transistor count increase, the projected power consumption of a high-
performance microprocessor will exceed the power density limit set by packaging [54].
Therefore, we need a new clocking solution that can achieve better skew and jitter
performance while consuming less power.
2.2.1 Conventional Monolithic Clock Distributions
Fig. 1.6 in Chapter 1 shows a conventional clock distribution scheme [14]. The
global clock is generated by an on-chip phase-locked loop (PLL) from an off-chip
reference clock, usually a crystal oscillator at tens of MHz. The global clock is dis-
tributed using a global clock distribution network, typically in an H-tree topology,
which consists of interconnect transmission lines and clock buffers, and then further
distributed by local clock distribution networks. Local clock distribution networks
can be another level of H-tree, as shown in the upper right corner in Fig. 1.6, or a
metal grid, as shown in the upper left corner. Both H-tree and metal grid are metic-
ulously designed to balance the clock arrival time at different positions in the clock
distribution network.
Conventional distribution schemes are more or less monolithic in that a single
clock source is fed through hierarchies of clock buffers to eventually drive almost
the entire chip. This raises a number of challenges. First, due to irregular logic,
the load of the clock network is non-uniform, and the increasing process and device
variations in deep sub-micron semiconductor technologies further add to the spatial
timing uncertainties known as clock skew. Second, the load of the entire chip is
substantial, and sending a high-quality clock signal to every corner of the chip
necessarily requires driving the clock distribution network “hard”, usually at the full swing
of the power supply voltage. Not only does this mean high power expenditure, but it
also requires a chain of clock buffers to deliver the ultimate driving capability. These
active elements are subject to power supply noise and add delay uncertainty (jitter),
which also eats into the usable clock cycle. Jitter and skew combined currently represent about
18% of cycle time [56], and that results in indirect energy waste as well.
For a fixed cycle time budget, any increase in jitter and skew reduces the time left for
the logic. To compensate and make the circuitry faster, the supply voltage is raised,
therefore increasing energy consumption. Conversely, any improvement in jitter and
skew generates timing slack that can be used to allow the logic circuit to operate
more energy-efficiently.
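As a rough illustration of the budget argument above, the sketch below computes how much of a clock cycle remains for logic once a given fraction is lost to skew and jitter. The 18% figure is the one quoted from [56]; the 4 GHz clock frequency, the 10% comparison point, and the function name are assumptions chosen only for illustration.

```python
def usable_cycle_time_ps(clock_hz: float, overhead_fraction: float) -> float:
    """Return the time per cycle left for logic, in picoseconds.

    overhead_fraction is the share of the cycle consumed by skew and jitter.
    """
    if clock_hz <= 0.0:
        raise ValueError("clock_hz must be positive")
    if not 0.0 <= overhead_fraction < 1.0:
        raise ValueError("overhead_fraction must be in [0, 1)")
    period_ps = 1e12 / clock_hz
    return period_ps * (1.0 - overhead_fraction)


if __name__ == "__main__":
    f_clk = 4e9  # hypothetical clock frequency, not a figure from the text
    for overhead in (0.18, 0.10):  # 18% is quoted from [56]; 10% is hypothetical
        slack = usable_cycle_time_ps(f_clk, overhead)
        print(f"overhead {overhead:.0%}: {slack:.1f} ps left for logic")
```

Any reduction in the overhead fraction shows up directly as extra time for the logic, which is the slack the text proposes to trade for a lower supply voltage.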
In order to minimize the global clock skew, the global clock-distribution network
has to be balanced by meticulous design of the transmission lines and buffers. This
practice puts a very demanding constraint on the physical design of the chip. Even so,
the ever-increasing process variations with each technology generation still result in
greater challenges in maintaining a small skew budget. Another current practice is to
use a grid instead of a tree for clock distribution, as shown in the upper-left local clock
region in Fig. 1.6. A grid has a lower resistance than a tree between two end nodes,
and hence can reduce the skew. At the same time, a grid usually has much larger
parasitic capacitance (larger metal layers) than an equivalent tree, and therefore takes
more power to drive. Passive and active deskew methods [57, 58, 59, 60] have also
been employed to compensate skew after chip fabrication. However, this approach
increases the chip complexity, manufacturing cost, and, in the case of active deskew,
power consumption and jitter.
Jitter poses an even larger threat to microprocessor performance and power con-
sumption. The global-clock PLL and clock-distribution network generate noise, and
hence contribute to global clock jitter. But the main culprit is usually the noise coupled
from other circuits, such as power supply noise, substrate noise, and crosstalk.
Short-term jitter (cycle-to-cycle jitter) can only be accounted for by adding timing
margin to the clock cycle, and hence degrades performance. Unlike skew, jitter is
very difficult to compensate due to its random nature. In order to reduce jitter, the
interconnect wires in the global clock distribution network need to be well shielded
from other noise sources, usually by sandwiching them between Vdd/ground wires
and layers. Shielding inevitably increases the parasitic capacitance of the clocking
network, which means more and larger clock buffers, and hence larger power dissi-
pation to drive them. In turn, having more buffer stages introduces another source
of jitter, and the situation deteriorates quickly with faster clock speed. It is evident
that current skew and jitter reduction techniques almost always result in higher power
consumption. A better clocking scheme with less jitter and skew directly translates
into power savings for a given performance target.
2.2.2 Emerging Gigahertz Clock Distribution Schemes
There have been intensive research efforts in recent years to address the chal-
lenges in high-speed clocking from different disciplines, including clockless design
(asynchronous circuits), optical interconnect, and resonant clocking, to name a few.
Each of these alternative solutions has its own technological issues to be addressed.
Optical interconnect potentially offers smaller delays and lower power consump-
tion than electrical ones, and is promising for the global clock distribution network
[61, 62, 56]. However, there are still great challenges in its silicon implementation,
particularly for on-chip electrical-optical modulators [63]. Wireless clock distribution,
proposed in [64, 65], suffers substantial overhead in chip area and power consumption
due to on-chip clock transceivers.
Among the proposed electrical solutions, a family of synchronized clocking tech-
niques, such as distributed PLLs [66, 67], synchronous distributed oscillators [68, 69],
rotary clocking [70], coupled standing-wave oscillators [71], and resonant clocking [72]
have recently been proposed to improve the performance of global clock distribution.
In [72, 73], on-chip inductors are added to all the local nodes of the global clock
distribution tree, and hence turn it into a single large resonator. Resonance improves
power efficiency. Therefore, this technique reduces dc power dissipation and lowers
jitter in the global clock distribution network. It is a good step in the right direction.
However, it does not provide deskew capabilities like injection-locked clocking. The
more stringent layout constraints due to on-chip inductors could even aggravate the
problem of skew.
In [66, 67], an array of PLLs is constructed using a voltage-controlled oscillator
(VCO) and loop filter at each node, and a phase detector between adjacent nodes.
Each PLL generates the local clock in the particular clock domain, which is synchro-
nized with others through the aforementioned phase detectors at the clock domain
boundaries. The global clock of conventional clocking is removed in this scheme, and
hence it promises lower jitter. The drawbacks are that a) the global skew is still a
problem since deskewing only happens locally, and b) the sensitive analog circuits in
a PLL (phase detectors, loop filters, ring oscillators) are vulnerable to noise in the
hostile environment of digital circuits.
In [68, 69, 70, 71], an array of oscillators is connected to the global clock
distribution network, and the oscillators are thus synchronized by coupling. The resulting oscillator
array becomes a distributed oscillator. The difference is that in [70] the oscillator
array is a one-dimensional loop, and the phases of the oscillators change linearly along
the array, similar to a distributed VCO [74], which was based on traveling-wave
amplification [75]. In [71], the oscillator array generates a standing-wave pattern on
the network, i.e., each oscillator has the same phase. Essentially, all these techniques
use a distributed oscillator with interconnects as its resonator. A distributed
oscillator suffers from phase uncertainty due to mode locking [66, 67, 69].
This is evident in that similar topologies can be used for either traveling-wave [70]
or standing-wave oscillation [71]. Another problem is that jitter tends to be worse
than in conventional clocking since the global clock is now generated on chip using lossy
passive components, without the clean reference clock from the off-chip crystal
oscillator. It is noteworthy that [73] unintentionally adds injection locking to distributed
oscillator clocking and demonstrates good jitter performance.
Overall, all these promising technologies face significant technical difficulties and
require dramatic changes in process technologies, design methodologies, or testing
methods, and hence will face significant resistance in adoption.
Figure 2.14: Injection-locked clocking with active deskew based on ILO delay tuning.
In this dissertation, we propose a new clocking scheme, as shown in Figure 2.14.
Similar to conventional clocking, the global clock is generated by an on-chip PLL and
distributed by a global tree. The difference is that we use injection-locked oscillators
(ILOs) to regenerate local clocks, which are synchronized to the global clock through
injection locking. Another difference is that most global clock buffers in conventional
clocking are removed because the sensitivity of ILOs is much greater than that of digital
buffers. Essentially, we use ILOs as local clock receivers, similar to the idea of clock
recovery in communication systems. Note that this is different from resonant clocking
[73], where all the oscillators are coupled together. By utilizing their phase tunability,
ILOs in injection-locked clocking also serve as deskew circuits. They work with phase
detectors (PDs) and deskew logic (DSKs) to form deskew loops, which reduce the
skew between different local clock distributions. Further, ILOs can be constructed
as frequency multipliers [76] or dividers [20, 39], and hence this scheme enables local
clock domains to have higher (n × f0) or lower clock speed (f0/m) than the global
clock (f0). Such a global-local clocking scheme with multiple-speed local clocks offers
significant improvements over conventional single-speed clocking scheme in terms of
power consumption, skew, and jitter. More details on ILC will be presented in chapter
4.
2.3 Contributions of This Dissertation
Injection-locked oscillators have several useful properties which make them ideal
for high speed clock applications, including clock generation and clock distribution.
Firstly, because an ILO can be locked by its super-harmonics and sub-harmonics,
it is very convenient to build frequency dividers and multipliers based on an ILO.
Because of their resonant nature, such injection-locked frequency dividers (ILFDs) and
injection-locked frequency multipliers (ILFMs) can generally work at higher speed
than conventional frequency dividers and multipliers in the same technology. Their
power consumption is also lower because the resonator recycles energy.
The trade-off is usually a smaller bandwidth. ILFDs and ILFMs are therefore naturally
suited to high-speed, low-power, narrow-band clocking applications.
Secondly, an ILO introduces a phase shift between the injection signal and its
output. This phase shift is determined by the frequency offset between the injection
signal and the resonant frequency of its tank. So if we can control the resonant
frequency of the tank, we can tune the phase shift introduced by the ILO. This phase
tunability makes ILOs suitable for high-speed clocking applications where
accurate control of phase relationships is required. Examples of such applications include
quadrature generation for wireless transceivers, multiple-phase generation for phased-array
systems, and active deskew for clock distribution.
The third useful property of an ILO is its capability to be locked by an injection
signal much smaller than its output signal. This effectively makes an ILO a
high-gain amplifier. The high gain of an ILO enables it to work as a local clock
regenerator in a clock distribution network, where the required input clock
strength is much smaller than in conventional buffer-chain-based clock distribution.
This high-gain property of an ILO can also be used together with its narrow-band
nature to function as a high-Q bandpass filter. Such a high-Q bandpass filter can be
used in frequency multipliers to suppress undesired harmonics where other bandpass
filter structures are not effective.
This dissertation presents our studies of gigahertz, high performance, low power
clock generation and distribution using injection-locked oscillators. For gigahertz
clock generation, we introduced a phase tuning scheme for injection-locked frequency
divider based dual-phase signal generators. The phase tuning capability in this
scheme comes from the tunable phase transfer characteristics of injection-locked
frequency dividers. Implemented with a frequency-tunable double-balanced divide-by-two
injection-locked frequency divider, the dual-phase signal generator prototype
achieves a 100° differential phase tuning range around quadrature at a generated signal
frequency of 5 GHz.
For gigahertz frequency division, we introduced a divide-by-odd-number injection-
locked frequency divider to address the division ratio limitation of conventional injection-
locked frequency dividers. With differential injection and harmonic filtering, this new
ILFD topology maintains the fully differential nature of the output signal while
achieving effective mixing between the injected odd harmonics and the output
oscillation. The circuit prototype of this topology achieves a 5% locking range without
frequency tuning at input frequencies of 16-18 GHz.
For gigahertz frequency multiplication, we introduced an injection-locked oscillator
to work as a high-gain, high-Q harmonic filter for conventional harmonic-generation-and-filtering
frequency multipliers. This new approach achieves significantly better
suppression of undesired harmonics for frequency multipliers built in lossy digital CMOS
processes. The frequency tunability of injection-locked oscillators also enables multi-mode
operation of such injection-locked frequency multipliers. The circuit prototype of
such a frequency multiplier achieves dual-mode multiply-by-2 and multiply-by-3 operation, with
undesired harmonic suppression better than 30 dB in both modes.
For gigahertz clock distribution, we proposed injection-locked clocking, which uses
injection-locked oscillators as the local clock regenerators. Because of an ILO’s capability to be
locked by a small input signal, this new approach removes a large number of clock
buffers from the global clock distribution. This not only reduces the power consumption, but
also reduces the skew and jitter that come from these clock buffers. The phase tunability
of ILOs can also be utilized to deskew different
clock domains. Three ILC circuit prototypes working at several gigahertz demonstrated
the improved power and jitter performance and the built-in deskew capability of
ILC.
Chapter 3
Injection-Locked Oscillators for
Clock Generation
3.1 Analysis of Injection-Locked Oscillators
An injection-locked oscillator can be analyzed with a loop model similar to that of the
free-running oscillator discussed in Chapter 2. As shown in Fig. 3.1-a, the loop model for
an ILO is also composed of a resonant tank H(jω) and the active circuit f(Vi, Vo).
Unlike the free-running oscillator case shown in Fig. 2.3-b, the active circuit
of an ILO loop model now has two inputs instead of one. The extra input Vi, which
is the injection signal, introduces a phase shift between Vo and the output current of the
active circuit, f(Vi, Vo) (Fig. 3.1-b). According to Barkhausen’s oscillation
criterion, this phase shift needs to be compensated by the resonant tank, as shown in
Fig. 3.1-c. Because of this phase shift, the oscillation frequency shifts away from the
resonant frequency of the tank. Instead, the oscillator runs at exactly the same frequency as the
injection signal if the injection frequency is within a particular frequency range.
Figure 3.1: Loop analysis of injection-locked oscillators. (a) Loop model for an ILO, where the LC tank is represented by H(jω) and the active circuit is represented by f(Vi, Vo); (b) phasor representation of the phase shift introduced by the active circuit; (c) amplitude and phase response of the LC tank H(jω), showing the new oscillation frequency at ωi instead of ω0.
This loop model qualitatively demonstrates how an ILO works. However, the
illustration of the phase shift introduced by Vi in Fig. 2.3-b is based on a linear model
[18], which is not valid in many ILO implementations. The introduction of nonlinear
models for the active circuit in an ILO has led to various ILO models reported in
the literature. In [23], a hard switching model was used to model the cross-coupled pair in
the LC differential ILO. This model predicts the locking range and output
amplitude of this ILO topology in divide-by-two operation with reasonable accuracy.
However, the hard switching model is an oversimplification of the behavior of a
cross-coupled pair, and loses accuracy at small oscillation amplitudes. In addition,
it only describes the divide-by-even-number operation of this ILO topology.
In [24], a unified nonlinear model was introduced to analyze general injection-locked
frequency dividers. The proposed method uses a two-dimensional Taylor expansion
to model the nonlinearity of the active circuit. The two-dimensional Taylor expansion
is carried out on the two inputs of the active cross-pair. One is the injection signal
and the other is the oscillation signal. Specifically for the divide-by-two ILFD, a
piecewise nonlinearity model was used for the cross-pair in the ILFD, which is more
accurate than the hard switching model in [23]. However, this unified model
has not been used to analyze non-even-number harmonic injection locking. In
addition, because the mathematical treatment of the nonlinearity is too general, the
model results offer little design insight.
The growing use of injection locking in high-performance clocking applications calls for a
new ILO modeling method. This new modeling method should, on the one hand,
be general enough to analyze both even-harmonic and odd-harmonic injection locking and,
on the other hand, give enough design insight to circuit designers. Because of the
large-signal and nonlinear nature of ILO operation, frequency-domain harmonic
analysis is a natural fit for ILO analysis. Numerical harmonic balance
has been implemented in both Cadence and Advanced Design System (ADS) tools for
analyzing oscillators, both forced and free-running. These tools are effective for computer
simulation of injection locking. However, such purely numerical methods lack
design insight, or a physical picture of the circuit being analyzed. Instead, an
analytical method based on the harmonic balance concept should be employed if one
intends to reveal the underlying working principles of an injection-locked oscillator.
3.1.1 “Harmonic Balance” Analysis of Oscillators
A resonator-based oscillator can be analyzed using a one-port model as shown in
Fig. 3.2. The resonator is typically a linear passive network, and can be represented
by an admittance YR. The active circuit, on the other hand, is nonlinear, and needs
linearization in analysis. For a free-running oscillator with no external excitation, the
nonlinear active circuit can be linearized using the describing function method [15],
which represents the active circuit with its describing function, i.e., the fundamental
frequency component of the Fourier series for the nonlinearity under periodic excita-
tion of the oscillation signal. The describing function is shown as a linear admittance
YA here. Then the oscillation condition is formulated as:
YA + YR = 0 (3.1)
Figure 3.2: One-port model for a resonator-based oscillator with the active circuit represented by a linear admittance.
Eqn. 3.1 can be used to analyze the oscillation frequency and amplitude. For
example, for an LC differential oscillator (Fig. 2.2-b), the admittances YA and YR can
be expressed as
YA = −GA (3.2)
and
Y_R = G_p + j\,2Q\,G_p \left( \frac{\omega}{\omega_0} - \frac{\omega_0}{\omega} \right) (3.3)
where −GA is the negative conductance of the active circuit, and Gp is the effective
parallel conductance, Q the quality factor, and ω0 = 1/√(LC) the natural frequency
of the LC resonator, respectively. Therefore,
-G_A + G_p + j\,2Q\,G_p \left( \frac{\omega}{\omega_0} - \frac{\omega_0}{\omega} \right) = 0 (3.4)
The imaginary part of the equation shows that the oscillation frequency is equal to
ω0, and the real part means that the active circuit compensates the loss of the LC
tank, from which the oscillation amplitude can be derived.
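Separating Eqn. 3.4 into its imaginary and real parts makes the two conditions explicit. This is a short worked step that uses only the symbols defined above:

```latex
\begin{aligned}
\text{Im:}\quad & 2Q\,G_p\left(\frac{\omega}{\omega_0}-\frac{\omega_0}{\omega}\right)=0
  \;\Rightarrow\; \omega=\omega_0,\\
\text{Re:}\quad & -G_A+G_p=0 \;\Rightarrow\; G_A=G_p .
\end{aligned}
```

Because G_A is a describing function, it depends on the oscillation amplitude, so the real-part condition is what fixes the steady-state amplitude.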
3.1.2 “Harmonic Balance” Analysis of Injection-Locked Oscillators
Figure 3.3: One-port model for an injection-locked oscillator with the active circuit described in the time domain.
For an injection-locked oscillator, the active circuit cannot be easily linearized
using the describing function method since there are two excitations to the active
circuit, the oscillation signal vosc(t) and the injection signal iinj(t), which might be at
different harmonic frequencies. Instead, a full-blown harmonic balance analysis [77]
is needed for this scenario, as shown in Fig. 3.3. A time-domain nonlinear function
f(iinj(t), vosc(t)) relates the injection and oscillation signals to the active circuit output
iA(t). The resonator can be described by a linear transfer function in the frequency
domain,
\vec{I}_R = [Y_R] \cdot \vec{V}_{osc} (3.5)
where the vectors \vec{I}_R and \vec{V}_{osc} are the resonator current and oscillation voltage phasors
at all the harmonics, and hence the resonator transfer function [Y_R] becomes a transfer
matrix. The time-domain signals and frequency-domain harmonics are related
through Fourier transformations. The harmonic balance equation for the oscillator is
\vec{I}_A + \vec{I}_R = 0 (3.6)
where \vec{I}_A represents the harmonic phasors of the active circuit output iA(t) = f[iinj(t), vosc(t)].
Assuming the injection signal ii is small enough, we can treat it as a small pertur-
bation to the oscillation signal in harmonic balance. First, the nonlinear function f
is expanded into a Fourier series under large-signal periodic excitation of vo(t). Then
the Fourier coefficients are linearized at the dc value of ii(t), Idc. This process can be
formulated as the following:
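f = \sum_{h=-\infty}^{\infty} A_h(i_{inj})\, e^{jh\omega t} \approx \sum_{h=-\infty}^{\infty} A_h(I_{dc})\, e^{jh\omega t} + i_i \sum_{h=-\infty}^{\infty} A'_h(I_{dc})\, e^{jh\omega t} (3.7)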
Note that since f is a real function
Ah(Idc) = A−h(Idc) (3.8)
It is worth noting that the Fourier series expansion is valid as long as the oscillator
is locked. Linearization for ii is justified when the injection signal is small compared
to Idc.
3.1.3 Common-Mode and Differential Injection
Figure 3.4: An injection-locked oscillator based on a cross-coupled LC differential oscillator.
A common implementation of the ILO model shown in Fig. 3.3-b is the differential
LC oscillator as shown in Fig. 3.4. Such an ILO has fully differential output and is
robust to common mode noises. Injection currents ii1 and ii2 are fed into the sources
of the cross-coupled transistors M1 and M2. For each cross-coupled transistor, its
drain current (id1 or id2) is a nonlinear function of its source current (ii1 or ii2) and
gate-drain voltage¹, and can be separately approximated using Eqn. 3.7. For M1,
i_{d1} = f(i_{i1}, v_{gd1}) = f(i_{i1}, v_o) \approx \sum_{h=-\infty}^{\infty} A_h\, e^{jh\omega t} + i_{i1} \sum_{h=-\infty}^{\infty} A'_h\, e^{jh\omega t} . (3.9)
where A_h and A'_h are evaluated at I_{dc}. Because of the circuit symmetry,
i_{d2} \approx \sum_{h=-\infty}^{\infty} A_h\, e^{jh(\omega t+\pi)} + i_{i2} \sum_{h=-\infty}^{\infty} A'_h\, e^{jh(\omega t+\pi)} (3.10)
Note that because vgd2 = −vosc, there is a time delay of π/ω, which translates into
extra phase shift of hπ in each harmonic.
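Subtracting the drain-current expansion for M2 (Eqn. 3.10) from that for M1 (Eqn. 3.9) harmonic by harmonic gives the intermediate step below. It is a sketch of the algebra only, using the symbols already defined:

```latex
\begin{aligned}
i_{d1}-i_{d2} &\approx \sum_{h=-\infty}^{\infty} A_h\left(1-e^{jh\pi}\right)e^{jh\omega t}
 + \sum_{h=-\infty}^{\infty} A'_h\left(i_{i1}-e^{jh\pi}\,i_{i2}\right)e^{jh\omega t},\\
e^{jh\pi} &= (-1)^h .
\end{aligned}
```

For odd h the two bracketed factors become 2 and (i_{i1} + i_{i2}); for even h they become 0 and (i_{i1} − i_{i2}). This is why the odd and even harmonics separate in the differential output.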
The differential output signal of the active circuit now is
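i_{Ad} = i_{d1} - i_{d2} = \sum_{k=-\infty}^{\infty} 2A_{2k-1}\, e^{j(2k-1)\omega t} + (i_{i1} + i_{i2}) \sum_{k=-\infty}^{\infty} A'_{2k-1}\, e^{j(2k-1)\omega t} + (i_{i1} - i_{i2}) \sum_{k=-\infty}^{\infty} A'_{2k}\, e^{j2k\omega t} (3.11)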
¹ This is because the drain and source currents (id and ii) are both single-valued functions of vgs
and vgd. Once ii and vgd are known, vgs and hence id are determined, too.
Define the common-mode and differential injection signals as
i_{ic} = \frac{1}{2}(i_{i1} + i_{i2}), \qquad i_{id} = i_{i1} - i_{i2} (3.12)
Then
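i_{Ad} = \sum_{k=-\infty}^{\infty} 2A_{2k-1}\, e^{j(2k-1)\omega t} + 2 i_{ic} \sum_{k=-\infty}^{\infty} A'_{2k-1}\, e^{j(2k-1)\omega t} + i_{id} \sum_{k=-\infty}^{\infty} A'_{2k}\, e^{j2k\omega t} (3.13)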
There are three mechanisms contributing to iAd: (a) the odd harmonics of the
oscillation signal, (b) the mixing products of the common-mode injection signal and the
odd harmonics of the oscillation signal, and (c) the mixing products of the differential
injection signal and the even harmonics of the oscillation signal. We can draw several
conclusions: First, when the oscillator is free running, the differential oscillation
signal only consists of the fundamental frequency and its odd harmonics, which is
not surprising given the symmetry. Second, common-mode injection topologies can
only support divide-by-even-number injection locking. This is because only even-
order-harmonic injection can mix with odd harmonics of the output to generate the
fundamental current.
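The parity argument above can be checked by brute force. The sketch below is an illustrative helper, not part of the original analysis: the function name and the search window are assumptions. It asks whether an injection tone at n times the oscillation frequency can mix with the allowed oscillation harmonics (odd for common-mode injection, even for differential injection) to land on the fundamental.

```python
def can_generate_fundamental(n: int, mode: str) -> bool:
    """Return True if injection at n times the oscillation frequency can mix
    down to the fundamental, following the structure of the differential
    current expansion.

    mode "common":       injection multiplies the odd harmonics h = 2k - 1
    mode "differential": injection multiplies the even harmonics h = 2k
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    if mode == "common":
        parity = 1
    elif mode == "differential":
        parity = 0
    else:
        raise ValueError("mode must be 'common' or 'differential'")

    # A real injection tone contains e^{+jn wt} and e^{-jn wt}; its product
    # with harmonic h of the oscillation lands at (h + n) and (h - n).
    # Searching h in [-(n + 1), n + 1] covers every h that could give 1.
    harmonics = [h for h in range(-(n + 1), n + 2) if h % 2 == parity]
    return any(h + sign * n == 1 for h in harmonics for sign in (1, -1))


if __name__ == "__main__":
    for n in range(1, 7):
        print(n,
              can_generate_fundamental(n, "common"),
              can_generate_fundamental(n, "differential"))
```

Under these assumptions, common-mode injection succeeds only for even n and differential injection only for odd n (including n = 1), in line with the conclusions drawn in the text.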
As a special case, we can study the divide-by-two ILO by the generic current
expansion of Eqn. 3.11. For divide-by-two ILOs, we have the small signal differential
injection signal as zero,
iid = 0 (3.14)
and the common-mode small-signal injection as
i_{ic} = I_{inj}\,\frac{1}{2}\left( e^{j(2\omega t+\phi)} + e^{-j(2\omega t+\phi)} \right) (3.15)
The fundamental of Eqn. 3.11 can be expressed as
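I_o e^{j(\omega t+\gamma)} = 2A_1(I_{dc})\, e^{j\omega t} + I_{inj}\,\frac{1}{2}\, e^{j(2\omega t+\phi)} A'_{-1}(I_{dc})\, e^{-j\omega t} + I_{inj}\,\frac{1}{2}\, e^{-j(2\omega t+\phi)} A'_{3}(I_{dc})\, e^{j3\omega t} (3.16)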
Substituting this fundamental current into the harmonic balance equation of Eqn. 3.6, we
can calculate the frequency range in which the balance holds. This range is called the
locking range, and the half-side locking range can be expressed as
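|\Delta\omega| \le \frac{\omega_0}{2Q} \cdot \frac{\left| \frac{1}{2} I_{inj} \left[ A'_{-1}(I_{dc}) - A'_{3}(I_{dc}) \right] \right|}{\sqrt{\left[ 2A_1(I_{dc}) \right]^2 - \left( \frac{1}{2} I_{inj} \right)^2 \left[ A'_{-1}(I_{dc}) + A'_{3}(I_{dc}) \right]^2}} (3.17)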
It is worth noting that if we assume the hard switching model of [23] for the cross pair,
we can directly calculate the Fourier coefficients and their derivatives from the sign-function
nonlinearity of the cross pair. Those in Eqn. 3.17 are A_1(I_{dc}) = \frac{2}{\pi} I_{dc},
A'_{-1}(I_{dc}) = A'_{1}(I_{dc}) = \frac{2}{\pi}, and A'_{3}(I_{dc}) = -\frac{2}{3\pi}. Substituting these coefficients into Eqn. 3.17,
we can write the half-side locking range as
|\Delta\omega| \le \frac{\omega_0}{Q} \cdot \frac{1}{\sqrt{\left( \frac{3}{\eta} \right)^2 - 1}} (3.18)
where η is defined as I_{inj}/I_{dc}. This is the same as the result calculated in [23], which
shows that Eqn. 3.17 is a more general expression for the locking range of the divide-by-two
ILFD, and the hard switching model of Eqn. 3.18 can be viewed as a specialized and
simplified case of Eqn. 3.17.
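The hard-switching coefficients quoted above can be checked numerically. The sketch below assumes the sign-function model takes the form i(θ) = I_dc · sgn(cos θ) with θ = ωt, and reads the quoted values as A_1 = 2·I_dc/π, A'_{±1} = 2/π and A'_3 = −2/(3π). Both the waveform and that reading are assumptions made here, as are the function names. The derivatives are obtained by perturbing the dc bias, mirroring the perturbation procedure used for the simulations.

```python
import cmath
import math

N_SAMPLES = 4096  # multiple of 4, so the switching instants fall between samples


def hard_switched_current(i_dc: float, theta: float) -> float:
    """Assumed sign-function model: the current flips between +i_dc and -i_dc."""
    return i_dc if math.cos(theta) > 0.0 else -i_dc


def fourier_coefficient(i_dc: float, h: int, n: int = N_SAMPLES) -> complex:
    """Exponential Fourier coefficient A_h via midpoint-rule integration
    over one oscillation period."""
    total = 0j
    for k in range(n):
        theta = 2.0 * math.pi * (k + 0.5) / n
        total += hard_switched_current(i_dc, theta) * cmath.exp(-1j * h * theta)
    return total / n


def coefficient_derivative(i_dc: float, h: int, delta: float = 1e-6) -> complex:
    """dA_h/dI_dc by central difference, i.e. a small perturbation of the bias."""
    upper = fourier_coefficient(i_dc + delta, h)
    lower = fourier_coefficient(i_dc - delta, h)
    return (upper - lower) / (2.0 * delta)


if __name__ == "__main__":
    i_dc = 1e-3  # 1 mA bias, an arbitrary illustrative value
    print("A_1 / I_dc :", fourier_coefficient(i_dc, 1).real / i_dc)
    print("A'_-1      :", coefficient_derivative(i_dc, -1).real)
    print("A'_3       :", coefficient_derivative(i_dc, 3).real)
    print("references :", 2 / math.pi, -2 / (3 * math.pi))
```

For a real circuit the same two steps apply, with the simulated drain-current waveform replacing `hard_switched_current`.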
Figure 3.5: Locking range simulation of a divide-by-two ILFD as compared with the harmonic perturbation and hard switching models.
A divide-by-two injection-locked frequency divider with an ideal parallel RLC tank
is simulated to verify the aforementioned locking range derivations. For the harmonic
perturbation method, the Fourier coefficients in Eqn. 3.17 are directly calculated
from the drain current waveforms of the cross-pair transistors in the free-running
oscillator. A small perturbation in the dc bias current is then applied to the same
oscillator to obtain the derivatives of these Fourier coefficients. These Fourier coefficients
and their derivatives are important in determining the locking range of an ILO based
on this oscillator. They can serve as initial design guides before the full locking-range
simulations, which are usually time consuming. The locking
range calculated from Eqn. 3.17 is compared with the circuit simulation and the simplified
hard switching model of Eqn. 3.18. As shown in Fig. 3.5, the locking range derived from the
harmonic perturbation model matches the circuit simulation much better
than the simplified hard switching model, especially at small injection ratios.
3.1.4 Differential Injection for Odd-Harmonic and Fundamental Injection Locking
The differential current expression in Eqn. 3.11 not only explains the reason why
common-mode injection in differential LC ILO topology can only work with even-
order harmonics, it also points to the solution for odd-order-harmonic injection lock-
ing: differential injection. Suppose we have differential injection at the third har-
monic, which is
i_{id} = I_{inj}\,\frac{1}{2}\left( e^{j(3\omega t+\phi)} + e^{-j(3\omega t+\phi)} \right) (3.19)
and common-mode small injection as zero,
iic = 0. (3.20)
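From Eqn. 3.11, the fundamental differential current from the differential
pair is
I_o e^{j(\omega t+\gamma)} = 2A_1(I_{dc})\, e^{j\omega t} + I_{inj}\,\frac{1}{2}\, e^{j(3\omega t+\phi)} A'_{-2}(I_{dc})\, e^{-j2\omega t} + I_{inj}\,\frac{1}{2}\, e^{-j(3\omega t+\phi)} A'_{4}(I_{dc})\, e^{j4\omega t} (3.21)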
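Following a procedure similar to the divide-by-two case, the locking range of the
divide-by-three ILFD can be expressed as
|\Delta\omega| \le \frac{\omega_0}{2Q} \cdot \frac{\left| \frac{1}{2} I_{inj} \left[ A'_{-2}(I_{dc}) - A'_{4}(I_{dc}) \right] \right|}{\sqrt{\left[ 2A_1(I_{dc}) \right]^2 - \left( \frac{1}{2} I_{inj} \right)^2 \left[ A'_{-2}(I_{dc}) + A'_{4}(I_{dc}) \right]^2}} (3.22)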
Figure 3.6: Realizations of divide-by-three ILFD by differential cascode injection.
For fundamental injection locking, or non-division ILOs, we have
i_{id} = I_{inj}\,\frac{1}{2}\left( e^{j(\omega t+\phi)} + e^{-j(\omega t+\phi)} \right) (3.23)
and
i_{ic} = 0 (3.24)
Ioej(ωt+γ) = 2A1(Idc)e
jωt + Iinj1
2ej(ωt+φ)A
′
0(Idc) + Iinj1
2e−j(ωt+φ)A
′
2(Idc)ej2ωt (3.25)
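The locking range can be calculated as
|\Delta\omega| \le \frac{\omega_0}{2Q} \cdot \frac{\left| \frac{1}{2} I_{inj} \left[ A'_{0}(I_{dc}) - A'_{2}(I_{dc}) \right] \right|}{\sqrt{\left[ 2A_1(I_{dc}) \right]^2 - \left( \frac{1}{2} I_{inj} \right)^2 \left[ A'_{0}(I_{dc}) + A'_{2}(I_{dc}) \right]^2}} (3.26)
where A'_{0}(I_{dc}) and 2A_1(I_{dc}) can be calculated from the current waveform or by simulation.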
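The three locking-range expressions (Eqns. 3.17, 3.22 and 3.26) share one form and differ only in which derivative coefficients appear: A'_{1−n} and A'_{1+n} for injection at the n-th harmonic. The sketch below evaluates that common form. It assumes real-valued coefficients, and the function and argument names are illustrative rather than taken from the text.

```python
import math


def half_side_locking_range(w0: float, q: float, i_inj: float,
                            a1: float, da_low: float, da_high: float) -> float:
    """Upper bound on |delta omega| from the common form of Eqns. 3.17/3.22/3.26.

    w0      -- free-running (tank resonant) angular frequency
    q       -- tank quality factor
    i_inj   -- injection current amplitude
    a1      -- fundamental Fourier coefficient A_1(I_dc)
    da_low  -- derivative coefficient A'_{1-n}(I_dc) (A'_{-1}, A'_{-2} or A'_0)
    da_high -- derivative coefficient A'_{1+n}(I_dc) (A'_3, A'_4 or A'_2)
    """
    numerator = abs(0.5 * i_inj * (da_low - da_high))
    radicand = (2.0 * a1) ** 2 - (0.5 * i_inj) ** 2 * (da_low + da_high) ** 2
    if radicand <= 0.0:
        raise ValueError("injection too strong for this expression to apply")
    return w0 / (2.0 * q) * numerator / math.sqrt(radicand)


if __name__ == "__main__":
    # Placeholder numbers for illustration only; they are not from the thesis.
    w0 = 2.0 * math.pi * 5e9
    bound = half_side_locking_range(w0, q=8.0, i_inj=0.2e-3,
                                    a1=0.6e-3, da_low=0.6, da_high=-0.2)
    print(f"half-side locking range: {bound / (2.0 * math.pi) / 1e6:.1f} MHz")
```

In practice the coefficient values would come from the free-running drain-current waveform and a small bias perturbation, as described for the simulations in this section.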
Differential injection for divide-by-odd-number ILFDs can be realized by a differential
pair connected to the separated sources of the cross pair in a cascode fashion,
as shown in Fig. 3.6-a. Using divide-by-three as an example, Fig. 3.6-a shows the signal
flows of the different harmonics: the third harmonic is injected differentially
by the input differential pair, and the fundamental current goes through a dedicated
bandstop filter added in order to sustain the oscillation. The bandstop frequency
of the filter is at the third harmonic in order to prevent a short current path for
the injection. Fig. 3.6-b shows another realization of the differential injection-locked
oscillator, where the differential injection is provided by a transformer instead of a
differential pair, while the path for the fundamental current is provided by the parasitics
of the transformer. Compared with differential pair injection, this topology reduces
the number of stacked transistors and thus can be used at a lower supply voltage. At
the same time, it only requires single-ended injection.
3.2 Injection-Locked Frequency Dividers (ILFDs)
An injection-locked oscillator locked to a super-harmonic input can be used as a
frequency divider, called an injection-locked frequency divider
(ILFD). An ILFD has inherent advantages in both speed and power dissipation
compared to a digital divider. It is fundamentally an oscillator at the subharmonic
frequency of the input signal, which effectively lowers the speed requirement for the
process technology by n-fold. Because it is a resonant circuit, only a fraction of the stored
energy is dissipated in every cycle, which is determined by the quality factor Q of
the resonator. This means that an ILFD can have lower power consumption than a
digital divider. At the same time, an ILFD also has the advantages of a simpler circuit
structure than regenerative frequency dividers and better tolerance of low-Q devices
than parametric frequency dividers.
Once locked to the input signal, the output of ILOs will maintain a determined
phase relative to the input signal (Fig. 3.7). The phase difference from the input
signal to the output is determined by the injection signal strength, the frequency
shift from its free-running oscillation frequency, and the frequency characteristics of
the oscillator resonator. As shown in Fig. 3.7, the phase shift ϕ is a monotonic
function of the frequency shift ∆ω, and the function is quite linear within the locking
range except when close to the edges. By tuning the free-running frequency of the
oscillator, we can tune the phase of the output signal [?].

Figure 3.7: Phase tuning characteristics for a divide-by-two ILO in Fig. 2.7-b. η ≡ Iinj/Ibias is the injection ratio, ω0 is the free-running oscillation frequency, ∆ω ≡ ω − ω0 is the frequency shift, and Q is the LC tank quality factor.
The phase noise behavior of an injection-locked oscillator also resembles that of a
first-order PLL. The phase noise at the ILO output is determined by that of the input at
lower frequency offsets, and by the oscillator itself at higher frequency offsets. The corner
frequency that divides these two regions, which is similar to the loop bandwidth of
a PLL, is determined by the ratio of injection strength to oscillation amplitude.
Limited locking range is a constraint for the application of injection-locked frequency
dividers. A model similar to the one in Fig. 2.7-a by [23] makes a hard-switching
assumption for the cross pair, and gives a locking range equation for a divide-by-two
ILFD as
\[
\text{Locking Range} \cong \frac{4\omega_0}{3Q}\,\eta, \tag{3.27}
\]
where ω0 is the free-running oscillation frequency of the ILFD, Q is the quality factor
of the tank, and η is the injection ratio defined as the injection strength over the
oscillation strength.
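As a rough numerical illustration of Eqn. 3.27, the sketch below evaluates the approximate locking range for a given free-running frequency, tank Q and injection ratio. The function name and the numerical values are hypothetical and chosen only for illustration; they are not measured data from this work.

```python
def divide_by_two_locking_range(f0_hz, q, eta):
    """Approximate locking range (same units as f0_hz) from Eqn. 3.27.

    f0_hz : free-running oscillation frequency
    q     : tank quality factor
    eta   : injection ratio (injection strength / oscillation strength)
    """
    if q <= 0 or eta < 0:
        raise ValueError("q must be positive and eta non-negative")
    return 4.0 * f0_hz * eta / (3.0 * q)


# Illustrative values only (assumptions, not results from the prototype).
f0 = 5.0e9  # Hz
lr = divide_by_two_locking_range(f0, q=6.0, eta=0.3)
print(f"locking range ~ {lr / 1e9:.3f} GHz ({100 * lr / f0:.1f}% of f0)")
```

The expression is linear in η and inversely proportional to Q, which is the trade-off discussed in the following paragraphs.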
In order to increase the locking range, the most intuitive approach is to increase
the effective injection strength. Since the available injection signal strength is limited
by the output of the previous stage, reducing the loss in the injection path is an
effective way to increase the actual injection strength. The injection node at the source
of the cross pair is the node of most interest in this approach, because the large
parasitic capacitance at this node tends to be a good leakage path for the injection
signal to ground. Adding an inductor in parallel with the parasitic capacitance to
form a resonance at the injection frequency is an effective way to reduce this leakage
and increase the locking range [21].
Notice that the locking range equation presented above is based on a hard-switching
assumption for the cross pair in the ILFD. Hard switching gives the best mixing
efficiency obtainable from a single-balanced mixer. If the oscillation amplitude is not
large enough to support this assumption, the mixing efficiency, and with it the locking
range of the ILFD, will drop. In order to maintain hard switching and thus
obtain a large locking range, it is necessary to increase the oscillation amplitude of the
ILFD. For the same power consumption, increasing the quality factor or the
inductance-to-capacitance ratio of the tank increases the oscillation amplitude. However,
according to the locking range equation, an increase in quality factor decreases the
locking range. Increasing the inductance-to-capacitance ratio is therefore the only way
to increase the locking range along the approach of ensuring hard switching. There is also
a trade-off associated with this approach, namely the reduced load capability.
The most common injection-locked frequency divider introduced so far is the
divide-by-two topology shown in Fig. 2.7-b. It is a differential LC oscillator with
a source-coupled cross pair, and an injection signal at twice the oscillation frequency
(2ω0) applied at the source node. This topology is suitable for divide-by-two
operation because of the inherent second-harmonic content at the injection node. Two
divide-by-two ILFDs driven by differential injection can be used to generate quadrature
signals at high speed and with low power consumption, as proposed in [23]. However,
because of mismatch between the two ILFDs, several degrees of quadrature
error exist at the quadrature output. The phase tunability of the ILO can be utilized to
compensate for such quadrature mismatch. Such an application will be shown in the
subsection on the dual-phase signal generator.
In circuit implementations of divide-by-two ILFDs, the injection signal is usually
applied to the gate of the bias transistor, which converts the injection signal from
voltage to current. The injection current at 2ω0 then mixes with the oscillation voltage
to generate the desired frequency component at the fundamental ω0. The mixing process
is like that in a single-balanced mixer, where only the odd harmonics (ω0, 3ω0, · · · ) of
the oscillation voltage mix with the injection. Such a topology therefore works for all
even-harmonic injection, because every even harmonic can mix with the adjacent odd
harmonics to generate the fundamental component. However, as the input harmonic index
increases, the mixing efficiency drops because of the smaller weight of the corresponding
odd harmonics generated by the current-steering function of the cross pair.
Odd-harmonic injection is not supported in this topology. We will address this problem
with a differential-injection divide-by-odd-number injection-locked frequency divider
topology.
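The harmonic-selection argument above can be checked numerically. The following is a minimal sketch, assuming an ideal square-wave switching function sign(cos ω0t) for the cross pair and a unit-amplitude injection cos(nω0t) with zero phase offset; the function name `fundamental_weight` is hypothetical. It projects the product onto cos(ω0t) over one period to estimate how much fundamental current the mixing produces for each injection harmonic n.

```python
import math


def fundamental_weight(n, samples=20000):
    """Cosine amplitude at w0 of sign(cos(w0 t)) * cos(n w0 t).

    Computed by a midpoint-rule projection onto cos(w0 t) over one period.
    """
    total = 0.0
    for i in range(samples):
        theta = 2.0 * math.pi * (i + 0.5) / samples       # midpoint of each step
        switch = 1.0 if math.cos(theta) > 0.0 else -1.0   # ideal current steering
        total += switch * math.cos(n * theta) * math.cos(theta)
    return 2.0 * total / samples


for n in range(2, 7):
    print(n, round(fundamental_weight(n), 4))
```

Under these assumptions, the weight for n = 2 is close to 4/(3π), the weights for even n shrink as n grows, and the weights for odd n vanish, which matches the statement that this topology does not support odd-harmonic injection.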
3.2.1 Divide-by-Two ILFD for Dual-phase Signal Generation
In modern digital communication systems, it is increasingly important to generate
accurate multi-phase signals. For example, in-phase and quadrature LO signals are
required for quadrature modulation, quadrature down-conversion, and Weaver image
rejection. Passive phase shift circuits such as poly-phase filters [78] are commonly
used in low-GHz applications for this purpose. Their disadvantages are limited band-
width per stage, large signal attenuation, and noise degradation. Ring and coupled
oscillators can also be used to generate accurate multi-phase signals, but they suffer
from inferior phase noise performance, especially at high frequencies. Toggle-flip-
flop digital frequency dividers [28] are widely used to generate quadrature signals.
However, their phase accuracy depends on the input signal duty-cycle, and the large
power consumption is also a concern at high frequencies. Injection-locked frequency
dividers (ILFDs) [22] have been demonstrated for quadrature generation with good
phase accuracy and substantially lower power consumption [79, 23, 80, 81]. They
are particularly suitable for microwave and millimeter-wave applications where the
trade-off between speed and power consumption is more challenging [21, 39].
The application of ILFDs in signal generation is so far limited to quadrature
cases. Our work presents a study on generating signals with arbitrary and tunable
phase difference by utilizing the phase shift characteristics of ILFDs (more generally,
injection-locked oscillators). This is very attractive in applications that require tunable
phases with fine phase resolution, e.g., phased-array systems [82]. It can also be
used to improve the phase accuracy of quadrature generation.
An ILFD can be treated as a simplified regenerative divider with a built-in mixer
and filter [24, 25, 23]. For example, a divide-by-2 ILFD based on differential LC
oscillator (Fig. 3.8-a) can be modeled as a regenerative divider with a single-balanced
mixer and an LC tank filter (Fig. 3.8-b,c).

Figure 3.8: (a) Schematic, (b) equivalent circuit model and (c) behavior model of a divide-by-two ILFD based on differential LC oscillator. The nonlinearity in the behavior model comes from the switching of the cross pair.

Figure 3.9: Dual phase generation by injection-locking two ILFDs. The injected differential signals result in quadrature phase difference at the ILFD outputs when ω01 = ω02. The output phases φ1 and φ2 are explicitly expressed as the sum of the quadrature phases and the phase shift parts ϕ1 and ϕ2 so that Eqn. 3.28 can be directly applied.

At large oscillation amplitude, assuming ideal switching for the differential pair
(M1 and M2), the output signal phase shift ϕ can be found [23]
\[
\varphi = \frac{1}{2}\left[\sin^{-1}\frac{3/\eta}{\sqrt{1+\left(\frac{\omega_0}{Q\,\Delta\omega}\right)^2}} + \sin^{-1}\frac{1}{\sqrt{1+\left(\frac{\omega_0}{Q\,\Delta\omega}\right)^2}}\right] \tag{3.28}
\]
where η ≡ Iinj/Ibias is the injection ratio, ω0 is the free-running oscillation frequency,
∆ω ≡ ω − ω0 is the frequency shift, and Q is the LC tank quality factor. This phase-shift
characteristic is shown in Fig. 3.7, from which we can see that the phase shift
ϕ is a monotonic function of the frequency shift ∆ω, and that the function is quite linear
within the locking range except close to the edges.
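Eqn. 3.28 can be evaluated numerically to reproduce the shape of the phase tuning curve. The sketch below is one possible reading, not the author's code: it rewrites 1/sqrt(1 + (ω0/(Q∆ω))²) as x/sqrt(1 + x²) with x = Q∆ω/ω0, which is an assumption made here so that the sign of ∆ω carries through and ∆ω = 0 needs no special case. The function name and the numerical values are hypothetical.

```python
import math


def ilfd_phase_shift(delta_omega, omega0, q, eta):
    """Phase shift (radians) of a divide-by-two ILFD following Eqn. 3.28.

    Assumption: the sign of delta_omega is kept, so the result is an odd,
    monotonic function of the frequency shift.
    """
    x = q * delta_omega / omega0
    s = x / math.sqrt(1.0 + x * x)
    arg = 3.0 * s / eta
    if abs(arg) > 1.0:
        # The arcsine has no real solution: the ILFD cannot lock here.
        raise ValueError("frequency shift is outside the locking range")
    return 0.5 * (math.asin(arg) + math.asin(s))


# Illustrative numbers only: Q = 6, eta = 0.3, output near 5 GHz.
omega0 = 2.0 * math.pi * 5.0e9
for frac in (-0.015, -0.0075, 0.0, 0.0075, 0.015):
    phi = ilfd_phase_shift(frac * omega0, omega0, q=6.0, eta=0.3)
    print(f"{100 * frac:+.2f}% shift -> {math.degrees(phi):+.1f} deg")
```

The point where the first arcsine argument reaches unity marks the edge of the locking range in this model.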
When the injected signal changes phase by 180°, the phase of the ILFD output
changes by 90°. Therefore, when a differential signal is injected into two identical
ILFDs (Fig. 3.9) with the same free-running oscillation frequency (ω01 = ω02), the
two differential output signals are exactly in quadrature, i.e., ∆φ = 90° (Fig. 3.10-a).
The quadrature accuracy is determined by the mismatch between the two ILFDs, and
also affected by the injection ratio η and Q of the LC tank.
Figure 3.10: Phase tuning of two ILFDs: (a) quadrature; (b)(c) single-ended tuning; (d) differential tuning.
Figure 3.11: Schematic of the prototype double-balanced ILFD for tunable dual-phase signal generation. A first stage ILFD works as an active balun to convert a single-ended input to differential signals, which are then fed into the input of the double-balanced ILFD stage.
When the two ILFD cores have different free-running oscillation frequencies (ω01 ≠ ω02), φ1 and φ2 will no longer be in quadrature but will have another phase difference.
Therefore, if we frequency-tune ILFD1 or ILFD2, their phase difference ∆φ will change
accordingly. Fig. 3.10 shows some possible ways of phase tuning: we can fix ω02 (and
hence φ2) while tuning ω01 to change φ1; we can also tune ω01 and ω02 (and hence
φ1 and φ2) differentially to achieve a larger phase tuning range. If the ILFD cores
are designed to center their frequency tuning range around half the input frequency, ω,
the phase tuning range will be centered around quadrature, and reaches its maximum when
tuning differentially. If the desired phase tuning range is around ∆φ = 0, we can simply
injection-lock both ILFD cores with the same single-ended signal.
Figure 3.12: Chip micrograph of the prototype ILFD. The chip occupies an area of 1.0 × 1.1 mm².
Notice that the signal amplitude is related to ϕ as [23]
\[
V_0 = \frac{4}{\pi} R I_{bias}\left(1 + \frac{\eta}{3}\cos 2\varphi\right) \tag{3.29}
\]
where R is the equivalent tank resistance. Therefore, in order to maintain an equal
signal amplitude for the two outputs, it is also better to tune ω01 and ω02 differentially
around ω, in which case ϕ1 ≈ −ϕ2, and hence cos 2ϕ1 = cos 2ϕ2.
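The amplitude argument can be illustrated with a short sketch of Eqn. 3.29. The function name and the component values below are hypothetical, chosen only to show that opposite phase shifts give the same amplitude because cos 2ϕ is an even function.

```python
import math


def ilfd_output_amplitude(r_tank, i_bias, eta, phi):
    """Output amplitude V0 (volts) from Eqn. 3.29; phi is in radians."""
    return (4.0 / math.pi) * r_tank * i_bias * (1.0 + (eta / 3.0) * math.cos(2.0 * phi))


# Illustrative values only: 200-ohm tank, 4-mA bias, eta = 0.3.
phi = math.radians(20.0)
v1 = ilfd_output_amplitude(200.0, 4e-3, 0.3, +phi)  # core tuned in one direction
v2 = ilfd_output_amplitude(200.0, 4e-3, 0.3, -phi)  # core tuned in the other direction
print(f"V1 = {v1:.4f} V, V2 = {v2:.4f} V")
```

With single-ended tuning, by contrast, one core stays near ϕ = 0 while the other moves, so the two amplitudes differ by the cos 2ϕ term.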
From Eqn. 3.28 and 3.29, it can be seen that both phase shift and output am-
plitude strongly depend on the injection ratio η, which in turn depends on both the
injection current Iinj and bias current Ibias. In a simple differential LC ILFD, Iinj is
generated by a transconductor, usually made of the tail transistor. Any variation in
transistor size or bias voltage would translate into change in Iinj, and hence affects
the phase accuracy and amplitude equality. To address this problem, we introduce
a double-balanced structure similar to a Gilbert cell (Fig. 3.11). In such a double-
balanced ILFD, the input transconductor is replaced by a differential pair (M5 and
M6) operating in strong switching mode. Therefore, the injection ratio η is determined
only by the Fourier series coefficients of an ideal sign function, and hence is
largely immune from variations in transistor size or dc bias, given the input voltage
signal is sufficiently large. Note that the injection current now consists of multiple
harmonics of 2ω.

Figure 3.13: Frequency tuning range of the free-running ILFD core.
In the prototype, NMOS inversion-mode varactors (Ct1 to Ct4) are used in the
LC tanks to tune the free-running oscillation frequency (Fig. 3.11). Another ILFD is
added to serve as an on-chip active balun in order to convert the single-ended signal
from a signal source to the differential injection signal with good phase noise. It is a
regular differential LC divide-by-two ILFD like the one in Fig. 3.8-a. Varactor tuning
is also included in the balun ILFD to cover the locking range of the main ILFD. Since
there is no stringent input bias requirement on the double-balanced ILFD, they are
directly dc coupled.
The circuit is fabricated using a standard 0.18-µm digital CMOS technology with
low-resistivity substrate. Spiral inductors are constructed using the 0.9-µm-thick top
metal layer. Due to the thin metal and lossy substrate, Q of the inductors is about
6 at 5 GHz. Two open-drain differential buffers are used at the output ports. The
main ILFD consumes 8 mA from a 1.8-V power supply. The balun ILFD and the open-drain
buffers consume 4 mA and 18 mA from 1.4-V and 1.8-V supplies, respectively. The
die photo is shown in Fig. 3.12, and the chip size is 1.0 mm × 1.1 mm.

Figure 3.14: Locking range and bounds in the middle of the tuning range. Note that these are the input signal frequencies, which are 4 times that of the outputs. A maximum of 17% locking range was achieved.
The circuit prototype is measured using a probe station. First, we measured the
stand-alone main ILFD cores (without the balun ILFD) implemented in a companion
test chip. Their tuning range when free-running is from 4.96 GHz to 6.16 GHz
(Fig. 3.13), and their locking range without tuning is found to be 17%. Then the
locking range of the prototype (with the balun ILFD) was measured at different
tuning points, and is found to be over 15% across the tuning range (Fig. 3.14). Notice
that the locking range extends symmetrically around the free-running frequency as
the injected power increases. Taking into account both the tuning and locking range,
the total operation frequency range then extends to 22%, from 4.78 GHz to 5.95 GHz
at the outputs.
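The quoted total operating range can be cross-checked with simple arithmetic. The sketch below assumes the percentage is taken relative to the center of the range, which the text does not state explicitly; the function name is hypothetical.

```python
def fractional_range(f_low, f_high):
    """Frequency span as a fraction of its center frequency."""
    return (f_high - f_low) / (0.5 * (f_high + f_low))


# Output-referred operating range quoted in the text: 4.78 GHz to 5.95 GHz.
print(f"{100 * fractional_range(4.78e9, 5.95e9):.1f}%")
```

Under that assumption the result comes out close to the 22% stated above.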
The phase difference of the two output signals is measured using a sampling
oscilloscope. Cables and probes are calibrated to remove the phase mismatch introduced
by the measurement setup. Fig. 3.15-a shows the case of tuning the first core
ILFD1 while keeping ILFD2 at the middle of its tuning range. The phase difference
can be varied by 55° around quadrature before the ILFD loses lock. Fig. 3.15-b shows
the opposite case of tuning ILFD2 only. A similar 50° phase tuning around quadrature
is achieved. When ILFD1 and ILFD2 are tuned differentially, about 100° (40° to
140°) phase tuning is achieved for different input frequencies (Fig. 3.15-c). Compared
to single-ended phase tuning, differential tuning shows much better linearity in the
tuning characteristics.

Figure 3.15: Phase tuning: (a) keep Vt1 constant and tune Vt2; (b) keep Vt2 constant and tune Vt1; (c) differential tuning of Vt1 and Vt2 at different injection frequencies.

Figure 3.16: Phase noise within the locking range, compared with that of input signal. Measured at 5.3 GHz output. 12-dB phase noise reduction comes from the divide by four operation.
The phase noise across the locking range is also measured, together with that of
the input signal, as shown in Fig. 3.16. The phase noise of the output is about 10
to 11 dB lower than that of the 21-GHz input. The phase noise suppression is quite
close to the theoretical value of 12 dB for divide-by-four operation.
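The theoretical values quoted for frequency division follow from the usual 20·log10(N) relation for an ideal divide-by-N stage. A minimal sketch, with a hypothetical function name:

```python
import math


def ideal_division_noise_reduction_db(n):
    """Ideal phase-noise reduction in dB for frequency division by n."""
    if n < 1:
        raise ValueError("division ratio must be at least 1")
    return 20.0 * math.log10(n)


for n in (2, 3, 4):
    print(n, f"{ideal_division_noise_reduction_db(n):.2f} dB")
```

Division by four gives about 12 dB, consistent with the value cited above for the cascaded balun and main ILFD.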
Figure 3.17: Circuit evolution of ILFD from divide-by-two to divide-by-three. The most important change is from common mode injection and differential mixing to differential injection and single-ended mixing.
3.2.2 Divide-by-Odd-Number ILFD
In order to achieve divide-by-odd-number operation for ILFDs based on LC differential
oscillators, a different mixing scheme must be introduced to bypass the even-harmonic
suppression of the balanced mixer, while at the same time careful harmonic
design is necessary to maintain the differential topology for the outputs. We introduce
such a new topology, based on differential injection and a single-ended mixing scheme,
to achieve divide-by-odd-number operation, while the LC differential oscillator
topology is maintained through dedicated harmonic engineering.
The evolution from a divide-by-two ILFD to a divide-by-odd-number ILFD is
shown in Fig. 3.17. In the divide-by-two ILFD, the input at the second harmonic is injected
at the gate of the tail transistor; the transconductance of the tail transistor converts the
injection voltage to current and feeds it into the common source of the cross-coupled
transistors. The output at the fundamental frequency steers this injection current, and the
mixing component at the fundamental is fed into the LC tank, which translates it back
to voltages at the output. The phase and voltage balance of this feedback system
shifts the oscillation frequency from the resonant frequency to exactly half of the
injection frequency inside the locking range. This topology is efficient for divide-by-two
operation and can be used for higher-order divide-by-even-number operation, with
decreasing efficiency as the division ratio increases. However, it has very poor efficiency
for divide-by-odd-number operation, because the differential topology largely suppresses
the even harmonics of the feedback voltages. These even harmonics are essential
for divide-by-odd-number operation, since only the even harmonics can
mix with an odd-harmonic input to generate the fundamental frequency. In the
divide-by-three topology, by contrast, the differential injection currents are each fed into
only one of the cross-coupled transistors and mix with the feedback voltage in a
single-ended fashion. This topology does not have the even-harmonic suppression during
the mixing operation and thus can support odd division ratios. In order to maintain the
signal path for the fundamental current, a filtering structure is introduced to provide a
short between the sources of the cross-coupled pair, while it should appear as an open
for the injected odd harmonics.

Figure 3.18: Circuit model for the divide-by-three ILFD, with differential injection at the sources of the cross pair.

Figure 3.19: Loop model for the divide-by-three ILFD.
Fig. 3.18 and Fig. 3.19 show the modeling of the divide-by-three topology. Injection
at the third harmonic of the output is modeled by two differential voltages at the
sources of the cross pair (Fig. 3.18). The output is taken differentially from the drains,
where it also mixes with the injection voltages through M1 and M2. The differential current
id = i2 − i1, which contains the mixed component of injection and output, is filtered
by the tank of the LC oscillator, which keeps only the fundamental frequency and filters
out the other unwanted harmonics. The transfer function of the LC tank is
\[
F(j\omega) = |Z(j\omega)|\,e^{j\beta(\omega)}, \tag{3.30}
\]
with the phase shift
\[
\beta(\omega) = -\arctan\left(\frac{2Q\,\Delta\omega}{\omega_0}\right), \tag{3.31}
\]
where ω0 is the resonance frequency of the tank, and ∆ω = ω − ω0 is the difference
between the 3rd subharmonic of the input frequency and the resonance frequency. In
the free running case, the oscillator is working at its resonance frequency ω0, so the
phase shift by the tank is zero. The feedback loop of the oscillator can maintain a
balance with the active circuit providing only the negative resistance needed to compensate
for the loss component in the tank. In the injection-locked case, the oscillator should
be able to work in a frequency band around its resonance frequency. This means
the phase shift of the tank will no longer be zero. At the same time, oscillation
condition requires the loop phase shift to be 0 or a multiple of 2π. So, the phase shift
of the tank must be compensated by the active circuit in an injection-locked oscillator.
In the non-division case, this phase shift is introduced by summing the output with an
injection component at the same frequency [25]. In the division case, the phase shift
is introduced by mixing [24].
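The phase-balance requirement can be made concrete with a small sketch of Eqn. 3.31. The function names and the example resonance frequency and Q are hypothetical; the second function simply restates the zero-loop-phase condition described above.

```python
import math


def tank_phase_shift(delta_omega, omega0, q):
    """Phase shift beta (radians) of the LC tank from Eqn. 3.31."""
    return -math.atan(2.0 * q * delta_omega / omega0)


def required_active_phase(delta_omega, omega0, q):
    """Phase the active circuit must contribute so the loop phase is zero."""
    return -tank_phase_shift(delta_omega, omega0, q)


# Illustrative values only: 6-GHz tank resonance with Q = 3.
omega0 = 2.0 * math.pi * 6.0e9
for frac in (0.0, 0.01, 0.05):
    beta = tank_phase_shift(frac * omega0, omega0, q=3.0)
    print(f"{100 * frac:.0f}% offset: tank phase {math.degrees(beta):+.1f} deg")
```

A low-Q tank produces a smaller phase shift for a given frequency offset, so the mixing-generated phase shift can cover a wider frequency band.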
As for the divide-by-three topology in Fig. 3.19, the phase shift is provided by the
mixing operation between the 3rd-harmonic injection and the fundamental output in
a single-transistor fashion. Assuming a third-order nonlinearity for transistors M1
and M2,
\[
f(v_i, v_o) = a_0 + a_1(v_i + v_o) + a_2(v_i + v_o)^2 + a_3(v_i + v_o)^3 \tag{3.32}
\]
the fundamental component that feeds into the tank Z(ω) is
\[
\begin{aligned}
i_d &= i_2 - i_1 \\
    &= 2a_1 V_{osc}\cos(\omega t) - 3a_3 V_{osc}^2 V_{inj}\cos(\omega t + \phi) \\
    &= |i_d|\cos(\omega t + \varphi)
\end{aligned}
\]
where
\[
\varphi = -\arctan\left[\frac{3a_3 V_{osc}^2 V_{inj}\sin(\phi)}{2a_1 V_{osc} - 3a_3 V_{osc}^2 V_{inj}\cos(\phi)}\right] \tag{3.33}
\]
The above equation shows the importance of differential injection: if the
injection is in common mode, then even if the mixing is performed by a single transistor,
the fundamental current generated by mixing will still be canceled in differential-mode
operation.
Figure 3.20: Circuit implementation of the divide-by-three ILFD. An input balun is used for single-end to differential conversion.
From the phase balance between the cross-coupled pair and the LC tank, the half-bandwidth
locking range as a fraction of ω0 can be expressed as
\[
\frac{\Delta\omega}{\omega_0} \le \frac{3a_3 V_{osc} V_{inj}}{\sqrt{(2a_1)^2 - (3a_3 V_{osc} V_{inj})^2}}\cdot\frac{1}{2Q} = \frac{1}{\sqrt{\left(\frac{2a_1}{3a_3 V_{osc} V_{inj}}\right)^2 - 1}}\cdot\frac{1}{2Q}
\]
which is inversely proportional to Q, the quality factor of the tank, and, for weak
injection, approximately proportional to the third-order nonlinearity coefficient a3 of the
cross-pair transistors, the injection level Vinj, and the oscillation amplitude Vosc.
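The locking range expression above can be evaluated directly. In the sketch below the function name and all coefficient values are hypothetical; a1 and a3 stand for the polynomial coefficients of Eqn. 3.32 and are not extracted from the prototype.

```python
import math


def divide_by_three_locking_range(a1, a3, v_osc, v_inj, q):
    """Half-bandwidth locking range as a fraction of omega0.

    k is the mixing term 3*a3*Vosc*Vinj relative to the linear term 2*a1;
    the expression is only defined while k stays below one.
    """
    k = 3.0 * a3 * v_osc * v_inj / (2.0 * a1)
    if not 0.0 <= k < 1.0:
        raise ValueError("expression is only defined for 0 <= k < 1")
    return k / math.sqrt(1.0 - k * k) / (2.0 * q)


# Illustrative values only (assumed coefficients, amplitudes and Q).
lr = divide_by_three_locking_range(a1=5e-3, a3=2e-3, v_osc=0.8, v_inj=0.3, q=3.0)
print(f"half-bandwidth locking range ~ {100 * lr:.2f}% of omega0")
```

For small k the result reduces to k/(2Q), which is the approximately proportional regime described in the text.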
For circuit implementation of the proposed divide-by-odd-number injection-locked
frequency divider, we construct a differential cascode topology by adding another
differential pair of M3 and M4 (right in Fig. 3.20). A shunt-peaking inductor L0 is
also introduced to resonate with the parasitic capacitances at the 3rd harmonics[21].
Figure 3.21: Output spectrum of the divide-by-three ILFD with 23 dB and 21 dB of second and third harmonic suppression. The zoom-in of the fundamental output shows a clean spectrum. (Axes: spectrum in dBm vs. frequency in GHz.)
It also provides a short-circuit current path for the fundamental frequency component.
Therefore, the upper half circuit (M1, M2, L0 and LC tank) works as a differential
LC oscillator at the fundamental frequency, the lower one as a tuned differential
amplifier, while mixing is accomplished within the left and right half circuits single-
ended. Overall, we try to confine signals at different harmonics locally by circuit
topology and filtering for them to co-exist in harmony. A balun T1 is used to convert
a single-ended input from a signal source to differential signals. It also helps to match
the input impedance of M3 and M4 to 50 Ω. The input can be directly connected
when the ILFD is integrated with an on-chip differential source like a VCO.
A prototype ILFD with input frequency of 18GHz using the new topology has been
designed and fabricated using National Semiconductor’s 0.18µm CMOS technology
with low-resistivity epi silicon substrate. Spiral inductors are constructed using the
2-µm-thick top metal layer. Due to the lossy substrate, Q of the inductors is about
3 at 6 GHz and 7 to 8 at 18 GHz. An open-drain differential buffer is used at the output
port. The ILFD core and output buffer consume 2.55 mA and 22.6 mA from a 1.8 V
power supply, respectively.

Figure 3.22: Locking range vs. injection power for the divide-by-three ILFD. A maximum of 1-GHz locking range was achieved. (Axes: frequency in GHz and locking range vs. injected power in dBm; curves: upper bound, lower bound.)

Figure 3.23: Locking range vs. injection voltage. The injection voltage is calculated by the incident power reading and the S11 at the input port, with cable and connector loss calibrated out. (Axes: frequency in GHz and locking range vs. voltage amplitude at the injection port in V; curves: upper bound, lower bound.)

Figure 3.24: Extended working frequency range which combines the frequency tuning range and the locking range. (Axes: frequency in GHz vs. tuning voltage in V; curves: upper bound, lower bound, 3f0.)

Figure 3.25: Phase noise performance vs. injection power. 9-dB phase noise reduction is because of the divide by three operation. (Axes: phase noise in dBc/Hz vs. offset frequency in Hz; curves: injected power 3.7 dBm, -3.3 dBm, -8 dBm, free running, and the Agilent E8244A source.)
Figure 3.26: Chip micrograph of the divide-by-three ILFD, with a chip size of 0.9 mm × 0.9 mm.

The prototype ILFD chip is measured by on-wafer probing. The loss from the
cables, adapters and probes between the signal source and the input port is characterized
to calibrate the injection signal power. S11 at the input port is measured
and used to calculate the gate voltage on M3 and M4. S11 is below -8 dB across
the frequency tuning and locking range, and the injection power is adjusted from
the incident power accordingly. The output signal spectrum in locked condition is
shown in Fig. 3.21. The 2nd and 3rd harmonics are 23 dB and 21 dB below the
fundamental, and a large part of them is contributed by the open-drain
buffer at the output (single-ended measurement). The locking range increases from
0.3 GHz at an injection power of -14 dBm to 1 GHz at 4 dBm with little change in the
center frequency (Fig. 3.22). The corresponding input port voltage is calculated using
S11 and shown in Fig. 3.23. Note that this is the single-ended voltage (amplitude)
at the primary of the balun with 1:1 transformation ratio. Considering impedance
matching, the differential voltage on the gates of M3 and M4 is about 0.5 times
that, which is clearly compatible with on-chip VCOs. The ILFD can also be tuned
by the varactors Ct1 and Ct2 with the free-running frequency from 5.37 GHz to 6.1
GHz. The locking range shifts with the free-running frequency and remains almost
constant across the tuning range. Fig. 3.25 shows the phase noise performance of the
ILFD at different injection power levels. The phase noise of the free-running ILFD
(no injection) and of the signal source are also shown for comparison. Due to the low Q
of the inductors, the free-running phase noise is poor. When the ILFD is in the
locked condition, the phase noise follows that of the signal source with a 9 to 10 dB
reduction at large injection power (-3 dBm and 3.7 dBm), which matches well with
the theoretical value of 9.5 dB. For small injection power (-8 dBm), the phase noise
degrades only at large offset frequency. Fig. 3.24 shows the phase noise performance
across the locking range at the injection power level of 3.7 dBm. At small offset
frequency (<200 kHz), the phase noise degradation at the edges of the locking range is
almost negligible, while it deviates more at larger offsets.
The die photo is shown in Fig. 3.26. The chip size is 0.9mm × 0.9mm.
3.3 Injection-Locked Frequency Multipliers (ILFMs)
3.3.1 Dual Modulus ILFM with Good Harmonic Suppression
As discussed in chapter 2, conventional implementations of frequency multipliers
with a nonlinear element and a filtering network cannot provide good harmonic suppression
in CMOS processes, as there are no passive elements with good quality factors.
We propose a new frequency multiplier topology which cascades an injection-locked
oscillator (ILO) after the harmonic generator to perform the frequency selection. The
ability of an ILO to be locked by a signal much smaller than
its output allows a small output level from the harmonic generator while
maintaining a relatively large output strength for the multiplier as a whole.
This decouples two factors that are closely related in the conventional approach: the
output power level and the harmonic generator power. A smaller harmonic generator power
also means smaller undesired harmonics, which are easier to filter out. The ILO after
the harmonic generator acts as a high-gain bandpass amplifier which amplifies the
desired harmonic component to the required signal strength, while at the same time
suppressing the undesired ones effectively. This new frequency multiplier topology can
also be easily configured to work at different multiplication ratios, for example as a
frequency doubler or tripler, depending on the relation between the input frequency and
the ILO oscillation frequency. If the ILO frequency is twice the input frequency, the
multiplier works as a doubler; if the ILO frequency is three times the input frequency,
the multiplier is a tripler. A mode control can be used to switch the circuit
between the two operation modes to achieve a dual-modulus frequency multiplier.
The proposed injection-locked frequency multiplier (ILFM) integrates an injection-locked
oscillator (ILO) [20] with a harmonic generator to perform the frequency selection,
instead of relying solely on the output filter as in the conventional frequency
multiplier. The architecture of such a frequency multiplier is shown in Fig. 3.27.
The harmonic generator generates a harmonic-rich current, which is filtered by a filter
network composed of a tunable switched-capacitor tank and an on-chip transformer.
The transformer also serves as the input device of the last-stage ILO [?]. The ILO
has a natural oscillation frequency at or near the desired output frequency, tunable
by a switched capacitor, and is locked by the desired harmonic frequency generated
by the harmonic generator.
Figure 3.27: Schematic of the dual-modulus injection-locked frequency multiplier.
There are several benefits of the new topology over the conventional approach. The
first one is better harmonic suppression in a lossy process. Because an ILO can be
locked by a very small input signal, the harmonic generator in this topology can run at
a relatively low power level. This means that there are only small undesired
harmonics at the harmonic generator output, which are largely suppressed by
the two-stage filtering in the frequency multiplier. No further filtering is necessary
since the harmonic suppression requirement is already satisfied at the multiplier output.
Since an ILO can be built with low-Q passive devices, the proposed topology can provide
large harmonic suppression in a digital CMOS technology. Compared with the
injection-locked frequency multipliers proposed in [51, 52], the harmonic-rich current is
band-pass filtered first before it feeds into the ILO, so the new topology can achieve
superior harmonic suppression within a compact structure. The harmonic suppression for
such a new topology can be expressed as
\[
HS_{m,n,inj} = \left|\frac{I_{m\omega_0}}{I_{n\omega_0}}\right|\left|\frac{Z_{in}(m\omega_0)}{Z_{in}(n\omega_0)}\right|\left|\frac{Z_{osc}(m\omega_0)}{Z_{osc}(n\omega_0)}\right|\frac{1}{\eta} \tag{3.34}
\]
in which the extra term $\left|Z_{in}(m\omega_0)/Z_{in}(n\omega_0)\right|$ can significantly improve the harmonic
suppression performance.
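Eqn. 3.34 is a product of three magnitude ratios and the inverse injection ratio, so it is easy to evaluate once the currents and impedances are known. The sketch below is illustrative only: the function names are hypothetical, the inputs may be real or complex, m is read here as the desired harmonic and n as the undesired one, and the conversion to dB assumes the result is an amplitude ratio.

```python
import math


def harmonic_suppression(i_m, i_n, zin_m, zin_n, zosc_m, zosc_n, eta):
    """Harmonic suppression HS_{m,n,inj} from Eqn. 3.34 as a linear ratio.

    i_*    : harmonic generator currents at the m-th and n-th harmonics
    zin_*  : input filter impedances at those harmonics
    zosc_* : ILO tank impedances at those harmonics
    eta    : injection ratio
    """
    return abs(i_m / i_n) * abs(zin_m / zin_n) * abs(zosc_m / zosc_n) / eta


def to_db(ratio):
    """Convert an amplitude ratio to dB (assumed 20*log10 convention)."""
    return 20.0 * math.log10(ratio)


# Illustrative values only: equal harmonic currents, each filter favoring
# the desired harmonic by 5x in impedance magnitude, eta = 0.1.
hs = harmonic_suppression(1.0, 1.0, 250.0, 50.0, 500.0, 100.0, 0.1)
print(f"HS = {hs:.0f} ({to_db(hs):.1f} dB)")
```

Setting the input filter ratio to one in this sketch shows how much of the total suppression comes from the extra filtering stage.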
The second benefit of the new topology is that it decouples the output power of
the frequency multiplier from that of the harmonic generator. These two are directly
correlated in the conventional design. In order to increase the output power level, the
conventional design would have to drive the harmonic generator with larger power,
but this also increases the power of the undesired harmonics. In the new topology,
the output power is determined by the bias of the ILO, and hence can be tuned
independently from the power level of the harmonic generator.

Figure 3.28: (a) Harmonic current generation vs. input voltage for a zero biased NMOS transistor harmonic generator. (b) Harmonic current ratio to fundamental vs. input voltage for the same harmonic generator.
Another benefit of the new ILFM topology is that it can be easily implemented for
variable multiplication ratios. For example, if the ILO natural frequency is designed to be
twice the input frequency, the multiplier works as a doubler; if it is three times the input frequency,
it works as a tripler. Dual-modulus operation can be achieved by switching the ILO
natural oscillation frequency between these two frequencies.
As shown in Fig. 3.27, we choose an NMOS transistor with zero bias voltage
as the harmonic generator. This harmonic generator always works in class-C mode
because the conduction angle is always smaller than π. As the input voltage increases,
the generated harmonic currents also increase, but at the cost of higher dc current
consumption and higher fundamental current. This is illustrated by the simulated
current components of such a zero-biased NMOS transistor harmonic generator,
shown in Fig. 3.28.
Figure 3.29: Model of the filter between the harmonic generator and the ILO. The transformer model neglects the parasitic inductance.
The filter between the harmonic generator and ILO can be modeled as in Fig. 3.29,
where the leakage inductance of the transformer has been neglected. After transferring
the load impedance ZL to the primary, the filter structure becomes a parallel RLC
network driven by the current source of the harmonic generator. In order to maximize
the voltage swing of the desired harmonic component on the load ZL, while at
the same time suppressing the undesired harmonics, it is necessary to tune the
capacitance Ct so that it forms a parallel resonance at the desired harmonic. This
tuning capacitance includes the parasitic capacitance of the harmonic generator and
transformer, and can be built with tunability to enable modulus change.
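As a rough illustration of the filtering term |Zin(mω0)/Zin(nω0)|, the following sketch evaluates the normalized impedance of an ideal parallel RLC network tuned to the desired harmonic. The loaded quality factor `q` and the function names are assumptions made for illustration only; they are not values or models taken from the prototype.

```python
import math

def parallel_rlc_rel_impedance(f, f_res, q):
    """Return |Z(f)| / |Z(f_res)| for an ideal parallel RLC network.

    q is the loaded quality factor at resonance (a hypothetical value here).
    """
    x = f / f_res - f_res / f
    return 1.0 / math.sqrt(1.0 + (q * x) ** 2)

def filter_suppression_db(f_desired, f_undesired, q):
    """Extra suppression (dB) of an undesired harmonic when the tank is tuned to f_desired."""
    return -20.0 * math.log10(parallel_rlc_rel_impedance(f_undesired, f_desired, q))

F_IN = 1.6e9      # input frequency from the text
Q_LOADED = 3.0    # hypothetical loaded Q of the input filter

doubler_fund = filter_suppression_db(2 * F_IN, F_IN, Q_LOADED)
tripler_fund = filter_suppression_db(3 * F_IN, F_IN, Q_LOADED)
tripler_second = filter_suppression_db(3 * F_IN, 2 * F_IN, Q_LOADED)

print(f"doubler, fundamental:  {doubler_fund:.1f} dB")
print(f"tripler, fundamental:  {tripler_fund:.1f} dB")
print(f"tripler, 2nd harmonic: {tripler_second:.1f} dB")
```

Even with a low loaded Q, this simplified model predicts a noticeable contribution from the input filter, and less rejection of the second harmonic than of the fundamental in tripler mode, which is qualitatively in line with the discussion above.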
The final stage ILO is in a transformer-direct-injection topology [?]. The filtered
harmonic components are applied to the sources of the cross pair in the final stage
injection-locked oscillator by the transformer. These injections are amplified in a
common gate configuration by the cross pair before feeding into the LC tank of the
oscillator. This arrangement reuses the gain from the cross pair without extra power
consumption.
The dual-modulus ILFM prototype is designed and fabricated to verify the benefits
of the new technique. The schematic of the prototype dual-modulus ILFM is shown
in Fig. 3.27. It is designed with an input frequency of 1.6 GHz and output frequencies
Figure 3.30: Die photo of the dual-modulus frequency multiplier. Osc is the oscillator core, T1 is the transformer and M1 is the harmonic generator.
of 3.2 GHz and 4.8 GHz in doubler mode and tripler mode, respectively.
The harmonic generator is a transistor in common source configuration with its
gate tied to the ground by a large resistor R1. Such a design makes the transistor
always operate in the class-C region at different input levels. The ILO is built as an LC
differential oscillator. A switched capacitor array Cs1 is used in the tank to switch
its natural oscillation frequency between 3.2 GHz and 4.8 GHz. The
transformer-direct-injection topology injects the input signal differentially into the ILO tank and
is effective for locking to odd harmonics, including the fundamental, of its natural
oscillation frequency. In an ILFM, the ILO natural oscillation frequency is set to
the desired harmonic of the input frequency, which is generated by the harmonic
generator. Since the input of this ILO topology is inductive, a capacitor Ct1 is added
in parallel to form a resonant filter, which adds another stage of frequency selection.
Another switched capacitor Cs2 is added in parallel to switch the resonant frequency
between 3.2 GHz and 4.8 GHz. An open drain differential buffer is used to facilitate
measurements in a 50-ohm measurement system.
Figure 3.31: Locking ranges of doubler and tripler modes vs. input levels, which determine the output frequency ranges of the frequency multiplier.
The circuit prototype was fabricated in a 0.18-µm standard digital CMOS process
with a low-resistivity substrate. The transformer and the symmetric inductor are both
implemented with the 0.35-µm-thick top metal layer. The transformer has a k factor of 0.77 at
both 3.2 GHz and 4.8 GHz. The symmetric inductor has an inductance of 4 nH
and quality factors of 2.9 and 3.9 at 3.2 GHz and 4.8 GHz, respectively. A die photo
of the circuit prototype is shown in Fig. 3.30; the effective circuit area without
pads is 0.4 mm by 0.1 mm. The whole test chip with pads measures 0.8 mm by 0.6 mm.
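To give a feel for the switched-capacitor tuning this implies, the sketch below estimates the total tank capacitance needed to resonate at each output frequency. It assumes an ideal LC tank in which the 4-nH symmetric inductor is the entire tank inductance and all parasitics are lumped into the capacitance; both are simplifying assumptions, not statements about the actual layout.

```python
import math

def tank_capacitance(f_res_hz, inductance_h):
    """Total capacitance (F) that resonates with inductance_h at f_res_hz (ideal LC tank)."""
    return 1.0 / ((2.0 * math.pi * f_res_hz) ** 2 * inductance_h)

L_TANK = 4e-9  # symmetric inductor value quoted in the text

c_doubler = tank_capacitance(3.2e9, L_TANK)   # doubler mode
c_tripler = tank_capacitance(4.8e9, L_TANK)   # tripler mode

print(f"C at 3.2 GHz: {c_doubler * 1e12:.3f} pF")
print(f"C at 4.8 GHz: {c_tripler * 1e12:.3f} pF")
print(f"capacitance ratio: {c_doubler / c_tripler:.2f}")
```

Under these assumptions the capacitance ratio between the two modes is (4.8/3.2)^2 = 2.25, regardless of the inductance value.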
The dual-modulus injection-locked frequency multiplier is measured on a probe
station. The input signal is from a continuous-wave signal generator. The differential
output from the open drain buffer is measured by an SGS probe and only one branch
is measured in single-ended fashion. Locking ranges, which determine the output
frequency ranges, and harmonic suppressions of both doubler and tripler modes are
measured at different input amplitudes. Input amplitude is calculated based on the
Figure 3.32: Harmonic suppressions of doubler and tripler modes vs. input levels.
power reading of the signal generator and s11 of the injection port, with cable and
connector losses calibrated.
As shown in Fig. 3.31, locking ranges increase with input amplitude, and locking ranges as large
as 35.0% and 13.5% are achieved for doubler and tripler modes, respectively, both at
1-V input. Harmonic suppressions, on the other hand, decrease with increasing input
amplitude, as shown in Fig. 3.32. For the minimum input level, the doubler mode shows a
fundamental suppression of 49 dB, and the tripler mode shows fundamental and
second harmonic suppressions of 62 dB and 54 dB, respectively. In order to compensate
for PVT (process, voltage, and temperature) variations, a 5% locking range is usually
required. To achieve a 5% locking range in doubler mode, the input level is
0.48 V and the fundamental suppression is 42 dB; in tripler mode, the input level
is 0.64 V and the fundamental and second harmonic suppressions are 40 dB and 32
dB, respectively (Fig. 3.33). The power consumption and locking range trade-offs are shown in
Fig. 3.34. A larger input level gives a larger locking range, but at the cost of more power
consumption. At 0.48-V input in doubler mode, the core circuit without the buffer
consumes 2.2 mW of dc power; at 0.64-V input in tripler mode, it consumes 3.7 mW.
The phase noise of both doubler and tripler modes is shown in Fig. 3.35, in comparison
with the free-running cases and the measured input from the signal generator.
Figure 3.33: Output spectra of doubler (a) and tripler mode (b) showing the harmonic suppressions at 5% locking range input levels.
Figure 3.34: Power and locking range trade-offs for doubler and tripler modes.
As the phase noise of the signal generator is much lower than the noise floor of the
spectrum analyzer, the measured input phase noise and output phase noise already
hit the noise floor. Thus, we cannot observe the theoretical 6-dB and 9.5-dB phase noise
degradation from input to output for the doubler and tripler, respectively. On the other
hand, compared with the free-running phase noise, phase noise suppression as large as 60 dB
at an offset frequency of 10 kHz can be observed.
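The theoretical degradation figures quoted above follow from the standard 20·log10(N) rule for an ideal frequency multiplier with ratio N. A minimal check, with a function name chosen here purely for illustration:

```python
import math

def multiplier_phase_noise_penalty_db(n):
    """Ideal phase noise degradation (dB) of a frequency multiplier with ratio n."""
    return 20.0 * math.log10(n)

doubler_penalty = multiplier_phase_noise_penalty_db(2)
tripler_penalty = multiplier_phase_noise_penalty_db(3)

print(f"doubler: {doubler_penalty:.2f} dB")
print(f"tripler: {tripler_penalty:.2f} dB")
```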
Due to the difficulty of capturing the dynamics of modulus change in measurement,
we simulate the multiplication modulus change in ADS with a control ramp time
of 5 ns. The simulated dynamics are shown in Fig. 3.36. The nanosecond transition time
shows that an ILFM can be used in applications where fast modulus switching is
critical, such as clocking.
Table 3.1 compares the performance of the ILFM with some recently published
results for frequency multipliers. It can be seen that the proposed ILFM achieves
superior suppression of the undesired harmonics even when compared with devices
fabricated in advanced technologies such as SOI CMOS, SiGe, InGaAs, InGaP and GaAs.
Additionally, because it uses low-Q inductors, which tend to occupy less area than
Figure 3.35: Output phase noise of doubler and tripler modes in comparison with free-running conditions. [Plot: phase noise (dBc), −160 to −40, vs. offset frequency (Hz), 10^3 to 10^7. Traces: measured input, doubler, tripler, doubler free-run, tripler free-run. Annotations: noise floor of spectrum analyzer; ideal phase noise profile of signal source from specs.]
Figure 3.36: Simulated transient of modulus change from tripler to doubler for the dual-modulus ILFM, which shows a transition time in the ns range. A limiting amplifier is added at the output to balance the amplitudes for the two modes. [Panels: modulus control (V) and ILFM output (V), each 0 to 2 V, vs. time (ns), 22 to 33 ns.]
Table 3.1: Performance Comparison with Other Works
Process        Multiplication  Output frequency  Pdc    Fundamental       2nd harmonic      Chip size without  Reference
               ratio           (GHz)             (mW)   suppression (dB)  suppression (dB)  pads (mm × mm)
CMOS           Doubler         3.2               2.2    42                NA                0.1 × 0.4          This work
CMOS           Tripler         4.8               3.7    40                32                0.1 × 0.4          This work
CMOS           Doubler         5                 12.6   20                NA                0.6 × 0.6          [46]
SOI CMOS       Doubler         5.2               10     11                NA                0.29 × 0.13        [47]
SiGe           Tripler         60                54     25                NA                NA                 [40]
SiGe HBT       Doubler         16                22     25                NA                0.7 × 0.35         [44]
SiGe HBT       Doubler         36                95     35                NA                0.7 × 0.5          [44]
InGaAs PHEMT   Tripler         36                18.9   21.4              22.3              2 × 2.5            [45]
InGaP HBT      Doubler         16                200    25                NA                0.7 × 0.4          [83]
SiGe HBT       Doubler         30                185    22                NA                0.45 × 0.55        [84]
GaAs PHEMT     Doubler         56                70     29                NA                1.4 × 0.64         [85]
GaAs HEMT      Doubler         56                275    23                NA                1.6 × 1.2          [86]
high-Q inductors, the circuit area of the ILFM is smaller than that of the other works.
Chapter 4
Injection-Locked Clock Distribution
4.1 Injection-Locked Clocking
Figure 4.1: Injection-locked clocking (ILC).
We propose a new clocking scheme based on injection locking, as shown in Fig. 4.1.
Similar to conventional clocking, the global clock is generated by an on-chip PLL and
distributed by a global tree. The difference is that we use injection-locked oscilla-
tors (ILOs) to regenerate local clocks, which are synchronized to the global clock
through injection locking. Another difference is that most global clock buffers in
conventional clocking are removed because the sensitivity of ILOs is much greater
than that of digital buffers (see detailed discussion below). Essentially, we use ILOs as local
clock receivers, similar to the idea of clock recovery in communication systems. Note
that this is different from resonant clocking [73], where all the oscillators are coupled
together.
In addition to acting as clock receivers, ILOs can be constructed as frequency
multipliers [76] or dividers [22, 39], and hence this scheme enables local clock domains
to have higher (nf0) or lower (f0/m) clock speeds than the global clock (f0).
Such a global-local clocking scheme with multiple-speed local clocks offers signifi-
cant improvements over conventional single-speed clocking scheme in terms of power
consumption, skew, and jitter.
4.1.1 Power Savings
Injection-locked clocking (ILC) can lead to significant power savings in high-
performance microprocessors. The benefits come from several sources. First, the
possible combination of a low-speed global clock and high-speed local clocks can
reduce the power consumption in the global clock distribution network. In the con-
ventional approach, this would require multiple power-hungry PLLs for frequency
multiplication. An ILO consumes much less power than a PLL because of its circuit
simplicity [21]. This will become more evident in multi-core processors.
Second, ILOs have higher sensitivity than clock buffers (inverters). As a synchro-
nized oscillator, an ILO effectively has very large voltage gain when the injection
signal amplitude is small, while the gain of an inverter is much smaller (Fig. 4.2).
This can be easily understood if we realize that synchronization in an ILO is usually
achieved in tens to hundreds of clock cycles, and hence in each clock cycle only a small
amount of injection locking force is needed, while an inverter needs to change its
Figure 4.2: Voltage gain of an inverter and an injection-locked oscillator at different input signal levels.
state twice in every clock cycle. As a result of this difference, the signal amplitude of
the global clock can be much smaller in the injection-locked clocking scheme, which
results in less power loss on the parasitic capacitance and resistance of the global-
clock distribution network. This will be increasingly attractive as the interconnect
loss becomes a dominant factor as the process technology scales further.
Further, the number of clock buffers in the global clock distribution can be re-
duced. In conventional clocking, in order to minimize jitter generated by clock buffers,
the global clock signal needs to be driven from rail to rail throughout the whole net-
work, and in turn many clock buffers are inserted. In injection-locked clocking, ILOs
can achieve good jitter performance with small input signal amplitude. Therefore,
the global clock signal amplitude no longer needs to be full swing, and few (if any)
clock buffers are needed on the global tree. A reduced number of clock buffers
directly translates into lower power consumption (Fig. 4.3).
More importantly, because injection-locked clocking significantly lowers skew and
jitter in the global clock, the timing margin originally allocated can be recovered, and
used for circuit operation. This can enable faster clock speed. Alternatively, we can trade it
for a lower power supply voltage (vdd), and reduce power dissipation not only in the clock
distribution network, but in all the logic gates on the chip.
Figure 4.3: Power savings of ILC relative to conventional clock distribution.
4.1.2 Skew Reduction and Deskew Capability
Skew in conventional buffered-tree based clock distribution is introduced mainly
by the clock buffer mismatch between different clock branches. In resonant clocking,
even though there are fewer clock buffers, and thus less buffer-mismatch-induced skew, the
resonator itself can still generate significant skew due to the resonant frequency mismatch
between different resonators. This is illustrated by the plot in Fig. 4.4,
where the skew introduced by such frequency mismatch between resonators is
plotted versus the quality factor of the resonators. From the plot we can see that a significant
portion of a clock cycle can be consumed by the skew introduced by resonant frequency
mismatch.
Figure 4.4: Skew introduced by resonant frequency error vs. quality factor of the resonator in a resonance-based clock distribution.
Conventional active deskew methods [58, 87] compensate for the skew by adding tunable
delays to different clock paths. They are designed to reduce the clock skew
after chip fabrication, and are capable of tracking the skew variations dynamically.
The tunable delay is typically implemented by active delay lines which are loaded
with switched-capacitor arrays [58], or built with current starved buffers [87]. These
approaches proved effective in conventional clocking and have been applied to res-
onant clocking [88]. However, adding active delay lines has several disadvantages.
First, it consumes extra power; second, it increases the clock latency substantially
due to the delay tuning requirement; most importantly, the extra active delay line
tends to degrade the clock jitter significantly. This is because power supply noise
coupled through clock buffers is the main contributor to jitter accumulation in the
conventional clock distribution [60], and adding active delay lines for deskew further
increases the length of the buffer chain in the clock signal path.
Injection-locked clocking can have better skew performance than conventional
buffered-tree based clock distribution for two reasons. First, because the
number of buffers is reduced in the new clocking scheme, skew due to mismatch
of clock buffers is reduced compared to conventional clocking. More importantly,
injection-locked clocking provides a built-in mechanism for deskew. The phase dif-
ference between the input and output signals of an ILO can be tuned by adjusting
the center frequency of an ILO. This phase tuning capability enables ILOs to serve
as built-in “deskew buffers”. In turn, removing dedicated deskew buffers not only
saves power, but also reduces the vulnerability of the clock path to power supply noise. Similar to active
deskewing in conventional clocking, phase detectors can be placed between some
local clock domains to check skew and then tune corresponding ILOs. Note that
this is different from distributed PLL approach [66, 67], where phase detectors have
to be added between all adjacent clock domains for frequency synchronization, and
then possibly for deskew. In injection-locked clocking, frequency synchronization is
achieved by injection locking, and the phase detection is used for deskew only. In other
words, injection-locked clocking with deskew tuning is a dual-loop feedback system,
and therefore provides both good tuning speed and small phase error (residual skew).
Because of the excellent built-in deskew capability of ILOs, it can be expected that an
injection-locked clock tree has much more freedom in its physical design (layout).
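The phase tuning mechanism described above can be sketched with Adler's first-order locking model, in which the steady-state phase between the ILO output and the injected signal satisfies sin(φ) = (f_center − f_inj) / f_lock, where f_lock is the one-sided locking range. This model, the sign convention, and all numerical values except the 3-GHz clock are assumptions introduced here for illustration; they are not taken from the ILO designs in this work.

```python
import math

def ilo_phase_shift_rad(f_inj_hz, f_center_hz, lock_range_hz):
    """Steady-state output phase relative to the injection (Adler's first-order model).

    lock_range_hz is the one-sided locking range; the sign convention is an assumption.
    """
    ratio = (f_center_hz - f_inj_hz) / lock_range_hz
    if abs(ratio) > 1.0:
        raise ValueError("injection frequency is outside the locking range")
    return math.asin(ratio)

def phase_to_delay_ps(phase_rad, f_clk_hz):
    """Convert a phase shift at f_clk_hz into an equivalent time shift in picoseconds."""
    return phase_rad / (2.0 * math.pi * f_clk_hz) * 1e12

F_CLK = 3e9          # global clock frequency used later in the text
LOCK_RANGE = 150e6   # hypothetical one-sided locking range (5% of F_CLK)
DETUNE = 15e6        # hypothetical shift of the ILO center frequency

phase = ilo_phase_shift_rad(F_CLK, F_CLK + DETUNE, LOCK_RANGE)
delay_ps = phase_to_delay_ps(phase, F_CLK)
print(f"phase shift: {math.degrees(phase):.2f} deg, equivalent delay: {delay_ps:.2f} ps")
```

In this simplified model the reachable phase is bounded by ±90°, i.e. a quarter of a clock period in each direction, and the tuning is most linear near the center of the locking range.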
4.1.3 Jitter Reduction and Suppression
Injection-locked clocking can significantly reduce jitter in global clock distribution
networks. First, the reduced number of global clock buffers also means less pick-up of
power supply and substrate noise, and hence less jitter generation and accumulation,
as shown in Fig. 4.5a. Second, because of the design freedom in layout, clock inter-
connect can be placed where there is minimal noise coupling from adjacent circuits
and interconnects. In addition, similar to a PLL, an ILO can suppress both its inter-
nal noise through high-pass filtering and input noise through low-pass filtering, and
hence can possibly lower jitter at its output [17, 21]. Using a differential structure,
an ILO can be less sensitive to common-mode power supply and substrate noise
than an inverter by design. Therefore, injection-locked clocking is likely to achieve
better jitter performance than conventional clocking.
Figure 4.5: Illustration of ILC jitter suppression in comparison with conventional clocking.
Compared to other resonance-based clocking schemes proposed recently [71, 89],
ILC’s jitter performance is not limited by the quality factor Q of on-chip resonator,
which explains why injection locking has recently been adopted for resonant clocking
[90, 91, 88].
4.1.4 Potential Applications
With its numerous technical advantages, ILC can be used to improve high-end
microprocessors and their design process in many ways:
First, ILC reduces jitter and skew compared to a conventional clocking network.
This reduces cycle time and therefore allows a faster clock speed. As technology
scaling improves transistor performance but does not reduce jitter and skew (which
actually increase), the improvement in clock speed will be more pronounced over time.
Although further increasing whole-chip clock speed finds limited practical appeal in
today’s setting, it may still be effective in certain specialized engines inside a general-purpose
architecture.
Second, using ILC, clock distribution for a multi-core system is a natural exten-
sion from a single-core system. A conventional clocking scheme would require adding
chip-level PLLs. PLLs are bulky and particularly vulnerable to noise and hence usually
placed at the very edge of a chip. In future multi-core systems, placing PLLs and routing
high-speed clock signals to the destination cores represents a significant challenge.
In contrast, in ILC, a single medium-speed global clock signal can
be distributed throughout the chip and locally each core can multiply the frequency
according to its need.
Third, even in a single-core architecture, different macroblocks can run at different
frequencies. This is referred to as the multiple clock domain (MCD) approach [92, 93].
Using ILC, we can locally multiply (or divide) the frequency of the single global clock.
One significant advantage of using ILC to enable multiple clock domains is that the
local clocks have a well-defined relationship as they are all synchronized to the global
clock. As a result, cross-domain communication can still be handled by synchronous
logic without relying on asynchronous circuits. Note that although ILOs are not as
flexible as PLLs in frequency multiplication, they are sufficient for MCD processors
as only a few frequency gears are needed for practical use [94].
4.2 Architecture Level Evaluation of ILC Power Impact
We quantitatively demonstrate some benefits of ILC in the most straightforward
setting: a single-core processor running at a single clock frequency. We focus on
the energy benefits in this case study and compare processors that only differ in the
global clock distribution. Due to the limited availability of detailed characterization of
clocking network in the literature, our choice of the clocking network closely resembles
that of the baseline processor. Note that this is far from the optimal ILC design for
the given processor, but demonstrates significant benefits of ILC nonetheless.
Our baseline processor is the Alpha 21264, for which the most details on the clock
distribution network are available in the public domain [95, 96]. In this processor, an on-chip PLL drives an
X-tree, which in turn drives a two-level clocking grid containing a global clock grid
and several major clock grids. The major clock grids cover about 50% of the chip
area and drive local clock chains in those portions. The remaining part of the chip is
directly clocked by the global clock grid. The densities of the two levels of grids are
different. This configuration is illustrated in Fig. 4.6-a. The three planes X, G, and
M represent the three layers of clock distribution networks: the X-tree, the global
clock grid, and the major clock grids, respectively.
In the first ILC configuration (Fig. 4.6-b), we only replace the very top level of
the clock network (X). We remove all buffers in the X-tree trunk and replace the final
level of buffers (a total of 4) with ILOs. The rest of the hierarchy remains unchanged.
Note that in contrast to the Alpha implementation, we send low-swing signals on
the X-tree, which reduces the energy consumption of the top level clock network.
Furthermore, as discussed before, clock jitter and skew will also be reduced. We convert
this timing advantage into energy reduction by slightly reducing the supply voltage.
While such a simple approach of using ILC as a drop-in replacement already
reduces energy consumption, it is not fully exploiting the benefits of ILC. As discussed
before, numerous ILOs can be distributed around the chip to clock logic macro-blocks.
Thanks to the built-in deskew capability, we can avoid using power-hungry clock grids
altogether. However, to faithfully model and compare different approaches, we need
parameters (i.e., capacitance load of individual logic macroblocks) for circuit-level
simulation which we could not find in the literature. As a compromise, in the second
Figure 4.6: Illustration of the three different configurations (a-c) of global clock distribution, and a possible floorplan (d) for the ILC-based global clock distribution in Alpha 21264. Each configuration is designated according to its clocking network: XGM, IGM, and IM′. [Panels: (a) XGM, (b) IGM, (c) IM′, (d) Floorplan (IM′).]
ILC configuration, we still use grids, but use only a single level of grids, which consists
of all the major clock grids and the portion of the global grid that directly feeds logic
circuits (Fig. 4.6-c). With this configuration, the load of the clock network can be
derived based on results reported in [95, 96] and technology files. Finally, thanks to
the deskew capability of ILOs, there is no need to use a balanced global clock tree. In
Fig. 4.6-d, we show an example clock tree design. In this example, each macroblock
in the floorplan is driven by an ILO which is at a leaf of the global clock tree.
To evaluate the benefits of injection-locked clocking, we perform both circuit- and
architecture-level simulations (in collaboration with Prof. Huang’s group) on the
baseline processor with each clock distribution configuration in Fig. 4.6. In order to
reflect the state of the art, we scale the global clock speed from 600 MHz to 3 GHz,
and correspondingly the process technology from 0.35 µm to 0.13 µm. The validity
of the scaling is verified using the Pentium 4 Northwood 3.0 GHz processor as the reference.
At the circuit level, we use a commercial circuit simulator, Advanced Design System
(ADS), to evaluate the power consumption and jitter performance of the clock distribution
network with different configurations. The simulations are based on extracted
models of the clock distribution networks, including the buffer size, interconnect capacitance,
and local clock load capacitance. The distribution network model is then
applied in the circuit simulation with ILOs and clock buffers constructed using SPICE
models of transistors.
Figure 4.7: Circuit-level jitter simulation setup. [Block diagram labels: clock source with jitter, clock distribution, noisy power supply (vdd), power meter, clock period distribution (T), clock jitter.]
Since jitter is largely introduced by power supply and substrate noise through
clock buffers, a noise voltage source with a Gaussian distribution is inserted at the
power supply node, as shown in Fig. 4.7. Transient simulation is used to calculate the
voltage and current waveforms along the clock distribution. Output clock waveform is
analyzed statistically to get the distribution of the clock period. Jitter at the output
is then calculated based on this distribution. Jitter is first measured in the baseline
conventional clocking configuration, and the noise source amplitude is determined by
matching the measured jitter with the value reported in [60], 35 ps. The same noise voltage
source is then used in the subsequent jitter simulations for the ILC configurations, and
the results are compared to the baseline configuration. We believe this approach is
actually pessimistic, considering that the target jitter number (35 ps) is among the lowest
reported for conventional clocking [56]. The source jitter from the on-chip PLL is represented
using a built-in ADS model of a clock with jitter, and the clock jitter is chosen to be 5
ps, which is consistent with published jitter values for on-chip PLLs.
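The statistical post-processing step described above (extracting the clock period distribution from the simulated output waveform) can be sketched as follows. The function name, the choice of rms and peak-to-peak metrics, and the sample edge times are illustrative assumptions; the text does not state which jitter metric was used in the ADS analysis.

```python
import statistics

def period_jitter(edge_times):
    """Summarize the clock period distribution from rising-edge timestamps.

    edge_times: monotonically increasing rising-edge times (any consistent unit).
    Returns the mean period, rms period jitter, and peak-to-peak period jitter.
    """
    periods = [t1 - t0 for t0, t1 in zip(edge_times, edge_times[1:])]
    return {
        "mean_period": statistics.fmean(periods),
        "rms_jitter": statistics.pstdev(periods),
        "peak_to_peak": max(periods) - min(periods),
    }

# Hypothetical rising-edge times in ps for a clock near 3 GHz.
edges_ps = [0.0, 333.0, 668.0, 1000.0, 1334.0]
stats = period_jitter(edges_ps)
print(stats)
```

In practice the edge times would come from threshold crossings of the transient waveform, and many more cycles would be needed for a meaningful distribution.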
For architectural simulations, we use a heavily modified version of the SimpleScalar
toolset extended with Wattch for the dynamic energy component, and HotSpot and
BSIM3 models for temperature-dependent leakage modeling in 0.13-µm technology
with a vdd of 1.5 V. For brevity, the detailed parameters of the simulation are left to
the technical report [97].
In the circuit simulation, the PLL source jitter is set to 5 ps, and the value of
the added power supply noise source is chosen so that the output clock jitter for
the baseline processor (Fig. 4.6-a) is 35 ps [60]. Apparently, there is 30 ps jitter
added along the clock distribution, which comes from the power supply noise coupled
through the buffers. For the clock speed of 3 GHz, the overall jitter in the baseline
processor therefore corresponds to 10.5% of the clock cycle. In the case of ILC with
IGM configuration (Fig. 4.6-b), under the same power supply noise and source jitter,
the output clock jitter is lowered to 15 ps – a 57% reduction. This translates into
recovering 6% of a clock cycle at 3 GHz, a significant performance improvement.
The jitter reduction can be attributed to the reduced number of clock buffers and
good noise rejection of ILOs. When ILOs are used to directly drive the local clock
grids without the global grid as in IM′ configuration (Fig. 4.6-c), thanks to the further
reduction in the buffer stages, jitter is lowered to 12 ps, or 66% lower than the baseline.
These results clearly demonstrate that ILC can achieve better jitter performance than
conventional clocking.
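The percentages quoted in this paragraph follow directly from the jitter values and the 3-GHz clock period. The short check below reproduces them; only the helper name is introduced here.

```python
def fraction_of_cycle(time_ps, f_clk_hz):
    """Express a time interval in ps as a fraction of the clock period."""
    period_ps = 1e12 / f_clk_hz
    return time_ps / period_ps

F_CLK = 3e9                                   # clock speed from the text
JITTER_PS = {"XGM": 35.0, "IGM": 15.0, "IM'": 12.0}  # jitter values from the text

baseline_fraction = fraction_of_cycle(JITTER_PS["XGM"], F_CLK)
igm_reduction = 1.0 - JITTER_PS["IGM"] / JITTER_PS["XGM"]
igm_recovered = fraction_of_cycle(JITTER_PS["XGM"] - JITTER_PS["IGM"], F_CLK)
im_reduction = 1.0 - JITTER_PS["IM'"] / JITTER_PS["XGM"]

print(f"baseline jitter: {100 * baseline_fraction:.1f}% of a cycle")
print(f"IGM: {100 * igm_reduction:.0f}% lower, {100 * igm_recovered:.0f}% of a cycle recovered")
print(f"IM': {100 * im_reduction:.0f}% lower")
```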
In the current study, it is assumed that the built-in deskew capability of ILOs can
reduce the skew to below 15 ps, resulting in a 10-ps saving in timing margin compared
to the baseline processor (without any deskew). This estimate is consistent with the
results using existing deskew schemes [56], and hence quite reasonable. In fact, we
believe ILC should lead to even lower skew, which is supported by the test chip
measurement shown below.
The results of using different clocking structures are summarized in Fig. 4.8. In
this comparison, all configurations achieve the same cycle time. The density of the
grids and the driving capabilities are determined using circuit simulation. We choose
the design point where energy is minimized.
Simulations show that the power consumption of the baseline processor ranges
from 30.4 W to 50.4 W with an average of 40.7 W. The power can be divided into
three categories: global clock distribution power, leakage, and the dynamic power of
the rest of the circuit. The breakdown of the power is shown in Fig. 4.8. The global
clock is unconditional and consumes 9.2 W (23%).
Now we analyze the power savings of ILC. For IGM (Fig. 4.6-b), power savings
come from two factors. First, the power consumed in the top-level X-tree is reduced
from 1.72 W to 1.56 W because of the reduction of the total levels of buffers used
and the lowered voltage swing on the X-tree. Second, as explained above, jitter and
skew both improve when using ILC: a 20 ps reduction in jitter and 10 ps in skew are
achieved. These savings increase the available cycle time for logic from 273 ps to 303
ps. This, in turn, allows a reduction in Vdd without affecting the clock speed. We
use the following voltage-delay equation from [98] to calculate the new Vdd, which is
1.415 V.
$$t = \frac{C}{k'(W/L)(V_{dd} - V_t)} \left[ \frac{2V_t}{V_{dd} - V_t} + \ln\left( \frac{3V_{dd} - 4V_t}{V_{dd}} \right) \right]$$
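The supply scaling step can be sketched numerically: since C/(k′(W/L)) cancels when two delays are compared, the new Vdd is the value at which the Vdd-dependent part of the equation grows by the ratio of the available cycle times (303 ps / 273 ps). The threshold voltage `VT` below is a hypothetical value, as it is not given in this section, so the result is only expected to land near, and need not equal, the Vdd quoted in the text.

```python
import math

def delay_factor(vdd, vt):
    """Vdd-dependent part of the voltage-delay equation (C / (k'(W/L)) cancels in ratios)."""
    return (2.0 * vt / (vdd - vt) + math.log((3.0 * vdd - 4.0 * vt) / vdd)) / (vdd - vt)

def scaled_vdd(vdd_nominal, vt, t_old, t_new, v_min=1.0, tol=1e-6):
    """Bisection search for the Vdd at which gate delay grows by t_new / t_old.

    Assumes the delay decreases monotonically with Vdd between v_min and vdd_nominal.
    """
    target = delay_factor(vdd_nominal, vt) * t_new / t_old
    lo, hi = v_min, vdd_nominal
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if delay_factor(mid, vt) > target:
            lo = mid   # gate is too slow at this supply, so raise the lower bound
        else:
            hi = mid
    return 0.5 * (lo + hi)

VT = 0.4            # hypothetical threshold voltage (V)
VDD_NOMINAL = 1.5   # nominal supply from the text (V)

new_vdd = scaled_vdd(VDD_NOMINAL, VT, t_old=273.0, t_new=303.0)
print(f"scaled Vdd for VT = {VT} V: {new_vdd:.3f} V")
```

The result is sensitive to the assumed threshold voltage, so matching the reported value would require the actual process parameters.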
The power reduction for the tested applications ranges from 3 W to 5.2 W with an
average of 4.1 W, or 10.1%. The reduction is mainly due to the lowering of supply
voltage.
The second ILC configuration, IM′ (Fig. 4.6-c), further reduces clock distribution
power by reducing the size of the grid. For IM′, the global clock power is reduced to
5.9 W (from 9.2 W in XGM) and the combined jitter and skew reduction is 33 ps,
which allows us to scale Vdd to 1.41 V. The overall effect is an average of 6.8 W (17%)
total power reduction. Compared to IGM, IM′ further reduces power by 2.7 W, or
7%.
Figure 4.8: Breakdown of processor power consumption with different clock distribution methods. [Stacked bar chart; vertical axis: power (W), 0 to 50. Bar labels:]
                 XG'    XGM    IGM    IM'
Leakage power    5.9    5.7    5.6    5.5
Circuit power    27.2   25.9   23.0   22.5
Clock power      15.6   9.2    8.0    5.9
For reference, we also show the result of replacing the two levels of grids by a
single grid in the conventional configuration. Note that this grid is different from
the M′ grid as it needs higher density and larger buffers to achieve the same overall
cycle time target. We designate this grid G′, and the configuration XG′. We use
the same methodology to compute its jitter performance, clocking load, and power
consumption. From the results, it is clear that ILC significantly improves power
consumption. It is also clear that using a single-level grid per se is not the source
of energy savings for IM′: using a single grid in the conventional design leads to a
significant 7.9 W of extra power consumption.
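The savings quoted in this section can be cross-checked against the bar labels of Figure 4.8. The sketch below assumes the labels read, per configuration, as leakage, circuit, and clock power in watts; the dictionary layout and function name are illustrative only. The differences agree with the quoted averages (4.1 W, 6.8 W, 2.7 W, 7.9 W) to within about 0.1 W, which is consistent with rounding in the labels.

```python
# (leakage, circuit, clock) power in watts, read from the Figure 4.8 bar labels
breakdown = {
    "XG'": (5.9, 27.2, 15.6),
    "XGM": (5.7, 25.9, 9.2),
    "IGM": (5.6, 23.0, 8.0),
    "IM'": (5.5, 22.5, 5.9),
}

# Total processor power per configuration, rounded to the precision of the labels
totals = {name: round(sum(parts), 1) for name, parts in breakdown.items()}

def saving(baseline, candidate):
    """Absolute (W) and relative (%) power saving of candidate versus baseline."""
    delta = totals[baseline] - totals[candidate]
    return round(delta, 1), round(100.0 * delta / totals[baseline], 1)

for baseline, candidate in [("XGM", "IGM"), ("XGM", "IM'"), ("IGM", "IM'"), ("XG'", "XGM")]:
    watts, percent = saving(baseline, candidate)
    print(f"{candidate} vs {baseline}: {watts} W ({percent}%)")
```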
Overall, ILC can be introduced into a processor with varying degrees of design
intrusion. With minimal intrusion, when only the very top level of the clock tree
is modified to use injection locking, the energy reduction is already significant (10%),
thanks to the lowered jitter and skew. When the clocking grid is further optimized,
the power savings become more pronounced (17%). All of this is achieved without
affecting performance or the design methodology of the processor.
4.3 ILC Circuit Prototypes
4.3.1 Prototype I: ILC with Divide-by-two ILOs
The first test chip uses a divide-by-two injection-locked frequency
divider as the local clock regenerator. It is designed and implemented in a standard
0.18 µm digital CMOS technology with a low-resistivity substrate (Fig. 4.9-(a)). A
3-section H-tree mimics the global clock distribution network in real microprocessors.
The root of the H-tree is directly connected to a ground-signal-ground (GSG) pad to
facilitate testing (Fig. 4.10). The leaves of the H-tree are four divide-by-two ILOs,
which divide the 10-GHz input clock signal into 5-GHz local clocks. The differential
outputs of the ILOs then drive four open-drain differential amplifiers, which are
directly connected to the output RF pads.
The differential divide-by-two ILO is shown in Fig. 4.9-(b). It is essentially a
differential LC oscillator, with the input signal injected into the gate of the tail
transistor. We chose this ILO for the test chip because of its well-understood operation
and good performance. The spiral inductors are made on metal 5 and have a quality factor
of about 4 at 5 GHz. Such a low Q is not a problem for ILO operation and actually helps
increase the locking range. If a better metal layer is available, the power efficiency
can be further improved. NMOS transistors biased in the inversion region are used as
varactors to tune the ILO center frequency, which in turn changes the phase of the local
clocks for deskewing.
The H-tree is constructed using coplanar-waveguide (CPW) transmission lines.
Figure 4.9: Schematic of (a) the test chip and (b) a divide-by-two ILO used.
Figure 4.10: Chip micrograph of the test chip. The whole chip size is 1.5 mm × 1.3 mm, and each ILO occupies 0.25 mm × 0.22 mm. The H-tree sections measure 500 µm, 280 µm, and 290 µm, respectively, from root to leaves.
Figure 4.11: Spectrum of the generated local clock signal from ILO1, identical to that from other ILOs on-chip.
A bottom shield is used to reduce substrate coupling, as would be needed in a real
microprocessor environment. This limits the maximum characteristic impedance of the
transmission line to just over 40 Ω in this technology. The transmission lines from the
H-tree leaves to the root are therefore designed to be 40 Ω, 20 Ω, and 10 Ω, respectively,
so that each line is matched to its two daughter lines in parallel at every junction.
The widths of the signal and ground lines, the spacing between them, and the choice of
metal layers are also optimized to minimize the clock propagation loss.
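The matching condition at each H-tree junction can be checked with a few lines of arithmetic. This is a minimal sketch assuming ideal lossless lines and a symmetric two-way split at every junction; the function names are illustrative and not taken from the design flow.

```python
def parallel(*impedances):
    """Equivalent impedance of several impedances connected in parallel."""
    return 1.0 / sum(1.0 / z for z in impedances)

def reflection_coefficient(z_parent, z_children):
    """Reflection seen by a wave on the parent line at a junction with its daughter lines."""
    z_load = parallel(*z_children)
    return (z_load - z_parent) / (z_load + z_parent)

# Characteristic impedances in ohms, ordered from root to leaves
levels = [10.0, 20.0, 40.0]

for parent, child in zip(levels, levels[1:]):
    gamma = reflection_coefficient(parent, [child, child])
    print(f"{parent:.0f}-ohm line into two {child:.0f}-ohm lines: reflection = {gamma:.3f}")
```

Halving the impedance at each level toward the root is what keeps the reflection at zero, which is why the 40 Ω ceiling at the leaves fixes the root section at 10 Ω.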
The test chip is measured using an RF probe station. The input is a sinusoidal
signal from a continuous-wave (CW) signal generator. The power supply voltage is
1.4 V. The spectra of the local clock signals generated by the four ILOs are almost
identical, and one of them is shown in Fig. 4.11.
The locking ranges of the ILOs on the test chip are found to be identical, and that of
ILO1 is shown in Fig. 4.12. The injection signal amplitude is calculated from the
measured incident power and reflection coefficient (S11) at the root of the H-tree.
It can be seen that when the input signal has rail-to-rail swing (1.4 V), the locking
range is about 17%, which is sufficient to accommodate both process/temperature
variation and deskew tuning (see below).
Figure 4.12: Locking range of ILO1, identical to that of other ILOs on-chip.
The phase noise of both the input and output clock signals is shown in Fig. 4.13. The
6-dB reduction (up to about 500-kHz offset) due to the divide-by-two operation
is evident, which shows that the internal ILO noise is suppressed by injection locking.
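The 6-dB figure follows from the standard relation for an ideal frequency divider: dividing the frequency by N divides the phase deviation by N, which lowers the phase noise by 20 log10(N) dB. A minimal sketch (the function name is illustrative):

```python
import math

def divider_phase_noise_change_db(division_ratio):
    """Phase noise change in dB at the output of an ideal divide-by-N stage."""
    return -20.0 * math.log10(division_ratio)

print(f"divide-by-two: {divider_phase_noise_change_db(2):.2f} dB")
```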
The deskew capability of injection-locked clocking is demonstrated in Fig. 4.14.
Fig. 4.14(a) shows the whole deskew surface when tuning ILO1 by Vtune1, and/or
ILO2 by Vtune2. One particular tuning example is shown in Fig. 4.14(b), where Vtune1
and Vtune2 are tuned differentially, and the deskew range is up to 80 ps. Because
of the large deskew range, small imbalance in the global clock tree can be easily
compensated, which greatly relaxes the requirement on the design and layout of the
clock distribution network.
The test chip consumes a total power of 52.8 mW, of which 45.3 mW comes from
the open-drain buffers running on a 1.8-V supply. The ILO core circuitry, running on a
1.4-V supply, consumes only 7.3 mW when biased low with a 6-dBm injection signal. The
bias circuitry consumes 0.2 mW.
[Figure 4.13 plot: phase noise (dBc) versus offset frequency (Hz), from 10^3 to 10^6 Hz, for the reference clock and local clocks 1 to 4.]
Figure 4.13: Phase noise of reference clock and 4 output clocks at different positions on chip.
Figure 4.14: Deskew capability of ILC. (a) deskewing when tuning ILO1 and/or ILO2; (b) deskewing when tuning ILO1 and ILO2 differentially. The skew is measured between the two output clock signals of ILO1 and ILO2. Note that there is some imbalance between ILO1 and ILO2 caused by mismatch in the clock distribution tree and measurement system.
4.3.2 Prototype II: ILC with Non-division ILOs
Because Prototype I uses divide-by-two injection-locked oscillators, the global
clock must run at twice the frequency of the local clock. Also, the analog tuning of the
ILO delay is not robust to noise at the control node. To address these
issues, a second ILC prototype was designed with non-division ILOs as the local clock
regenerators. The non-division ILOs are as described in chapter II and use transformer
injection, because single-ended global clock distribution is retained and a
single-ended-to-differential conversion is therefore needed. For delay tuning, we
replace the frequency-tuning varactor C1 in the LC tank with an array of more linear
MIM switched capacitors (Fig. 4.18), whose values are binary weighted and can be
controlled digitally. Digital tuning enables fast nonlinear deskew algorithms and can
thus be very useful for runtime deskew. It also avoids noise pick-up on the sensitive
analog tuning voltage node.
As a special case of divide-by-odd-number operation, divide-by-one, or non-division,
injection-locked oscillators also find applications in high-speed clocking systems, e.g.,
clock distribution and serial link clock recovery, because of their phase tunability and
noise filtering property. The most intuitive way to construct a non-division ILO out
of an LC differential oscillator is to add a differential pair for direct injection
into the tank, as shown in Fig. 4.15-a. This topology requires a differential input,
which is not always available. For applications where only a single-ended input is
available, another topology with a transformer input is proposed in Fig. 4.15-b. In this
topology, the transformer converts the single-ended input to differential currents, which
are injected into the sources of the cross-coupled pair. The tank can be biased by
tapping a current source to the center of the transformer secondary.
In non-division ILOs, the injected signal and the output voltage are summed directly
by the cross-coupled pair M1 and M2 (Fig. 4.16-a). The circuit can thus be
analyzed with the model of [23] shown in Fig. 4.16-b, where the output phase ϕ is the
phase shift introduced by the ILO and Gm is the transconductance of transistors M1
and M2.
Figure 4.15: Two possible circuit implementations of a non-division injection-locked oscillator. (a) requires a differential input, while (b) can take both single-ended and differential inputs.
Figure 4.16: Non-division ILO analysis: (a) circuit model; (b) loop behavior model. Similar to the divide-by-3 topology, differential injection is required for non-division operation.
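The closed-loop equation for the model is
\[ G_m |Z(j\omega)| e^{j\beta(\omega)} \left( V_i e^{j\omega t} + V_o e^{j(\omega t + \varphi)} \right) = V_o e^{j(\omega t + \varphi)} \tag{4.1} \]
where β(ω) is the phase of Z(jω). Assuming large Q, we can write
\[ \beta(\omega) = -\arctan\left( \frac{2Q\Delta\omega}{\omega_0} \right) \]
where ∆ω = ω − ω0. Dividing both sides of Eqn. 4.1 by V_o e^{j(ωt+ϕ)} gives
\[ G_m |Z(j\omega)| e^{j\beta(\omega)} \left( \eta e^{-j\varphi} + 1 \right) = 1 \tag{4.2} \]
where η = Vi/Vo is defined as the injection ratio of the ILO. The phase condition of the
loop equation is
\[ -\arctan\left( \frac{2Q\Delta\omega}{\omega_0} \right) - \arctan\left( \frac{\eta \sin\varphi}{1 + \eta \cos\varphi} \right) = 0 \tag{4.3} \]
from which we can derive the locking range of the new ILO topology as
\[ LR = \frac{\omega_0}{Q} \cdot \frac{\eta}{\sqrt{1 - \eta^2}} \tag{4.4} \]
which is a function of the oscillation frequency, the quality factor of the tank, and
the injection ratio.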
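Eqn. 4.4 is easy to evaluate and to invert. The sketch below expresses the locking range as a fraction of ω0 and also solves for η given a measured locking range; the function names are illustrative. As a usage example it takes the Prototype II values reported later in this section (a 12% locking range with Q = 4), for which the text quotes an injection ratio of about 0.43.

```python
import math

def locking_range_fraction(q, eta):
    """Two-sided locking range of Eqn. 4.4, normalized to the center frequency."""
    return eta / (q * math.sqrt(1.0 - eta ** 2))

def injection_ratio_from_locking_range(q, lr_fraction):
    """Invert Eqn. 4.4: injection ratio that yields the given fractional locking range."""
    x = q * lr_fraction            # equals eta / sqrt(1 - eta^2)
    return x / math.sqrt(1.0 + x ** 2)

eta = injection_ratio_from_locking_range(q=4.0, lr_fraction=0.12)
print(f"injection ratio for a 12% locking range at Q = 4: {eta:.2f}")
```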
With the phase condition of Eqn. 4.3, we can derive the phase shift of the ILO as
\[ \varphi = -\arcsin\left( \frac{1}{\sqrt{1 + \left( \frac{\omega_0}{2Q\Delta\omega} \right)^2}} \right) - \arcsin\left( \frac{1}{\eta} \cdot \frac{1}{\sqrt{1 + \left( \frac{\omega_0}{2Q\Delta\omega} \right)^2}} \right) \tag{4.5} \]
The phase shift versus the normalized frequency offset Q∆ω/ω0 given by this equation is
plotted for different injection ratios in Fig. 4.17. The plot shows three properties:
(a) the phase shift is monotonic in the frequency offset and is quite linear except at
the edges of the locking range, which shows that the ILO can provide the desired linear
phase shift for delay tuning; (b) the phase tuning range can be as large as 180
degrees, which shows that the delay tuning range can be as large as half the clock
cycle; and (c) the phase shift is inversely proportional to the injection ratio at a
given offset frequency, which allows the delay step and delay tuning range to be
programmed by changing the injection ratio under a fixed physical design.
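Eqn. 4.5 can be evaluated directly. The sketch below uses sin(arctan(2Q∆ω/ω0)) as a signed form of the 1/sqrt(1 + (ω0/(2Q∆ω))^2) term so that negative offsets are handled as well; that signed extension and the function names are assumptions of this sketch. The usage example takes the Prototype II values reported later in this section (Q = 4, η = 0.43, and a 5.5% free-running tuning span, assumed here to be centered on the injected frequency), for which the text quotes a theoretical tuning range of about 84 degrees.

```python
import math

def phase_shift_deg(q, eta, rel_offset):
    """Phase shift of Eqn. 4.5 in degrees; rel_offset is (w - w0) / w0."""
    theta = math.atan(2.0 * q * rel_offset)   # arctan(2Q dw/w0)
    s = math.sin(theta)                       # signed form of 1/sqrt(1 + (w0/(2Q dw))^2)
    if abs(s) > eta:
        raise ValueError("offset lies outside the locking range")
    return -math.degrees(theta + math.asin(s / eta))

def tuning_range_deg(q, eta, rel_span):
    """Total phase tuning range for a frequency span centered on zero offset."""
    half = rel_span / 2.0
    return phase_shift_deg(q, eta, -half) - phase_shift_deg(q, eta, half)

# Smaller injection ratios give a larger phase shift at the same offset (property c)
for eta in (0.3, 0.43, 0.6):
    print(f"eta = {eta}: phase shift at +1% offset = {phase_shift_deg(4.0, eta, 0.01):.1f} deg")

print(f"range for Q = 4, eta = 0.43, 5.5% span: {tuning_range_deg(4.0, 0.43, 0.055):.1f} deg")
```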
A 4-GHz test chip is designed and implemented in a standard 0.18 µm digital
CMOS technology with a low-resistivity substrate (Fig. 4.19-a). The input transformer and
the symmetric inductor of the ILO are implemented on Metal 5 (Fig. 4.19-b). The
transformer has a k factor of 0.6 and the inductor has a quality factor of 4, both at
4 GHz. The switched capacitor array is designed with a capacitance ratio of 1:2:4:8:16
to enable 5-bit binary tuning. The larger capacitors are realized by combining multiple
minimum-sized unit capacitors to ensure linearity.
Four ILOs are placed as local clock regenerators at the leaves of a 3-section
H-tree, which mimics the global clock distribution network in real microprocessors.
The H-tree dimensions are 400 µm, 100 µm, and 250 µm for the 3 sections, respectively.
The root of the H-tree is directly connected to a ground-signal-ground (GSG) pad to
facilitate testing (Fig. 4.19-b). The H-tree is constructed using coplanar waveguide
(CPW) transmission lines. A bottom shield is used to reduce substrate coupling, as would
be needed in a real microprocessor environment. A differential open-drain buffer is used
at each ILO output to drive the 50 Ω test port.
The whole test chip occupies an area of 1.5 × 1.8 mm². Each ILO uses only
0.37 × 0.1 mm².
Figure 4.17: Phase shifting characteristics of non-division ILO at different injection ratios when Q = 4.
Figure 4.18: Non-division ILO with binary weighted switch capacitor tuning.
Figure 4.19: (a) Schematic and (b) chip micrograph of the test chip.
Figure 4.20: Measured spectrum for output clock.
In the measurement, the input is a sinusoidal signal from a continuous-wave (CW)
signal generator. Output clock spectra are measured for each ILO, and one of them
is shown in Fig. 4.20. The clean spectrum and low harmonic content show that
injection locking is quite efficient. Locking ranges are measured at different input
signal amplitudes. A locking range of up to 12% is achieved for all the ILOs when the
input amplitude is about 0.8 V at the root of the H-tree (Fig. 4.21). According to the
ILO locking range equation (Eqn. 4.4), this locking range corresponds to an injection
ratio η of about 0.43.
To characterize the phase tuning of the ILO, the free-running frequency tuning
provided by the switched capacitor array was measured first. A tuning range of 0.22 GHz
was achieved, which corresponds to 5.5% at a center frequency of 4 GHz. With
the tank quality factor of 4 and the injection ratio of 0.43 derived from the locking
range above, the theoretical phase tuning range is found to be 84° according to Eqn. 4.5.
[Figure 4.21 plot: upper and lower bounds of the locking range (GHz) and locking range (GHz) versus voltage swing at the injection port (V).]
Figure 4.21: Measured locking range of an ILO in ILC.
Figure 4.22: 5-bit digital phase shift tunings for two ILOs' outputs.
Figure 4.23: Phase noise comparison of the input clock from signal source and output clock from the ILO.
The phase tuning of the ILO under injection was then measured directly from the output
waveform on a digital oscilloscope. Output waveforms for different tuning conditions
were recorded, and their zero crossings gave the phase shift. The phase
shift tuning of two ILOs in the ILC was measured at the same time for comparison.
After calibrating out the cable mismatch and referring the phase to that
of the center tuning point of one of the ILOs, the phase tuning curves are plotted in
Fig. 4.22. The result shows that a phase tuning range of up to 80° is achieved, which
corresponds to a 55 ps delay tuning range with a step size of 1.8 ps in the time domain.
The measured 80° phase tuning range also agrees well with the calculated
theoretical value of 84°.
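The conversion from phase tuning range to delay tuning range and step size is simple arithmetic. A minimal sketch, assuming the 5-bit code spans the full range in 2^5 − 1 equal steps (the function names are illustrative):

```python
def phase_to_delay_ps(phase_deg, clock_hz):
    """Time delay in picoseconds equivalent to a phase shift at the given clock frequency."""
    return phase_deg / 360.0 / clock_hz * 1e12

def tuning_step_ps(range_ps, bits):
    """Average delay step when a binary code of the given width spans the full range."""
    return range_ps / (2 ** bits - 1)

delay_range = phase_to_delay_ps(80.0, 4e9)   # 80 degrees at the 4-GHz clock
step = tuning_step_ps(delay_range, 5)        # 5-bit switched capacitor array
print(f"delay range = {delay_range:.1f} ps, step = {step:.2f} ps")
```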
The phase noise of both the signal generator and the ILC output is also measured and
compared in Fig. 4.23. The comparison shows that the ILO output phase noise
tracks the input phase noise exactly up to 600 kHz. A self-triggered jitter measurement
[17] was made for both the signal source and the ILC output to characterize their jitter
profiles. After removing the jitter introduced by the triggering circuit using
\[ \delta_{\Delta T,\mathrm{eff}} = \sqrt{\delta_{\Delta T,\mathrm{meas}}^2 - \delta_{\Delta T,\mathrm{min}}^2} \tag{4.6} \]
the effective rms jitter versus measurement time for both the signal source and the ILC
output is plotted in Fig. 4.24. The comparison shows that only 0.03 ps of
cycle-to-cycle jitter is added by the ILC network, which corresponds to 0.012% of
the clock cycle, thanks to the noise filtering effect of the ILO.
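The de-embedding step of Eqn. 4.6 is a root-difference-of-squares, which assumes the jitter of the triggering circuit is uncorrelated with the jitter being measured. A minimal sketch with illustrative function names, followed by the arithmetic behind the quoted 0.012% figure (0.11 ps and 0.14 ps are the cycle-to-cycle values given in the caption of Fig. 4.24):

```python
import math

def effective_jitter_ps(measured_ps, floor_ps):
    """Eqn. 4.6: remove the measurement floor from the measured rms jitter."""
    if measured_ps < floor_ps:
        raise ValueError("measured jitter is below the measurement floor")
    return math.sqrt(measured_ps ** 2 - floor_ps ** 2)

def added_jitter_percent(source_ps, output_ps, clock_hz):
    """Jitter added between source and output, as a percentage of the clock period."""
    period_ps = 1e12 / clock_hz
    return 100.0 * (output_ps - source_ps) / period_ps

print(f"added jitter: {added_jitter_percent(0.11, 0.14, 4e9):.3f}% of the clock cycle")
```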
Each ILO consumes only 0.95 mW from a 1-V power supply.
Each open-drain buffer consumes 4.2 mW from a 1.5-V supply.
4.3.3 Prototype III: ILC with Active Deskewing
In the first two ILC prototypes, the deskew operation is performed by external
controls, using either an analog voltage or digital codes. Such external controls
facilitated the initial verification and characterization of the deskew capability of
injection-locked clocking. In the third ILC circuit prototype, we implemented an on-chip
deskew loop for the ILC, which is more useful for practical applications [99].
In real systems, there are usually two types of active deskew architectures. The
first type distributes a reference clock, which drives no real load, to every clock
domain. The delay to each clock domain is carefully designed and compensated so that this
reference clock can act as a time standard for the real clocks in each domain. For
such an architecture with N clock domains, the extra cost of active deskew is
N phase detectors and a reference clock distribution. The second type of active
deskew architecture does not distribute a separate reference clock; instead, skews
are compared and compensated between adjacent clock domains. For a
clock distribution with N clock domains, a complete deskew of the whole distribution
requires N − 1 phase comparison and compensation processes. Thus, the cost of the
second approach is only N − 1 phase detectors and no extra reference distribution.
We choose the second approach as the deskew architecture for our ILC prototype.
[Figure 4.24 plots: RMS jitter of source (ps) and RMS jitter of ILC output (ps) versus measurement time relative to sampling edge (s), from 10^-10 to 10^-6 s.]
Figure 4.24: Jitter measurement of the signal source and ILC output at 4 GHz. From extrapolation, the cycle-to-cycle jitter of signal source and ILC output are 0.11 ps and 0.14 ps, respectively.
Figure 4.25: ILC with active deskewing.
A self-controlled active deskew scheme based on injection-locked clocking is shown
in Fig. 4.25. In this scheme, clock skews between different clock domains are measured
by phase detectors (PDs). The skew information is fed back to the microprocessor core,
which then controls the deskew logic (DSK) associated with each ILO.
The delay of each ILO is controlled by its deskew logic, so that after the deskew
process, the skew at the inputs of the clock domains is minimized. Depending on
the system requirements, such a deskew process can run once after power-up, or it
can be enabled periodically for dynamic control of the skew as the microprocessor
conditions change with different workloads and environments. Because digitally
controlled deskew schemes have better noise rejection than analog controlled ones,
digital control is retained for the new ILC prototype.
Fig. 4.26 shows the schematic of the test chip that demonstrates the proposed ILO-based
active deskew in ILC. The input clock signal is distributed by a passive H-tree to
each clock domain, where it injection-locks an ILO. Each ILO drives a 2-pF clock load,
which models the local clock load in real processors, through a differential-to-single-ended
buffer (Buf1). The ILOs use a newly developed transformer-direct-injection
topology [100], as shown in the blow-up of Fig. 4.26. A 5-bit binary-weighted switched
capacitor array is implemented in the LC resonator of each ILO for phase tuning. The
5-bit binary-coded digital control is generated by a 5-bit bidirectional counter built
into the deskew control logic (DSK). The DSK algorithm and an example of the
deskew sequence are shown in Fig. 4.27-a and Fig. 4.27-b.
Figure 4.26: An injection-locked clocking system with ILO-based active deskew. Four ILOs are driven by the input clock through an H-tree. Each ILO is buffered by Buf1 to drive 2 pF of on-chip load capacitor (CL), which also converts the ILO differential output to a single-ended signal. Output buffers Buf2 drive the test ports (TPx).
Figure 4.27: (a) Deskew logic algorithm, and (b) an example of the deskew sequence which shows the design for ringing prevention.
Figure 4.28: Test chip die photo.
When the counter counts up, the ILO free-running frequency decreases and the ILO phase
shift increases, and vice versa. The counter value can also be preset externally, which
enables manual adjustment of the ILO delays for test purposes. The clock phases from
adjacent clock domains are compared by a digital phase detector, and the 1-bit skew
information, Dn in Fig. 4.27, is fed into the deskew control logic. The skew information
from the two previous cycles is also stored in two registers, R1 and R2, to implement the
ringing detection and prevention algorithm in the deskew control logic. Once ringing
is detected, the DSK forces the counter into a stop state until an external start
signal restarts the deskew control logic.
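The behavior of such a loop can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the algorithm of Fig. 4.27: the phase detector polarity, the ringing rule (the detector bit toggled on each of the last two cycles), the starting code, and the 1.25 ps step are all choices made here for illustration. The names `run_deskew`, `r1`, and `r2` are hypothetical, with `r1` and `r2` standing in for the two history registers.

```python
def run_deskew(initial_skew_ps, step_ps=1.25, bits=5, start_code=16, max_cycles=64):
    """Behavioral model of a bang-bang deskew loop with a simple ringing detector.

    Returns (final_skew_ps, final_code, cycles_used, stopped_on_ringing).
    """
    max_code = (1 << bits) - 1
    code = start_code
    skew = initial_skew_ps
    r1 = r2 = None                    # detector bits from the two previous cycles
    for cycle in range(1, max_cycles + 1):
        d = 1 if skew < 0 else 0      # 1-bit phase detector output (assumed polarity)
        # Assumed ringing rule: the detector bit toggled on each of the last two cycles.
        if r1 is not None and r2 is not None and d != r1 and r1 != r2:
            return skew, code, cycle, True
        # Count up when d is 1, down otherwise, clamped to the counter range.
        new_code = min(max_code, code + 1) if d else max(0, code - 1)
        skew += (new_code - code) * step_ps   # assumed: one code step shifts the skew by one delay step
        code = new_code
        r2, r1 = r1, d
    return skew, code, max_cycles, False

final_skew, final_code, cycles, stopped = run_deskew(-16.0)
print(f"residual skew {final_skew} ps at code {final_code} after {cycles} cycles, stopped={stopped}")
```

In this model the residual skew is bounded by one delay step, and stopping the counter once the detector output starts toggling prevents the delay code from dithering around the zero-skew point.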
The test chip was fabricated in a standard 0.18 µm digital CMOS technology. The
clock frequency is set at 3.5 GHz, representing the state-of-the-art processor speed.
Each clock domain has a capacitive load of 2 pF to model the local clock load in
real processors. The transformer and inductors in the ILOs are all built with the 0.3
5µm-thick top metal layer. The transformers have a k factor of 0.77 at 3.5 GHz. The
symmetric spiral inductors have an inductance of 2.8 nH and a quality factor of
4.1 at 3.5 GHz. The die photo of the test chip, which measures 2 mm by 2 mm, is shown
in Fig. 4.28.
[Figure 4.29 plot: bounds of locking range (GHz) and locking range (%) versus input amplitude (V).]
Figure 4.29: Measured locking range for the ILC network.
The test chip is measured on a probe station. The locking range of the ILC network is
measured to be up to 6.5% with an input amplitude of 0.6 V (Fig. 4.29). Time-domain clock
waveforms from each clock domain are measured on a 50-GHz sampling oscilloscope
to study the clock timing. When comparing the timing of different clock domains,
the connector and cable delay mismatch is first characterized and then used to calibrate
the measured results.
The free-running frequency tuning and locked-state phase tuning of the ILOs are first
characterized to show the deskew capability of the ILC network, as shown in Fig. 4.30.
The measured ILO free-running frequency tuning is quite linear, with a step size of
2.5 MHz. This linear frequency tuning produces a linear phase tuning for the ILOs in
the locked state, with a range of 40 ps and a step size of 1.25 ps. The dynamics
of the deskew loop are then measured, as shown in Fig. 4.31. An initial skew of -16 ps is
preset between the two clock domains before the deskew loop starts. The deskew loop
reduces the skew to a final residual value of 2 ps within 15 cycles of the deskew clock,
with a small overshoot. The residual skew can be attributed to the deskew step size
limitation and the phase detector offset.
Figure 4.30: (a) Measured free-running frequency tuning, and (b) delay tuning under locked state by 5-bit switched capacitor array.
Figure 4.31: Deskew dynamics of the deskew loop.
The phase noise of the ILC output clock is measured and compared with that of the
input clock source and the free-running ILO (Fig. 4.32a). The ILC output tracks the
phase noise of the source clock up to 10 MHz and shows up to 60 dB of improvement
over the free-running case near an offset of 10 kHz. The cycle-to-cycle jitter of both
the ILC output and the input clock is measured with a self-triggered method and compared
in Fig. 4.32b. The jitter accumulated in the ILC is largely negligible (0.04 ps).
Each ILO in the test chip consumes 12 mA from a 1-V supply, and each Buf1
consumes 40 mA from a 1.8-V supply. Each PD and DSK consumes 6.8 mA from a 1.8-V
supply.
[Figure 4.32 plots: (a) phase noise; (b) RMS jitter (ps) versus measurement time relative to sampling edge (s), from 10^-10 to 10^-6 s, for the signal source and the ILO output, annotated at one clock period (1 T) with 0.11 ps and 0.15 ps.]
Figure 4.32: (a) Phase noise of the ILC output in comparison with input clock and free-running ILO. (b) Cycle-to-cycle jitter test for ILC output and input clock. The degradation is only 0.04 ps.
Chapter 5
Future Work and Conclusions
5.1 Future Work
5.1.1 Injection-Locked Clock and Data Recovery
Most clock and data recovery circuits (CDRs) today are based on a dual-loop
phase-locked loop (PLL). Such PLL-based CDRs, however, are
difficult to design, costly in terms of power and area, and suffer from several other
limitations [6]. For example, in designing a PLL-based CDR, the designer must trade off
the ability to track the data signal against the noise suppression of the PLL.
Additionally, the dynamics of PLL-based CDRs depend on the contents of the
data signal, and PLL-based CDRs can have a long locking time since they must lock
to both the frequency and the phase of the data signal. PLL-based CDRs also suffer from
analog offsets and device mismatches, which can cause the receiver circuitry to sense
the data signal at shifted, sub-optimal sampling points. Lastly, for chips receiving
multiple data signals, a dedicated PLL-based CDR must be provided for each data
signal. This is a costly requirement since these PLLs typically require relatively large
silicon area (e.g., for large filter capacitors) and dissipate relatively large amounts of
power (e.g., for various high-speed PLL components).
Figure 5.1: (a) Conventional dual-loop CDR, with loop I for frequency acquisition and loop II for phase acquisition. (b) Injection-locked CDR with frequency acquisition achieved by injection locking.
At the same time, most optical standards require a very narrow loop bandwidth
for CDR circuits, typically less than 1% of the operating speed. CDR circuits designed
to such a specification show a very small frequency acquisition range and a long
lock time. To solve this problem, a dual-loop structure with an additional frequency
acquisition loop has been proposed to extend the frequency acquisition range and
increase the lock speed, as shown in Fig. 5.1-(a). This approach has two disadvantages,
however. First, to achieve frequency acquisition, the extra PLL
loop requires a reference frequency, usually from a crystal oscillator, which increases
the cost and integration difficulty. Second, the two-loop structure requires
a control mechanism to disable one loop while the other is in operation, so that the
two do not conflict with each other. This further complicates the design.
We propose an injection-locked clock and data recovery circuit as shown in Fig. 5.1-(b),
where frequency acquisition is achieved by the injection locking of the ILO,
while a separate phase acquisition loop adjusts the phase of the recovered clock. Because
the data stream contains frequency components only at half of the clock frequency,
a frequency doubler is required between the input data and the ILO. The relatively
short lock time of an ILO can increase the frequency acquisition speed of such an
injection-locked CDR. At the same time, no extra crystal oscillator reference is
needed, which greatly simplifies the structure and reduces the cost.
5.1.2 Injection-Locked Free-Space Optoelectronic Oscillators
Optoelectronic oscillators (OEOs) [101, 102], which consist of a laser, an optical
fiber cavity, a photodetector, and an electrical amplifier, have received considerable
attention because they can generate a low-jitter optical pulse train and a high-purity
electrical clock signal at the same time. Fig. 5.2 shows a typical optoelectronic
oscillator topology, where light from one of the output ports of the E/O modulator is
detected by the photodetector and then amplified, filtered, and fed back to the
electrical input port of the modulator. If the modulator is properly chosen,
self-sustained electro-optic oscillation will occur. Because both optical and electrical
processes are involved in the oscillation, the optical subcarrier and the electrical
signal are generated simultaneously.
Figure 5.2: Optoelectronic oscillator with optical fiber as the resonator.
The oscillation frequency fosc of the optoelectronic oscillator shown
in Fig. 5.2 can be expressed as
\[ f_{osc} = \frac{k}{\tau} \tag{5.1} \]
where k is an integer representing the possible oscillating modes and τ is the
total group delay of the loop, including the physical-length delay of the loop and
the group delay resulting from dispersive components in the loop. Because of this
relationship, a hybrid optoelectronic oscillator that uses a long optical fiber as the
frequency-selective element offers high tunability and almost no limitation on
the range of possible oscillation frequencies, owing to the high mode density.
The loop delay τ not only determines the mode spacing of the
oscillation frequency but also has a large impact on the phase noise of the oscillator:
the phase noise decreases quadratically with τ, so the larger τ is, the lower the
phase noise.
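The consequences of Eqn. 5.1 can be made concrete with a short calculation. The sketch below assumes the loop delay is dominated by the fiber, so that τ is the fiber length times the group index divided by the speed of light; the 1 km length and the group index of 1.47 are hypothetical values, and the function names are illustrative. The quadratic dependence stated above is modeled as a 20 log10 change in dB.

```python
import math

C_VACUUM_M_PER_S = 299_792_458.0

def fiber_delay_s(length_m, group_index):
    """Group delay of a fiber of the given length (electronic delay neglected)."""
    return length_m * group_index / C_VACUUM_M_PER_S

def mode_frequency_hz(k, tau_s):
    """Eqn. 5.1: frequency of the k-th oscillation mode."""
    return k / tau_s

def mode_spacing_hz(tau_s):
    """Spacing between adjacent modes of Eqn. 5.1."""
    return 1.0 / tau_s

def phase_noise_change_db(tau_old_s, tau_new_s):
    """Phase noise change in dB, assuming it scales as 1 / tau^2."""
    return -20.0 * math.log10(tau_new_s / tau_old_s)

tau = fiber_delay_s(length_m=1000.0, group_index=1.47)   # hypothetical 1 km fiber
print(f"loop delay = {tau * 1e6:.2f} us, mode spacing = {mode_spacing_hz(tau) / 1e3:.0f} kHz")
print(f"ten times longer delay: {phase_noise_change_db(tau, 10.0 * tau):.0f} dB")
```

A long delay gives both dense modes and low phase noise, which is why a short free-space cavity is expected to need external injection to clean up its spectrum.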
Figure 5.3: Injection-locked optoelectronic oscillator (OEO) with free space resonator.
We propose an injection-locked optoelectronic oscillator (OEO) with a free-space
resonator, as shown in Fig. 5.3. Here we use a vertical-cavity surface-emitting
laser (VCSEL) [102] as both the laser source and the modulator, in contrast to
the original OEO, where an external laser source and an E/O modulator are used.
We also use a free-space optical link as the resonator instead of a long fiber. The
free-space optical link includes two optical lenses, which focus the emission from the
VCSEL onto the active area of the photodetector. The photodetector used in the experiment
is a PIN diode fabricated in a standard digital CMOS process. A microwave amplifier
and an electrical filter amplify and select the desired oscillation frequency. An RF
coupler serves as both the electrical injection and output device. Since the delay of
the free-space resonator cannot be as long as that of the fiber version of the OEO, the
phase noise of the free-space OEO tends to be inferior to that of the fiber-based
version. However, external injection can be used to improve the spectral purity of the
OEO oscillation. Such a free-space OEO topology with external injection can be used to
generate clocks for free-space optical communication links.
5.2 Conclusions
This dissertation presents our study of injection-locking based high-speed low-
power clock generation and distribution techniques. Specifically, several circuit tech-
niques for the design of injection-locked clock dividers, injection-locked clock multi-
pliers and injection-locked clock distribution are introduced with silicon circuit pro-
totypes and measurement results.
In chapter 3, three circuit innovations for injection-locking based clock generation
are demonstrated. These three circuit innovations include: divide-by-odd-number
injection-locked frequency divider using differential injection and harmonic engineer-
ing; double balanced divide-by-two injection-locked frequency dividers for tunable
dual phase signal generation using the phase tunability of injection-locked oscilla-
tors; dual mode injection-locked frequency doubler and tripler with good harmonic
suppressions in lossy digital CMOS process.
In chapter 4, injection-locked clock distribution is presented, with a discussion of its
benefits over conventional buffered-tree clock distribution. Owing to the resonant
nature, high effective amplitude gain, and phase tunability of injection-locked
oscillators, injection-locked clock distribution can achieve better power efficiency and
better jitter performance while also providing built-in deskew capability. Three
circuit prototypes, together with an architecture-level evaluation by simulation,
are used to verify these benefits of injection-locked clock distribution.
In chapter 5, directions for future work are discussed. These include injection-locked
clock and data recovery (CDR) for digital baseband communication and injection-locked
free-space optoelectronic oscillators (OEOs).
In summary, injection locking is a special type of forced oscillation. Owing to its
resonant nature, PLL-like noise transfer behavior, and output phase tunability, it has
great potential in gigahertz, low-power clock generation and distribution. We have
demonstrated several such applications of injection-locked clocking circuits and
discussed their benefits through analysis and experiments based on real silicon test
chips. Some further applications of injection locking for high-speed clocking are also
proposed.
Bibliography
[1] http://www.sonet.com/.
[2] http://www.ieee802.org/3/.
[3] http://www.wi-fi.org/.
[4] http://www.bluetooth.com/bluetooth/.
[5] http://www.wirelesshd.org/.
[6] B. Razavi. Monolithic Phase-Locked Loops and Clock Recovery Circuits: Theory and Design. IEEE Press, 1996.
[7] B. Razavi. RF Microelectronics. Prentice-Hall, 1998.
[8] International Technology Roadmap for Semiconductors 2007 Edition. http://public.itrs.net.
[9] http://www.pcisig.com/specifications/pciexpress/.
[10] http://www.serialata.org/.
[11] http://www.rambus.com/.
[12] D.A. Hodges, H.G. Jackson, and R.A. Saleh. Analysis and Design of Digital Integrated Circuits. McGraw-Hill, 2000.
[13] E.G. Friedman. Clock Distribution Networks in VLSI Circuits and Systems. IEEE Press, 1995.
[14] E.G. Friedman. Clock Distribution Networks in Synchronous Digital Integrated Circuits. Proc. IEEE, 89(5):665–692, May 2001.
[15] T.H. Lee. The Design of CMOS Radio-Frequency Integrated Circuits. Cambridge University Press, Cambridge, U.K., 1998.
[16] A. Hajimiri and T.H. Lee. A General Theory of Phase Noise in Electrical Oscillators. IEEE J. Solid-State Circuits, 33(2):179–194, Feb. 1998.
[17] A. Hajimiri, S. Limotyrakis, and T.H. Lee. Jitter and Phase Noise of Ring Oscillators. IEEE J. Solid-State Circuits, 34(6):896–909, June 1999.
[18] R. Adler. A Study of Locking Phenomena in Oscillators. Proc. IRE, 34:351–357, June 1946.
[19] K. Kurokawa. Injection Locking of Microwave Solid-State Oscillators. Proc. IEEE, 61(10):1386–1410, Oct. 1973.
[20] H. Rategh, H. Samavati, and T.H. Lee. A CMOS Frequency Synthesizer with an Injection-Locked Frequency Divider for a 5-GHz Wireless LAN Receiver. IEEE J. Solid-State Circuits, 35(5):780–787, May 2000.
[21] H. Wu and A. Hajimiri. A 19 GHz, 0.5 mW, 0.35 µm CMOS Frequency Divider with Shunt-Peaking Locking-Range Enhancement. In IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pages 412–413, 2001.
[22] H. Rategh and T.H. Lee. Superharmonic Injection-Locked Frequency Dividers. IEEE J. Solid-State Circuits, 34(6):813–821, June 1999.
[23] A. Mazzanti, P. Uggetti, and F. Svelto. Analysis and Design of Injection-Locked LC Dividers for Quadrature Generation. IEEE J. Solid-State Circuits, 39(9):1425–1433, Sept. 2004.
[24] S. Verma, H. Rategh, and T. Lee. A Unified Model for Injection-Locked Frequency Dividers. IEEE J. Solid-State Circuits, 38(6):1015–1027, June 2003.
[25] B. Razavi. A Study of Injection Locking and Pulling in Oscillators. IEEE J. Solid-State Circuits, 39(9):1415–1424, Sept. 2004.
[26] W.Z. Chen and C.L. Kuo. 18 GHz and 7 GHz Superharmonic Injection-Locked Dividers in 0.25 µm CMOS Technology. In Proceedings of 2002 European Solid-State Circuits Conference (ESSCIRC), pages 89–92, 2002.
[27] J.C. Chien and L.H. Lu. Analysis and Design of Wideband Injection-Locked Ring Oscillators with Multiple-Input Injection. IEEE J. Solid-State Circuits, 42(9):1906–1915, Sept. 2007.
[28] B. Razavi, K.F. Lee, and R.-H. Yan. A 13.4 GHz CMOS Frequency Divider. In IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pages 176–177, 1994.
[29] H. Wang, A. Akbar, and B. Song. A 1.8V 3mW 16.8GHz Frequency Divider in 0.25 µm CMOS. In IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pages 196–197, 2000.
[30] H. Knapp, H.-D. Wohlmuth, M. Wurzer, and M. Rest. 25GHz static frequencydivider and 25Gb/s multiplexer in 0.12 µm CMOS. In IEEE Int. Solid-StateCircuits Conf. Dig. Tech. Papers, pages 302–303, 2002.
[31] R.L. Miller. Fractional-frequency generators utilizing regenerative modulation.Proc. IRE, pages 446–457, July 1939.
[32] R.G. Harrison. Theory of regenerative frequency dividers using double-balancedmixers. In IEEE Int. Microwave Symposium, pages 459–462, 1989.
[33] R.G. Harrison. A broad-band frequency divider using microwave varactors.IEEE Trans. Microwave Theory Tech., 25(12):1055–1059, Dec. 1977.
[34] R.G. Harrison. Theory of the varactor frequency halver. In IEEE Int. MicrowaveSymposium, pages 203–205, 1983.
[35] G.R. Sloan. The modeling, analysis, and design of filter-based parametric fre-quency dividers. IEEE Trans. Microwave Theory Tech., 41(2):224–228, Feb.1993.
[36] W.-Z Chen and C.-L. Kuo. 18GHz and 7GHz Superharmonic Injection-LockedDividers in 0.25um CMOS Technology. In Proc. ESSCIRC, pages 89–92, 2002.
[37] A. Natarajan, A. Komijani, X. Guan, A. Babakhani, and A. Hajimiri. A 77-GHzPhased-Array Transceiver With On-Chip Antennas in Silicon: Transmitter andLocal LO-Path Phase Shifting. IEEE J. Solid-State Circuits, 41(12):2807–2819,Dec. 2006.
[38] C. Cao and K. K.O. A 50-GHz Phase-Locked Loop in 130-nm CMOS. In IEEECustom Integrated Circuits Conf. Dig. Tech. Papers, pages 21–24, May 2006.
[39] H. Wu and L. Zhang. A 16-to-18GHz 0.18µm Epi-CMOS Divide-by-3 Injection-Locked Frequency Divider. In IEEE Int. Solid-State Circuits Conf. Dig. Tech.Papers, pages 602–3, 2006.
[40] S. Reynolds, B. Floyd, U. Pfeiffer, and T. Zwick. 60 GHz Transceiver Circuitsin SiGe Bipolar Technology. In IEEE Int. Solid-State Circuits Conf. Dig. Tech.Papers, pages 442–443, 2004.
149
[41] H. Zirath, T. Masuda, R. Kozhuharov, and M. Ferndahl. Development of 60-GHz Front-End Circuits for High-Data-Rate Communication System. IEEE J.Solid-State Circuits, 39(10):1640–1649, Oct. 2004.
[42] Y. Deval, J-B. Begueret, A. Spataro, P. Fouillat, D. Belot, and F. Badets.HiperLAN 5.4-GHz Low-Power CMOS Synchronous Oscillator. IEEE Trans.Microwave Theory Tech., 49(9):1525–1530, Sept. 2001.
[43] L. Zhang, B. Ciftcioglu, M. Huang, and H. Wu. Injection-Locked Clocking:A New GHz Clock Distribution Scheme. In IEEE Custom Integrated CircuitsConf. Dig. Tech. Papers, pages 785–788, 2006.
[44] J.J. Hung, T.M. Hancock, and G.M.Rebeiz. High-Power High-Efficiency SiGeKu- and Ka-Band Balanced Frequency Doubler. IEEE Transactions on Mi-crowave Theory and Techniques, vol. 53, No. 2, pp.754-761, 2005.
[45] J-C Chiu, C-P Chang, M-P Houng, and Y-H Wang. A 12-36 GHz PHEMTMMIC Balanced Frequency Tripler. IEEE Microwave and Wireless ComponentLetters, vol. 16, No. 2, pp.19-21, 2006.
[46] K. Yamamoto. A 1.8V Operation 5-GHz-Bnad CMOS Frequency Doubler Us-ing Current-Reuse Circuit Design Technique. IEEE J. Solid-State Circuits,40(6):1288–1295, June 2005.
[47] F. Ellinger and H. Jackel. Ultracompact SOI CMOS Frequency Doubler for LowPower Applications at 26.5-28.5 GHz. IEEE Microwave and Wireless Compo-nents Letters, vol. 14, No. 2, pp.53-55, 2004.
[48] C. Cao and E. Seok and K.K. O. 192-GHz push-pull VCO in 0.13um CMOS.Electron Lett., 42(4):208–209, Feb. 2006.
[49] J.P. Maligeorgos and J.R. Long. A Low-Voltage 5.1-5.8-GHz Image-Reject Re-ceiver with Wide Dynamic Range. IEEE J. Solid-State Circuits, 35(12):1917–1926, Dec. 2000.
[50] J. Wong and H. Luong. A 1.5-V 4-GHz Dynamic-Loading Regenerative Fre-quency Doubler in a 0.35-um CMOS Process. IEEE Transactions on Circuitsand Systems, 50(8):450–455, Aug. 2003.
[51] D.K. Ma and J.R. Long. A Subharmonically Injected LC Delay Line Oscil-lator for 17-GHz Quadrature LO Generation. IEEE J. Solid-State Circuits,39(9):1434–1445, Sept. 2004.
150
[52] W.L. Chan and J.R. Long. A 56-65 GHz Injection-Locked Frequency Triplerwith Quadrature Outputs in 90-nm CMOS. IEEE J. Solid-State Circuits,43(12):2739–2746, Dec. 2008.
[53] L. Zhang, D. Karasiewicz, B. Ciftcioglu, and H. Wu. A 1.6-to-3.2/4.8 GHz Dual-Modulus Injection-Locked Frequency Multiplier in 0.18um Digital CMOS. InIEEE RFIC Symp. Dig. Papers, pages 427–430, 2008.
[54] International technology roadmap of semiconductor. www.itrs.org, 2005.
[55] V. Tiwari et al. Reducing Power in High-performance Microprocessors. InDesign Automation Conference (DAC), pages 732–737, 1998.
[56] A.V. Mule et al. Electrical and Optical Clock Distribution Networks For Gi-gascale Microprocessors. Transactions on VLSI, pages 582–594, Oct. 2002.
[57] G. Geannopoulos and X. Dai. An adaptive Digital Deskewing Circuit for ClockDistribution Networks. In IEEE Int. Solid-State Circuits Conf. Dig. Tech. Pa-pers, pages 400–401, 1998.
[58] S. Tam, S. Rusu, U.N. Desai, R. Kim, J. Zhang, and I. Young. Clock Genera-tion and Distribution for the First IA-64 Microprocessor. IEEE J. Solid-StateCircuits, pages 1545–1552, Nov. 2000.
[59] P.J. Restle et al. A Clock Distribution Network for Microprocessors. IEEE J.Solid-State Circuits, 36(5):792–799, May 2001.
[60] N.A. Kurd, J.S. Barkatullah, R.O. Dizon, T.D. Fletcher, and P.D. Madland. AMultigigahertz Clocking Scheme for the Pentium 4 Microprocessor. IEEE J.Solid-State Circuits, 36(11):1647–1653, Nov. 2001.
[61] J.W. Goodman, F.J. Leonberger, et al. Optical Interconnections for VLSI Sys-tems. Proc. IEEE, 72:850–866, July 1984.
[62] E. Kaimiley, P. Marchand, et al. Performance Comparison between Optoelec-tronic and VLSI Multistage Interconnect Networks. J. Lightwave Technol.,9:1674–1692, 1991.
[63] K.C. Cadien et al. Challenges for On-Chip Optical Interconnects. Proc. SPIE,5730:133–143, Nov. 2005.
[64] R. Li, X.L. Guo, D.J. Yang, and K. K.O. Initialization of a Wireless ClockDistribution System Using an External Antenna. In IEEE Custom IntegratedCircuits Conf. Dig. Tech. Papers, pages 105–108, 2005.
151
[65] X. Guo, D.J. Yang, R. Li, and K. K.O. A Receiver with Start-up Initializationand Programmable Delays for Wireless Clock Distribution. In IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pages 386–387, 2006.
[66] G.A. Pratt and J. Nguyen. Distributed Synchronous Clocking. IEEE Trans.Parallel Distributed Systems, 6(3):314–328, March 1995.
[67] V. Gutnik and A.P. Chandrakasan. Active GHz Clock Network Using Dis-tributed PLLs. JSSC, pages 1553–1560, Nov. 2000.
[68] H. Mizuno and K. Ishibashi. A Noise-Immune GHz-Clock Distribution Schemeusing Synchronous Distributed Oscillators. In IEEE Int. Solid-State CircuitsConf. Dig. Tech. Papers, pages 404–405, 1998.
[69] H.-A. Tanaka, A. Hasegawa, H. Mizuno, and T. Endo. Synchronizability ofDistributed Clock Oscillators. IEEE Trans. Circuits Syst. I, 49(9):1271–1278,Sep. 2002.
[70] J. Wood, C. Edwards, and S. Lipa. Rotary Traveling-Wave Oscillator Arrays:a New Clock Technology. IEEE J. Solid-State Circuits, 36(11):1654–1665, Nov.2001.
[71] F. O’Mahony, C.P. Yue, M.A. Horowitz, and S.S. Wong. A 10-GHz GlobalClock Distribution Using Coupled Standing-Wave Oscillators. IEEE J. Solid-State Circuits, 38(11):1813–1820, Nov. 2003.
[72] S.C Chan, K.L. Shepard, and P.J. Restle. Uniform-Phase Uniform AmplitudeResonant-Load Global Clock Distributions. IEEE J. Solid-State Circuits, pages102–109, March 2005.
[73] S.C. Chang, K.L. Shepard, and P.J. Restle. 1.1 to 1.6GHz Distributed Differen-tial Oscillator Global Clock Network. In IEEE Int. Solid-State Circuits Conf.Dig. Tech. Papers, pages 518–519, 2005.
[74] H. Wu and A. Hajimiri. Silicon-Based Distributed Voltage Controlled Oscilla-tors. IEEE J. Solid-State Circuits, 36(3):493–502, Mar. 2001.
[75] E.L. Ginzton, W.R. Hewlett, J.H. Jasberg, and J.D. Noe. Distributed amplifi-cation. Proc. IRE, 36:956–969, Aug. 1948.
[76] K. Kamogawa, T. Tokumitsu, and M. Aikawa. Injection-Locked OscillatorChain: A Possible Solution to Millimeter-Wave MMIC Synthesizers. IEEETrans. Microwave Theory Tech., 45(9):1578–1584, Sept. 1997.
152
[77] K. Kundert, J. White, and A. Sangiovanni-Vincentelli. Steady-State Methodsfor Simulating Analog and Microwave Circuits. Springer, 1990.
[78] M.J. Gingell. Single Sideband Modulation Using Sequence AsymmetricPolyphase Networks. Electrical Communication, 48(1-2):21–25, 1973.
[79] A. Ravi, K. Soumyanath, L.R. Carley, and R. Bishop. An Integrated 10/5GHzInjection-Locked Quadrature LC VCO in a 0.18µm Digital CMOS Process. InProceedings of the 28th European Solid-State Circuits Conf., pages 543–6, 2002.
[80] S. Gierkink, S. Levantino, R. Frye, C. Samori, and V. Boccuzzi. A Low-Phase-Noise 5-GHz CMOS Quadrature VCO Using Superharmonic Coupling. IEEEJ. Solid-State Circuits, 38(7):1148–1154, Jul. 2003.
[81] P. Kinget, R. Melville, D. Long, and V. Gopinathan. An Injection-LockingScheme for Precise Quadrature Generation. IEEE J. Solid-State Circuits,37(7):845–851, Jul. 2002.
[82] X. Guan, H. Hashemi, and A. Hajimiri. A Fully Integrated 24-GHz Eight-Element Phased-Array Receiver in Silicon. IEEE J. Solid-State Circuits,39(12):2311–2320, Dec. 2004.
[83] D.W. kang, D.H. Baek, S.H. Jeon, J.W. Park, and S.C. Hong. A MiniaturizedK-band Balanced Frequency Doubler Using InGaP HBT Technology. In IEEEMTT-S Int. Microwave Symp. Dig., pages 107–110, 2003.
[84] S. Hackl and J. Bock. 42 GHz Active Frequency Doubler in SiGe BipolarTechnology. In Proc. 3rd Int. Microwave and Millimeter Wave Tech. Conf.,pages 54–57, 2002.
[85] T. Masuda et al. Low Power Single-ended Active Frequency Doubler for a60GHz-band Application. In Proc. GaAs, 2002.
[86] C.Fager, L. Landen, and H. Zirath. High Output Power, Broadband 28-56 GHzMMIC FrequencyDoubler. In IEEE MTT-S Int. Microwave Symp. Dig., pages1589–1591, 2002.
[87] S. Tam, R.D. Limaye, and U.N. Desai. Clock Generation and Distributionfor the 130-nm Itanium 2 Processor With 6-MB On-Die L3 Cache. IEEE J.Solid-State Circuits, 39(4):636–642, April 2004.
[88] Z. Xu and K. Shepard. Low-Jitter Active Deskewing Through Injection-LockedResonant Clocking. IEEE Custom Integrated Circuits Conf. Dig. Tech. Papers,pages 9–12, 2007.
153
[89] S.C. Chang, P.J. Restle, K.L. Shepard, N.K. James, and R.L. Franch. A 4.6GHzResonant Global Clock Distribution Network. In IEEE Int. Solid-State CircuitsConf. Dig. Tech. Papers, pages 342–343, 2004.
[90] B. Mesgarzadeh, M. Hansson, and A. Alvandpour. Jitter Characteristic inResonant Clock Distribution. In Solid-State Circuits Conference, ESSCIRC2006. Proceedings of the 32nd European, pages 464–467, 2006.
[91] L. Lee and C.K. Yang. An Adaptive Low-Jitter LC-Based Clock Distribution.In IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pages 182–183, 2007.
[92] G. Semeraro et al. Dynamic Frequency and Voltage Control for a Multiple ClockDomain Microarchitecture. In International Symposium on Microarchitecture,pages 356–367, Istanbul, Turkey, November 2002.
[93] A. Iyer and D. Marculescu. Power-Performance Evaluation of Globally Asyn-chronous, Locally Synchronous Processors. In International Symposium onComputer Architecture, pages 158–168, Anchorage, Alaska, May 2002.
[94] Y. Zhu, D. Albonesi, and A. Buyuktosunoglu. A High Performance, Energy Ef-ficient, GALS Processor Microarchitecture with Reduced Implementation Com-plexity. In International Symposium on Performance Analysis of Systems andSoftware, pages 42–53, Austin, Texas, March 2005.
[95] W. J. Bowhill et al. Circuit Implementation of a 300-MHz 64-bit Second-generation CMOS Alpha CPU. Digital Technology Journal, 7(1):100–118, 1995.
[96] D. W. Bailey and B. J. Benschneider. Clocking Design and Analysis for a 600-MHz Alpha Microprocessor. IEEE Journal of Solid-State Circuits, 33(11):1627–1633, November 1998.
[97] L. Zhang et al. Injection-Locked Clocking: A Lower-Power Clock DistributionScheme for High-End Microprocessors. Technical report, Dept. Electrical &Computer Engineering, Univ. of Rochester, September 2007.
[98] A. S. Sedra and K. C. Smith. Microelectronic Circuits. Oxford University Press,2004.
[99] L. Zhang, B. Ciftcioglu, and H. Wu. Active Deskew in Injection-Locked Clock-ing. In IEEE Custom Integrated Circuits Conference (CICC), 2008.
[100] L. Zhang, Berkehan Ciftcioglu, and H. Wu. A 1V, 1mW, 4GHz Injection-LockedOscillator for High Performance Clocking. In IEEE Custom Integrated CircuitsConf. Dig. Tech. Papers, 2007.
154
[101] X. S. Yao and L. Maleki. Optoelectronic Oscillator for Photonic Systems. IEEEJournal of Quantum Electronics, 32(7):1141–1149, July 1996.
[102] P. Devgan and D. Serkland and G. Keeler and K. Geib and P. Kumar. AnOptoelectronic Oscillator Using an 850-nm VCSEL for Generating Low JitterOptical Pulses. IEEE Photonic Technology Letters, 18(5):685–687, March 2006.