CLOCK AND DATA RECOVERY FOR

HIGH-SPEED ADC-BASED RECEIVERS

by

Oleksiy Tyshchenko

A thesis submitted in conformity with the requirements
for the degree of Doctor of Philosophy

Graduate Department of Electrical and Computer Engineering
University of Toronto

© Copyright by Oleksiy Tyshchenko 2011

CLOCK AND DATA RECOVERY FOR

HIGH-SPEED ADC-BASED RECEIVERS

Oleksiy Tyshchenko

Doctor of Philosophy, 2011

Graduate Department of Electrical and Computer Engineering

University of Toronto

ABSTRACT

THIS THESIS EXPLORES the clock and data recovery (CDR) for the high-speed

blind-sampling ADC-based receivers. This exploration results in two new CDR ar-

chitectures that reduce the receiver complexity and save the ADC power and area compared

to the previous work. The two proposed CDR architectures constitute the primary contribu-

tions of this thesis.

The first proposed architecture, a 2x feed-forward CDR architecture, eliminates the

interpolating feedback loop, used in the previously reported CDRs, in order to reduce the

CDR circuit complexity. Instead of the feedback loop, the proposed architecture uses a feed-

forward topology to recover the phase and data directly from the blind digital samples of the

received signal. The 2x feed-forward CDR architecture was implemented and characterized

in a 5 Gb/s receiver test-chip in 65 nm CMOS. The test-chip measurements confirm that the

CDR successfully recovers the data with bit error rate (BER) ≤ 10⁻¹² in the presence of

jitter.

The second proposed architecture, a fractional-sampling-rate (FSR) CDR architecture,

reduces the receiver sampling rate from the typical integer rate of 2x the baud rate to a

fractional rate between 2x and 1x in order to reduce the ADC power and area. This archi-

tecture employs the feed-forward topology of the first contribution of this thesis to recover

the phase and data from the fractionally-spaced digital samples of the signal. To verify the

proposed FSR CDR architecture, a 1.45x receiver test-chip was implemented and charac-

terized in 65 nm CMOS. This test-chip recovers 6.875 Gb/s data from the ADC samples

taken at 10 GS/s. The measurements confirm a successful data recovery in the presence of

jitter with BER ≤ 10⁻¹². With sampling at 1.45x, the FSR CDR architecture reduces the

ADC power and area by 27.3 % compared to the 2x feed-forward CDR architecture, while

the overall receiver power and area are reduced by 12.5 %.

Acknowledgments

GRADUATE STUDIES is a lot like a long journey. Sometimes it seems interesting

and exciting, while sometimes it seems hard and endless. Along the way of this

journey I met a lot of people and saw a lot of places. These people and places helped

me and inspired me to complete my journey even at times when the journey seemed to be

never-ending. Now, coming close to the end of my graduate studies, I would like to thank

the people who helped me and reflect upon the places that inspired me throughout my

graduate school years.

First of all, I thank my supervisor, Prof. Ali Sheikholeslami, for his guidance through

the course of my Ph.D. work. His enthusiasm and insights have been a great source of

encouragement for me. I also thank Prof. Sheikholeslami for helping me to realize that

graduate school in engineering is more than a technical education; rather, it is a great

learning experience of solving problems and achieving goals.

I thank Dr. Hirotaka Tamura of Fujitsu Laboratories Limited (FLL), Kawasaki, Japan,

for his helpful comments, suggestions and constructive criticism at all stages of my Ph.D.

projects: from the project definitions, through architecture development and circuit imple-

mentation, all the way to test-chip measurements and publishing the results. Tamura-sensei,

you were very much like a co-supervisor for me during my Ph.D. studies, and I thank you

for all your help.

I am thankful to the members of my Ph.D. oral examination committee: Prof. David

Johns, Prof. Tony Chan Carusone, Prof. Sorin Voinigescu, Prof. Wai Tung Ng, Prof. Wei Yu;

and my thesis appraiser Prof. Michael Green for their criticism of this work and valuable

feedback.

I thank the former and current graduate students of Ali-group: Kostas Pagiamtzis,

Marcus van Ierssel, David Halupka, Jeff Chow, Scott McLeod, Pradip Thachil, Tina

Tahmoureszadeh, Safeen Huda, Shayan Shahramian, Behrooz Abiri, and Siamak Sarvari, who

helped me turn my graduate school years into an interesting, enjoyable and diverse part of

my life. It was a great pleasure meeting them, working with them, and getting to know

them. I greatly appreciate the support of Kostas Pagiamtzis and Marcus van Ierssel, who

completed their graduate studies before I did, and who helped me believe that this journey

will eventually come to an end. Special thanks go to David Halupka with whom over the

past several years I shared the cubicle, my good and bad news, my excitement and frustra-

tions. Naturally, he shared the same with me, and I had to listen to all that. David, thank

you for your patience in putting up with me all these years.

During my graduate studies I spent most of the time in Toronto, Canada. However, I

was lucky enough to see other places as well. The places are strongly associated with the

people who helped me see, explore and enjoy these places. I would like to thank these

people next.

I thank William Walker, Nikola Nedovic, Nestoras Tzartzanis, Francis Rotella and Mag-

nus Wiklund of Fujitsu Laboratories of America (FLA), Sunnyvale, CA, for welcoming me

to their team as an intern for half-a-year. It was my pleasure to learn from and to work with

the FLA team. I also thank the FLA team for allowing me to experience a professional,

good and friendly work environment.

During this internship, I had a great chance to explore the Bay Area in California, and

I thank people who helped me turn this time into an experience to remember. I thank

Jeff Chow for allowing me to “take over” his life during his leave from California, which

conveniently coincided with my internship. For several months, I stayed at Jeff’s apartment,

I drove Jeff’s car, and I used Jeff’s cell phone, which made my settling in San Jose, CA,

very smooth. I thank Kostas Pagiamtzis and Irene Goldthorpe for accompanying me on a

large number of trips in California. I also thank Irene and Kostas for helping me to realize

that losing a bet can be just as pleasant as winning it.

I thank Laura Fujino and Prof. K.C. Smith for inviting me to attend the International

Solid-State Circuits Conference (ISSCC) as a student volunteer six consecutive times dur-

ing my graduate studies. The ISSCC attendance helped me to remain aware of the most

recent research work in the area of electronics performed all over the world both in industry

and in academia. Being part of the volunteer team helped me to get to know my

fellow graduate students better, and to realize what a good team is all about. I further thank Laura

and Prof. Smith for sharing their life wisdom with me during the rare uneventful breaks at

ISSCC.

With all the intense schedules of the graduate studies, I am grateful to my friends who

helped me discover beautiful places and experience memorable adventures during the short

vacations away from the school matters. I thank Valeri Kirischian and Irina Ivanova for

showing me the beauty of the Province of Ontario through numerous hiking, camping and

canoeing adventures. I am particularly thankful to Valeri and Irina for helping me experi-

ence the wilderness of Lake Temagami, Ontario, with its rapidly changing weather, stren-

uous canoeing and portaging, beaver dams across tiny rivers, camp fires, starry skies, and

sometimes even polar lights. I thank Roman Ochoukov for accompanying me while explor-

ing the cities of the East Coast: Toronto, Boston, New York, Montreal, to name some of

them. I also thank Roman for his moral support during my graduate studies. Whenever I

thought that the graduate life was hard at the University of Toronto, it was enough to chat

with Roman to remind myself that life is even harder at MIT. I thank Kostas Pagiamtzis,

Irene Goldthorpe, Scott McLeod and Kevin Banovic for joining me for a skydiving adven-

ture — my most extreme experience so far. The long journey of the graduate studies differs

on so many levels from a one-minute-long free-fall. Yet there is one thing in common

between these two experiences: it is less stressful to reflect upon them both in retrospect.

I thank my parents for their unconditional support through the years of my studies.

Last, but not least, I thank my wife, Katya Tyshchenko, for being by my side despite all the

challenges of her own graduate studies. I further thank Katya for being with me through

most of my experiences of the graduate years: from course projects to outdoors adventures.

Approaching the end of my studies, I realize that it is not the end-goal itself that matters,

rather it is the way towards the goal that is important. Meeting the people who helped me,

and discovering the places that inspired me became an invaluable experience for me. After

all, the journey of studies is simply a part of a larger journey of life.

Contents

List of Tables

List of Figures

List of Abbreviations

Chapter 1 Introduction
  1.1 Motivation
  1.2 Design Challenges and Approaches
  1.3 Thesis Contributions
  1.4 Thesis Outline

Chapter 2 Fundamentals of Clock and Data Recovery in High-Speed Receivers
  2.1 Building Blocks of a High-Speed Receiver
    2.1.1 Channel Properties
    2.1.2 Equalization
    2.1.3 Clock and Data Recovery
    2.1.4 Signal Energy Considerations
  2.2 CDR Architectures for Binary-Sampling Receivers
    2.2.1 Phase-Tracking CDR Architecture
    2.2.2 Oversampling CDR Architecture
  2.3 CDR Architectures for ADC-Based Receivers
    2.3.1 Mueller-Muller CDR Architecture
    2.3.2 Interpolating Feedback CDR Architecture
  2.4 Summary

Chapter 3 An ADC-Based Feed-Forward CDR Architecture
  3.1 Feed-Forward CDR Architecture
  3.2 Phase-Detection Scheme
  3.3 Phase-Recovery Filter
  3.4 Data-Decision Scheme
  3.5 Data Retiming Scheme
  3.6 Simulation and Measurement Results
  3.7 Summary

Chapter 4 A Fractional-Sampling-Rate CDR Architecture
  4.1 Fractional-Sampling-Rate CDR Architecture
  4.2 Phase-Detection Schemes
    4.2.1 Eye-Based Phase Detector
    4.2.2 Transition-Based Phase Detector
  4.3 Phase-Recovery Filter
  4.4 Data-Decision Scheme
  4.5 Data Compaction Schemes
    4.5.1 Shift-Register Data Compactor
    4.5.2 Selector-Array Data Compactor
  4.6 Data Retiming Scheme
  4.7 Simulation and Measurement Results
  4.8 Summary

Chapter 5 Conclusions
  5.1 Thesis Contributions
  5.2 Future Directions

References

List of Tables

2.1 Recently published high-speed receivers.

3.1 Jitter tolerance simulation conditions (in Figure 3.14).
3.2 Test-chip parameters.
3.3 Jitter tolerance measurement and simulation conditions (in Figure 3.17).

4.1 Sampling phases for the sampling rate of 16/11 ≈ 1.45x.
4.2 Conditional selector truth table.
4.3 Test-chip parameters.

List of Figures

1.1 ITRS projection for chip-to-chip interconnect data rates.

2.1 Simplified diagram of an interconnect.
2.2 Functional block-diagram of a high-speed receiver.
2.3 Channel response in time and frequency domains.
2.4 Equalization with a filter in the frequency domain.
2.5 Feed-forward equalization (FFE).
2.6 Decision feedback equalization (DFE).
2.7 Clocking schemes.
2.8 Classification of high-speed receivers with corresponding CDR examples.
2.9 Binary sampling.
2.10 Simplified block-diagram of a phase-tracking CDR.
2.11 Phase detection in the phase-tracking CDR.
2.12 Phase-tracking feedback loop.
2.13 Jitter transfer and tolerance of the phase-tracking CDR.
2.14 Simplified block-diagram of an oversampling CDR.
2.15 Phase detection in the 3x oversampling CDR.
2.16 Jitter tolerance of the 3x oversampling CDR.
2.17 Sampling with an ADC.
2.18 Simplified block-diagram of a Mueller-Muller CDR.
2.19 Mueller-Muller timing recovery from an impulse response.
2.20 Mueller-Muller timing recovery from continuous data.
2.21 Jitter tolerance of the Mueller-Muller CDR.
2.22 Simplified block-diagram of an interpolating feedback CDR.
2.23 Blind and interpolated samples in the interpolating feedback CDR.
2.24 Linear interpolation.
2.25 Jitter tolerance of the interpolating feedback CDR.
2.26 Simplified block-diagram of a joint-adaptation-based CDR [43].

3.1 Proposed feed-forward CDR architecture (simplified block-diagram).
3.2 Receiver with the proposed feed-forward CDR architecture.
3.3 Proposed linear phase estimation scheme.
3.4 Linear estimation of instantaneous phase, φ_X.
3.5 Flowchart of 2-bit accurate division for calculating φ_X.
3.6 Phase recovery filter.
3.7 Discrete-time integrator with programmable gain.
3.8 Jitter tolerance dependence on the LPF order (simulated, BER ≤ 5·10⁻⁶).
3.9 Data decision scheme.
3.10 Data decision with isolated pulses.
3.11 Data decision in the interpolating feedback and feed-forward CDRs.
3.12 Data retiming schemes.
3.13 Simplified FIFO diagram.
3.14 Simulated jitter tolerance (BER ≤ 5·10⁻⁶).
3.15 Simplified design flow of the proposed feed-forward CDR.
3.16 Test-chip die photograph.
3.17 Measured jitter tolerance.

4.1 Sampling rates in feed-forward CDR architectures.
4.2 Receiver with the proposed fractional-sampling-rate CDR architecture.
4.3 Eye diagram accumulation with fractional sampling rate.
4.4 Phase detection from the eye diagram.
4.5 Simplified block-diagram of the transition-based phase detector.
4.6 Selection of transitions leading to low-error phase detection.
4.7 Average-slope-recovery filter.
4.8 Reduction of phase-detection error using average transition slope.
4.9 Linear estimation of instantaneous zero-crossing phase, φ_ZC.
4.10 Selector converting phase values from sampling intervals to unit intervals.
4.11 Phase recovery filter.
4.12 Phase subtracter.
4.13 Detecting number of samples per UI (jitter-free case).
4.14 Data decision in the presence of jitter.
4.15 Shift-register data compactor.
4.16 Shift-register data compactor.
4.17 Simplified FIFO diagram.
4.18 Simulated jitter tolerance (BER ≤ 5·10⁻⁶).
4.19 Test-chip die photograph.
4.20 Measured eye diagram at the demux output.
4.21 Measured jitter tolerance.

List of Abbreviations

ADC Analog-to-digital converter

BER Bit-error rate

CDR Clock and data recovery

CTLE Continuous-time linear equalizer

DAC Digital-to-analog converter

DeMUX De-multiplexor

DFE Decision-feedback equalizer

EQ Equalizer

FFE Feed-forward equalizer

FIFO First-in first-out buffer

FIR Finite impulse response

FSR Fractional sampling rate

HDMI High-definition multimedia interface

IC Integrated circuit

ISI Inter-symbol interference

ITRS International technology roadmap for semiconductors

MLSD Maximum likelihood sequence detection

PCIe Peripheral component interconnect express

PD Phase detector

PI Phase interpolator

RX Receiver

SATA Serial advanced-technology attachment

SI Sampling interval

SNR Signal to noise ratio

TR Timing recovery

TX Transmitter

UI Unit interval

USB Universal serial bus

VCO Voltage-controlled oscillator

Chapter 1

Introduction

HIGH-SPEED SIGNALING SYSTEMS satisfy the growing demand for higher data

rates owing to the ongoing improvements in the integrated circuit (IC) technologies.

The transmission channels, however, have seen little or no improvement over time [1],

which leads to more severe impact of the channel on the data signal with the increasing

data rates. As a consequence, the receivers in the high-speed signaling systems must com-

pensate for the channel response in order to recover the data from the received signal [2,3].

Sampling the signal with an analog-to-digital converter (ADC), instead of a binary sampler,

allows the receivers to compensate for the channel response in the digital domain, which

in turn allows compensation for more severe channel responses [4, 5]. However, high

ADC power consumption along with high complexity of the clock and data recovery (CDR)

function restrict the use of the ADC-based receivers only to high-performance, rather than

low-cost, interconnects [4–7]. This thesis focuses on the design of low-complexity CDR

architectures that reduce the power consumption of the ADC-based receivers, making them

suitable for the low-cost high-speed interconnects.

1.1 Motivation

The ongoing evolution of the IC technologies enables the processing of increasing volumes

of information in low-cost personal computing and entertainment systems. This trend fuels

the demand for low-cost gigabit-rate interconnects for these data-processing systems. The

International Technology Roadmap for Semiconductors (ITRS) reflects this growing demand

by projecting an exponential increase of the data rates in the high-speed chip-to-chip

interconnects over the next ten years, as illustrated in Figure 1.1 [8].

Figure 1.1: ITRS projection for chip-to-chip interconnect data rates. (Plot of data rate, Gb/s, versus year of production, 2008 to 2022.)

The ITRS projections stimulate the development of numerous standards for high-speed

signaling, including the multi-gigabit-rate standards for low-cost computing and entertain-

ment systems. Among the examples of these standards are the commercially-successful

High-Definition Multimedia Interface (HDMI) [9], Peripheral Component Interconnect Ex-

press (PCIe) [10], Serial Advanced-Technology Attachment (SATA) [11] and Universal

Serial Bus (USB) [12]. These standards are commonly used in mobile battery-powered

systems, which restricts the power budget of the high-speed interconnects. Furthermore,

to maintain a low system cost, these standards impose minimum restrictions on the trans-

mission channel between the transmitter and receiver, which leads to the use of low-quality

channels. With the data rates increasing over time, the channel impairments pose one of

the primary challenges in the design of the high-speed transceivers.

The transceivers compensate the signal for the channel distortion at the transmitter

and receiver sides [2, 13], which increases the transceivers’ complexity, area and power

consumption. The challenges of designing low-complexity low-power receivers suitable

for low-cost high-speed interconnects motivate this thesis.

1.2 Design Challenges and Approaches

In low-cost high-speed interconnects, the transmission channel typically consists of a prin-

ted circuit board (PCB) trace, such as in the PCIe standard [10], or a pair of wires of variable

length with little or no shielding, such as in the HDMI, SATA, and USB standards [9,11,12].

These channels modify the transmitted signal by the channel response, and hence make the

task of data recovery from the signal non-trivial at high data rates. Among the numerous as-

pects of the channel response, the inter-symbol interference (ISI) and the timing uncertainty

constitute the primary challenges in the receiver design [14].

The ISI stems from a limited channel bandwidth (BW) compared to the data transfer

rate. Typical channels attenuate the high-frequency content of the signal more than the low-

frequency content. This frequency-dependent attenuation appears as pulse smearing in the

time domain, which causes the adjacent data symbols to superimpose on and interfere with

each other, giving rise to the term inter-symbol interference. Severe ISI causes data errors

at the receiver. To avoid these data errors, the receivers compensate the signal for high-

frequency attenuation using linear equalization or decision-feedback equalization (DFE) [2,

3]. In conventional receivers with binary sampling, this equalization is performed in the

analog domain prior to sampling. The increasing data rates lead to increasing amounts of

ISI, which in turn increases the circuit complexity of the equalizers. The high complexity

of the analog equalizer comes at the cost of high loading at the receiver input, limiting the

data rates at which this receiver can be used.
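
The pulse-smearing mechanism can be illustrated with a small numerical sketch. The example below is not taken from the thesis: the pulse response values in `lossy` are hypothetical, chosen only so that the precursor and postcursor tails outweigh the main cursor for one data pattern. It superimposes one copy of the pulse response per transmitted symbol and slices the result at the main cursor.

```python
def transmit(bits, pulse_response):
    """Superimpose one copy of the channel pulse response per transmitted symbol."""
    symbols = [1.0 if b else -1.0 for b in bits]  # binary signaling levels
    received = [0.0] * (len(symbols) + len(pulse_response) - 1)
    for n, s in enumerate(symbols):
        for k, h in enumerate(pulse_response):
            received[n + k] += s * h
    return received


def decide(received, cursor, n_bits):
    """Slice the received samples at the main-cursor position of each symbol."""
    return [1 if received[n + cursor] > 0 else 0 for n in range(n_bits)]


# Hypothetical pulse responses, sampled once per unit interval (assumed values).
ideal = [1.0]                   # pulse shape preserved, no ISI
lossy = [0.1, 0.5, 0.3, 0.15]   # precursor, main cursor, two postcursors

bits = [1, 1, 1, 0, 1, 1, 1]    # an isolated '0' surrounded by '1's

ideal_decisions = decide(transmit(bits, ideal), 0, len(bits))
lossy_decisions = decide(transmit(bits, lossy), 1, len(bits))
print(ideal_decisions)
print(lossy_decisions)
```

With these assumed taps, the tails of the neighbouring '1' symbols outweigh the main cursor of the isolated '0', so that bit is decided incorrectly, while the ideal single-tap channel returns the transmitted bits unchanged.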

Sampling the received signal with an analog-to-digital converter (ADC), instead of a

binary sampler, allows the signal to be equalized in the digital domain after sampling, thus

avoiding excessive loading at the receiver input. This ADC-based receiver topology is

well-suited for channels with high ISI. However, due to high circuit complexity and power

consumption, the ADC-based receivers find their use either in high-performance optical

interconnects [4, 5, 7] or in hard-drive read channels [15], where the severe ISI justifies

the high area and power of the ADC-based receivers. To a large degree, this high circuit

complexity comes from the clock and data recovery (CDR) system of the receiver, which

allows the receiver to recover the data in the presence of the timing uncertainties.

The timing uncertainty, or jitter, stems from the deviation of the data pulse boundaries

from their nominal time due to random and data-dependent effects [16]. In high-speed

interconnects, in order to recover the data, the receiver extracts the timing of the data symbols

from the received signal itself, rather than from a reference clock. This function is com-

monly referred to as clock and data recovery (CDR). The error-free data recovery requires

the receiver to tolerate the jitter. Typically, the CDR relies on a feedback loop to track

the average time of the data pulse boundaries, which allows it to compensate for the timing

uncertainties [14]. The high CDR complexity and power consumption prohibit the low-

cost interconnects from using the existing ADC-based receivers. This thesis proposes two

new CDR architectures that reduce the complexity and power of the ADC-based receivers,

making them an attractive option for low-cost high-speed interconnects.

1.3 Thesis Contributions

This thesis investigates techniques of reducing the complexity and power consumption of

high-speed ADC-based receivers through exploring the CDR architectures in the receivers.

This exploration results in two new CDR architectures: a 2x feed-forward CDR architec-

ture and a fractional-sampling-rate (FSR) CDR architecture, which are the two key contri-

butions of this thesis.

The first contribution is a 2x feed-forward CDR architecture [17,18]. This architecture

recovers the data from the digital samples of the signal, taken at 2x the baud rate, in a

feed-forward path, eliminating the phase-tracking feedback loop from the CDR. This elimination

of the feedback loop reduces the receiver complexity, which makes the architecture suitable

for low-cost high-speed signaling applications. Test-chip measurement results demonstrate

that the proposed CDR architecture successfully recovers data in a 5 Gb/s receiver.
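
One way to picture how timing information can be extracted from blind digital samples without a feedback loop is linear interpolation between two adjacent samples of opposite sign. The sketch below is a generic illustration under that assumption and is not the phase-detection circuit described in Chapter 3; the function name `zero_crossing_phase` and the convention of expressing the phase as a fraction of one sampling interval are hypothetical.

```python
def zero_crossing_phase(a, b):
    """Estimate where the signal crosses zero between two adjacent samples.

    a and b are consecutive multi-bit sample values. The signal is assumed
    to be a straight line between them (an approximation). The result is the
    crossing position as a fraction of the sampling interval, measured from
    sample a. Returns None when the samples show no transition.
    """
    if a * b > 0 or a == b:
        return None  # same sign (or both zero): no zero crossing to locate
    return a / (a - b)


print(zero_crossing_phase(3, -1))   # crossing lies closer to the second sample
print(zero_crossing_phase(-2, 2))   # crossing lies midway between the samples
print(zero_crossing_phase(1, 2))    # no transition
```

Because each estimate depends only on the current pair of samples, estimates of this kind can be computed in a forward path and then filtered, with no sampling-phase adjustment fed back to the ADC.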

The feed-forward architecture of the first contribution enables exploring non-conventional

sampling rates to reduce the ADC power consumption, which is one of the main challenges

in the design of ADC-based receivers. The non-conventional sampling rates lead to the

next contribution of this thesis.

The second contribution is a fractional-sampling-rate (FSR) CDR architecture [19].

This architecture recovers the data from the samples taken at a fractional rate between

2x and 1x the baud rate, reducing the ADC power and area compared to the 2x architecture.

Measurements of a test-chip receiver with the FSR CDR confirm a successful data recovery

at 6.875 Gb/s from the samples taken at 10 GS/s, which corresponds to sampling at 1.45x.

This sampling rate reduces the ADC power and area by 27.3 % compared to sampling at

2x.
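
The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes, as a simplification not stated in this chapter, that the ADC power and area scale linearly with the sampling rate; the variable names are illustrative only.

```python
from fractions import Fraction

data_rate = Fraction(6875, 1000)   # recovered data rate, Gb/s
sample_rate = Fraction(10)         # ADC sampling rate, GS/s

# Samples taken per unit interval (the "x" factor of the receiver).
ratio = sample_rate / data_rate

# Relative saving versus 2x sampling, assuming cost proportional to rate.
saving = 1 - ratio / 2

print(ratio, float(ratio))
print(round(float(saving) * 100, 1))
```

Under this linear-scaling assumption the ratio evaluates to 16/11, or about 1.45 samples per unit interval, and the saving relative to 2x sampling is 3/11, or about 27.3 %, which agrees with the reduction reported for the ADC.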

1.4 Thesis Outline

The remainder of this thesis consists of four chapters. Chapter 2 provides a background

for this dissertation through an overview of a sample high-speed signaling system and its

key components. Chapters 3 and 4 present the main contributions of this work. Chapter 3

presents the 2x feed-forward CDR architecture that reduces the CDR complexity compared

to a conventional architecture. This architecture enables the second contribution of this

work. Chapter 4 presents the FSR CDR architecture that reduces the ADC power consump-

tion compared to the 2x architecture. Finally, Chapter 5 concludes this thesis and discusses

potential future research directions in the area of ADC-based receivers.

Chapter 2

Fundamentals of Clock and Data Recovery in

High-Speed Receivers

THIS CHAPTER reviews the concept of clock and data recovery (CDR) in high-speed

signaling systems, providing a background for the contributions of this thesis. The

chapter begins with a system-level look at a high-speed interconnect, which reveals that

the channel properties necessitate the two essential building blocks in a receiver: an equal-

izer and a CDR system. Then, the chapter focuses on the CDR systems for two types of

receivers: binary-sampling and ADC-based. For the binary-sampling receivers, the

phase-tracking and oversampling CDR architectures illustrate the techniques of clock and data

recovery from the binary samples of the received signal. The chapter contrasts these

binary-sampling receivers with ADC-based receivers that use digital samples of the received signal.

For the ADC-based receivers, the Mueller-Muller and interpolating feedback CDR

architectures exemplify the clock and data recovery techniques. The chapter concludes with a

brief summary.

2.1 Building Blocks of a High-Speed Receiver

To process information, electronic systems must exchange digital data with other

systems. A computer accessing a peripheral data storage is an example of such data exchange.

Figure 2.1 illustrates a simplified diagram of an interconnect that transfers the digital data

from a source to a consumer: from the computer to the data storage in our example. The

interconnect consists of a transmitter, a channel and a receiver [20]. The transmitter (TX)

converts the digital data into a form suitable for the channel, TX_OUT, and then launches the

data into the channel. The channel is a physical medium that connects the transmitter to the

receiver. This thesis assumes electrical wireline channels, such as a pair of conductors or a

trace on a printed circuit board (PCB), rather than wireless or optical channels. The

channel modifies TX_OUT with the unwanted channel response and delivers the modified signal to

the receiver. The receiver (RX) then compensates the received signal, RX_IN, for the channel

response and recovers the digital data for the consumer.

Figure 2.1: Simplified diagram of an interconnect. (Block diagram: digital data from the source enters the transmitter (TX), whose output TX_OUT passes through the channel and arrives as RX_IN at the receiver (RX), which delivers digital data to the consumer.)

Figure 2.2: Functional block-diagram of a high-speed receiver. (Block diagram: RX_IN enters the equalizer (EQ), whose output RX_EQ feeds the clock and data recovery (CDR) block, which delivers digital data to the consumer.)

A severe channel response compromises the data recovery from RXIN, which leads to bit-errors in the digital data at the consumer side. The goal of the interconnect is to transfer the data with a sufficiently low bit-error rate (BER), typically with BER < $10^{-12}$, even though the channel response modifies the transmitted signal [21].

The interconnects, in which the channel, rather than the source or consumer, restricts

the maximum data transmission rate, are commonly referred to as high-speed signaling

systems, or high-speed interconnects. Over the electrical wireline channels, the data rates

are typically restricted to several gigabits per second. This definition of the high-speed

interconnects stems from the fact that the channel properties define the essential functional

blocks required at the transmitter and receiver sides. At the receiver side, which is the focus

of this thesis, the channel necessitates two blocks: an equalizer (EQ) and a CDR system, as

shown in Figure 2.2.

The remainder of this section first takes a closer look at the channel properties in Section 2.1.1, next it reviews the equalization techniques in Section 2.1.2, and finally it introduces the CDR systems in Section 2.1.3 and touches upon the signal energy considerations in Section 2.1.4.

[Figure 2.3: Channel response in time and frequency domains. (a) Ideal channel: pulse shape preserved, constant channel delay, TCH. (b) Non-ideal channel: pulse shape modified, with a precursor and a postcursor extending beyond 1 UI; channel delay has uncertainty, or jitter, of ±∆t. Each panel shows the channel gain (dB) versus frequency (Hz) up to fB/2, and TXOUT and RXIN versus time.]

2.1.1 Channel Properties

Figure 2.3 contrasts an ideal with a non-ideal channel in time and frequency domains. In

the time domain, the ideal channel transfers a data pulse from the TX side to RX side after

a constant delay, TCH, without changing the pulse shape, as illustrated in Figure 2.3(a).

In a binary signaling scheme, this pulse represents a data symbol corresponding to ‘1’,

while a pulse of opposite polarity represents a ‘0’ symbol. To transfer data at 5 Gb/s, or

equivalently at a baud rate, fB, of 5 GHz, the transmitter launches the data pulses into the

channel sequentially with the pulse width, or unit interval (UI), of 200 ps. In a channel


with the delay TCH exceeding 1 UI, multiple data symbols are distributed along the channel

length. Since the ideal channel preserves the symbol shape, the symbol reaches the receiver

without interfering with the adjacent symbols. To recover the digital data, the receiver

samples this symbol TCH after the pulse is launched into the channel. In the frequency

domain, the ideal channel has a flat response: it passes all frequency components of the

data pulse equally well.

The non-ideal practical channels have a frequency-dependent attenuation similar to that

of a low-pass filter, and uncertainties in the channel delay, as shown in Figure 2.3(b). Un-

like the low-pass filters that are described in terms of their bandwidth, the channels are

commonly described in terms of their attenuation at fB/2, which is the fundamental fre-

quency when a repeating ‘1010...’ sequence is transmitted through the channel. In the time

domain, this attenuation of the high-frequency content causes the data pulses to change

their shape as they pass through the channel. Figure 2.3(b) symbolically illustrates such

pulse-shape alteration at the RX side: the sharp features of the pulse, corresponding to the

high-frequency content, become smooth and the pulse gets smeared in time, exceeding 1 UI

in duration [21]. With this alteration, the received data symbol, or cursor, is a 1-UI-long

portion of the pulse centered near the maximum amplitude of the modified pulse, while the

remaining parts of the pulse are the pre- and post-cursors of the data symbol. The pre- and

post-cursors superimpose on the surrounding data symbols in the channel, causing inter-

symbol interference (ISI). High channel attenuation at fB/2 leads to severe ISI that causes

data decision errors at the receiver, degrading the BER. To maintain a sufficiently low BER,

the receiver compensates the signal for the ISI using an equalizer [2, 3], which is the first

block of a high-speed receiver. Section 2.1.2 reviews linear and non-linear equalization

techniques.
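The superposition of pre- and post-cursors on the neighboring symbols can be illustrated with a small numerical sketch. The UI-spaced pulse values below (one pre-cursor, the cursor and two post-cursors) and the function names are hypothetical and chosen only for illustration; they are not measurements of any channel discussed in this thesis.

```python
# Sketch of inter-symbol interference (ISI) at UI-spaced sampling points.
# PULSE holds hypothetical UI-spaced samples of one received pulse:
# one pre-cursor, the cursor, and two post-cursors.
PULSE = [0.1, 0.6, 0.3, 0.1]
CURSOR_INDEX = 1  # position of the cursor within PULSE


def received_samples(bits):
    """Superimpose one scaled pulse per bit ('1' -> +1, '0' -> -1)."""
    symbols = [1.0 if b else -1.0 for b in bits]
    out = [0.0] * len(symbols)
    for n, sym in enumerate(symbols):
        for k, tap in enumerate(PULSE):
            m = n + k - CURSOR_INDEX  # UI in which this part of the pulse lands
            if 0 <= m < len(out):
                out[m] += sym * tap
    return out


def worst_case_eye_opening():
    """Cursor amplitude minus the sum of all ISI magnitudes."""
    isi = sum(abs(t) for i, t in enumerate(PULSE) if i != CURSOR_INDEX)
    return PULSE[CURSOR_INDEX] - isi


if __name__ == "__main__":
    bits = [1, 1, 0, 1, 0, 0, 0, 1]
    for b, s in zip(bits, received_samples(bits)):
        print(b, round(s, 2))
    print("worst-case eye opening:", round(worst_case_eye_opening(), 2))
```

With these assumed values, the pre- and post-cursors together consume most of the cursor amplitude, which is the situation that the equalizer is meant to correct.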

In addition to ISI, the channel delay, TCH, varies from channel to channel by ±∆t, which may exceed 1 UI. Furthermore, temperature and process variations increase ∆t. These

deviations of the channel delay from the nominal value contribute to the timing uncertain-

ties, or jitter, at the receiver. The jitter is further exacerbated by random and data-dependent

processes in all parts of the interconnect: the transmitter, channel and receiver. As a conse-

quence, at the receiver side, it is unknown a priori at what time the data symbols and the

boundaries between them arrive. Knowing the timing of the data symbols is essential for

the receiver to recover data with low BER. In order to compensate for jitter, the receiver relies on a clock and data recovery (CDR) system [14], which is the second block of a high-speed receiver. Section 2.1.3 overviews the concept of clock and data recovery, and Sections 2.2–2.3 discuss the CDR schemes for binary-sampling and ADC-based receivers.

[Figure 2.4: Equalization with a filter in the frequency domain. Three plots of gain (dB) versus frequency (Hz), each marked at fB/2: the channel (TXOUT to RXIN), the equalizer (RXIN to RXEQ), and the channel + equalizer cascade (TXOUT to RXEQ).]

2.1.2 Equalization

The goal of equalization is to compensate the received signal for the channel-induced ISI,

thus preventing bit-errors at the receiver [21,22]. The equalizers are divided into linear and

non-linear. The linear equalizers typically boost the high-frequency content of the signal

using linear operations. In contrast, the non-linear equalizers rely on non-linear operations

to estimate the ISI in order to subtract it from the signal. This section reviews first the linear

equalization techniques and then the non-linear equalization.

The linear equalization can be used at the receiver side [23–25], transmitter side [26–

28] or both sides simultaneously [29–31]. Figure 2.4 demonstrates the effect of linear

equalization in the frequency domain through an example of the receiver-side equalizer. The

equalizer is a filter with a gain peaking around fB/2, such that a cascade of the channel

with the equalizer has a flat response up to fB/2. This cascade prevents the attenuation of

the high-frequency component in the equalized signal, RXEQ, and thus reduces the channel-

induced ISI.
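The flattening effect of the cascade can be checked numerically with a minimal sketch. It assumes a first-order low-pass channel and an equalizer with one zero and one pole; the corner frequencies `F_CH` and `F_EQ_POLE` are hypothetical, and only the 5 Gb/s rate is taken from the example in Section 2.1.1. A practical equalizer has additional high-frequency poles that make its gain roll off after peaking; they are omitted here.

```python
import math

F_B = 5e9        # baud rate of the 5 Gb/s example, Hz
F_CH = 1e9       # hypothetical channel corner frequency, Hz
F_EQ_POLE = 5e9  # hypothetical equalizer pole frequency, Hz


def channel_gain(f):
    """First-order low-pass channel model (an assumption of this sketch)."""
    return 1 / complex(1, f / F_CH)


def equalizer_gain(f):
    """Equalizer with a zero at the channel corner and a pole at F_EQ_POLE."""
    return complex(1, f / F_CH) / complex(1, f / F_EQ_POLE)


def db(h):
    """Magnitude of a complex gain in dB."""
    return 20 * math.log10(abs(h))


if __name__ == "__main__":
    f = F_B / 2
    print("channel at fB/2:   ", round(db(channel_gain(f)), 2), "dB")
    print("equalizer at fB/2: ", round(db(equalizer_gain(f)), 2), "dB")
    print("cascade at fB/2:   ", round(db(channel_gain(f) * equalizer_gain(f)), 2), "dB")
```

In this model the equalizer zero cancels the channel pole, so the cascade is limited only by the equalizer pole placed above fB/2.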


[Figure 2.5: Feed-forward equalization (FFE). RXIN is delayed by TDLY and scaled by α; the resulting ISI estimate, α·RXIN(t–TDLY), is combined with RXIN(t) to produce RXEQ(t).]

The linear equalizer can be implemented either as a continuous-time or a discrete-time

filter. The continuous-time linear equalizer (CTLE) is typically an amplifier with some gain

peaking near fB/2 [32]. In contrast, the discrete-time equalizer is a finite-impulse-response

(FIR) filter. Figure 2.5 illustrates a sample FIR equalizer, where the equalizer estimates the ISI by delaying the received signal by TDLY and scaling down the delayed signal by a tap weight, α. The equalizer then subtracts this estimated ISI from RXIN to obtain the

equalized signal, RXEQ. Since the ISI estimate is fed forward, this equalizer topology is

called a feed-forward equalizer (FFE) [21, 33].
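The delay, scale and subtract operation described above can be sketched on UI-spaced samples as follows. This is a minimal illustration that assumes TDLY = 1 UI and a single tap; the function name `ffe` and the pulse values in the usage example are hypothetical.

```python
def ffe(rx_in, alpha):
    """One-tap FFE: RXEQ[n] = RXIN[n] - alpha * RXIN[n-1], with TDLY = 1 UI."""
    rx_eq = []
    previous = 0.0  # the delay element starts empty
    for sample in rx_in:
        rx_eq.append(sample - alpha * previous)
        previous = sample
    return rx_eq


if __name__ == "__main__":
    # Hypothetical received pulse whose post-cursors decay by 0.4 per UI.
    pulse = [1.0, 0.4, 0.16]
    print([round(x, 3) for x in ffe(pulse, alpha=0.4)])
```

For a pulse with a geometrically decaying tail, a single tap equal to the decay ratio removes the post-cursors in this idealized, noise-free setting.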

The amount of gain peaking in CTLE and the ISI tap weights in FFE can be either

constant or adaptive. If the channel characteristic is known at the time of the equalizer

design, constant equalizer settings are sufficient. However, a variable channel response

necessitates adaptive equalization where the equalizer settings are adjusted to the channel

properties.

The transmitter-side equalization is commonly referred to as pre-emphasis. The pre-

emphasis is similar in principle to the receiver-side equalization with the only difference

that the equalizer precedes the channel. Pre-emphasis boosts the high-frequency content of

the signal with a CTLE or FFE before the signal is launched into the channel. Since the

channel response is impossible to estimate at the transmitter side, the pre-emphasis either

uses constant equalizer settings, or it requires a return channel such that the receiver can

feed the adaptation information to the transmitter equalizer.

Both the continuous-time and discrete-time linear equalizers suffer from noise amplification [21]. While boosting the high-frequency content of the signal, the CTLE also boosts the high-frequency component of the noise in the received signal. Similarly, the ISI estimate in the FFE contains the noise components. As a consequence, the linear equalizers exacerbate the noise in the equalized signal. To prevent this noise amplification, the receivers rely on non-linear equalizers.

[Figure 2.6: Decision feedback equalization (DFE). A 2-tap example: the decision bits, DTi, pass through two delay elements (TDLY = 1 UI) to give DTi-1 and DTi-2, which are scaled by α1 and α2 and subtracted from RXIN to produce RXEQ.]

Figure 2.6 shows through an example the concept of decision feedback equalization

(DFE), which is a non-linear equalization technique. Similar to FFE, the DFE subtracts an

estimate of ISI from the received signal, RXIN, to get the equalized signal, RXEQ. However,

in contrast with FFE, the DFE estimates the ISI by feeding back the data decision bits DTi,

which are obtained through a non-linear slicing operation [21, 22]. This non-linear slicing

prevents the noise from affecting the estimated ISI, and therefore the DFE does not amplify

the noise while cancelling the ISI. The example shown in Figure 2.6 illustrates a 2-tap DFE.

First, the decision bits, DTi, pass through a chain of 1-UI-long delay elements, TDLY, to

generate a set of previous decision bits, DTi−1 and DTi−2. Then, the previous decision bits

are scaled by their corresponding tap weights, α1 and α2, to estimate the post-cursor ISI

contribution due to these previous bits. Finally, the DFE feeds back the ISI estimate and

subtracts it from RXIN to get RXEQ. Since the DFE relies on the previous decision bits to

estimate the ISI contribution, the DFE can cancel post-cursor ISI only. The number of DFE

taps depends on the severity of ISI in the channel. In most cases, the DFE tap weights are

adaptable to the channel response.
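The three steps of the 2-tap DFE can be sketched on UI-spaced samples as shown below. This is a minimal sketch, assuming symbols of +1 and -1, a sign comparison as the slicer, and fixed tap weights; the helper `channel_with_postcursors` and its tap values are hypothetical and serve only to generate a test signal.

```python
def dfe_2tap(rx_in, alpha1, alpha2):
    """Return (decisions, equalized samples) for UI-spaced samples.

    Decisions are +1.0 or -1.0; the slicer is a sign comparison.
    """
    dt_1 = 0.0  # DT[i-1]; 0.0 means that no previous decision exists yet
    dt_2 = 0.0  # DT[i-2]
    decisions, rx_eq = [], []
    for sample in rx_in:
        # Subtract the post-cursor ISI estimated from the previous decisions.
        eq = sample - alpha1 * dt_1 - alpha2 * dt_2
        dt = 1.0 if eq >= 0 else -1.0  # non-linear slicing
        rx_eq.append(eq)
        decisions.append(dt)
        dt_2, dt_1 = dt_1, dt  # shift the delay chain by 1 UI
    return decisions, rx_eq


def channel_with_postcursors(symbols, h1, h2):
    """Hypothetical channel: unit cursor plus two post-cursor taps."""
    rx = []
    for n, s in enumerate(symbols):
        prev1 = symbols[n - 1] if n >= 1 else 0.0
        prev2 = symbols[n - 2] if n >= 2 else 0.0
        rx.append(s + h1 * prev1 + h2 * prev2)
    return rx


if __name__ == "__main__":
    tx = [1.0, 1.0, -1.0, 1.0, -1.0, -1.0]
    rx = channel_with_postcursors(tx, 0.7, 0.4)
    print("sliced without DFE:", [1.0 if x >= 0 else -1.0 for x in rx])
    print("sliced with DFE:   ", dfe_2tap(rx, 0.7, 0.4)[0])
```

In this example the third symbol is sliced incorrectly without equalization, while the DFE with tap weights matched to the assumed channel recovers the transmitted sequence.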

The DFE successfully compensates for ISI under the condition that most of the data

decisions are correct. The data errors lead to the incorrect estimation of the ISI, which

leads to further errors [22]. Since typically the high-speed interconnects operate with BER < $10^{-12}$, the DFE is an effective way of compensating for the ISI without amplifying the

noise.

The common property of the linear equalization and DFE is that both these approaches

cancel the ISI energy superimposed on the cursor energy, thus reducing the total signal

energy that reaches the decision circuit. This ISI cancellation allows the use of low-complexity and low-cost decision circuits that recover a single bit at a time. In contrast with single-bit detection, the sequence detection approach reuses the ISI energy in order to recover a sequence of bits at a time. The Viterbi algorithm is an example of a maximum likelihood sequence detection (MLSD) algorithm that is widely used in communication applications [4, 21]. The

MLSD-based receivers successfully compensate for high amounts of ISI in the received

signal at the cost of high circuit complexity. The MLSD algorithms are computationally

intensive, which leads to high receiver power consumption. As a result, the MLSD receivers

are typically used in high-performance applications or for channels with severe ISI such as

read channels in disk drives.

Since the equalizers modify the amplitude of the received signal, the position of the

equalizer with respect to the sampler depends on the sampling type in the receiver. Sec-

tion 2.2 shows that in a binary-sampling receiver, the equalizer must precede the sampler

in the analog domain, while Section 2.3 shows that in an ADC-based receiver the equalizer

can be implemented after the sampler in the digital domain. Before delving into the details

of sampling in the receivers in Sections 2.2 and 2.3, the following section reviews the basics

of clock and data recovery in a high-speed receiver.

2.1.3 Clock and Data Recovery

The role of the clock and data recovery (CDR) system in a high-speed receiver is to extract

the symbol timing from the received signal and then to use this timing for the data recovery

in the presence of timing uncertainties, or jitter, in the received signal. The magnitude of

the timing uncertainties compared to the UI determines the type of a clocking scheme and

the necessity for the CDR in an interconnect.

Figure 2.7 compares three clocking schemes at a system level. A short channel delay

compared to the UI allows for a global clocking scheme, shown in Figure 2.7(a), in which

a shared clock generator distributes a global clock to the two systems that are exchanging

data. This global clock synchronizes the transmitter with the receiver in every interconnect,

thus serving as a timing reference. With the increasing data rate and shrinking UI, the propa-

gation delay through the clock path becomes comparable to the UI, which makes the global

clocking scheme unsuitable for the high-speed interconnects. To align the propagation

delays in the clock and data paths, a source-synchronous clocking scheme, illustrated in

Figure 2.7(b), delivers a reference clock from the transmitter to receiver through a replica

of the data channel [34]. This scheme tolerates timing uncertainties of larger amplitude compared to the UI at the price of doubling the number of channels per interconnect, thus increasing the overall interconnect cost. To reduce this overhead, a CDR-based clocking scheme, shown in Figure 2.7(c), first recovers a reference clock, CLK, from the received data signal, and then uses this clock to recover the data, DT [14, 21, 22]. The CDR relies on the clock embedded in the actual data stream in the form of data transitions. Compared to the source-synchronous scheme, the CDR-based clocking scheme, in addition to tolerating higher jitter amplitudes, eliminates the need for a dedicated clock channel, thus reducing the channel cost. Since the channel constitutes one of the dominant costs in the interconnect, a large majority of the low-cost high-speed interconnects use the CDR-based clocking scheme.

[Figure 2.7: Clocking schemes. (a) Global scheme: a shared clock is distributed to the transmitters and receivers of System 1 and System 2. (b) Source-synchronous scheme: each transmitter sends a clock to the receiver alongside the data. (c) Clock and data recovery (CDR) scheme: each receiving side uses a CDR to recover the clock, CLK, and the data, DT, from the data signal.]

[Figure 2.8: Classification of high-speed receivers with corresponding CDR examples. Binary-sampling receivers: phase-tracking (phase-tracking CDR, 2x) and blind-sampling (oversampling CDR, > 2x). ADC-based receivers: phase-tracking (Mueller-Müller CDR, 1x) and blind-sampling (interpolating feedback CDR, 2x or less).]

The choice of the CDR scheme for an interconnect depends on the receiver type. Figure 2.8 classifies the high-speed receivers based on the sampling circuit and on the clock

synchronization. The binary-sampling receivers sample the signal with a binary sampling

circuit, such as a flip-flop, while the ADC-based receivers sample with an analog-to-digital

converter (ADC), which is a multi-level sampling circuit. Both the binary-sampling and

ADC-based receivers are further classified into the phase-tracking and blind-sampling cat-

egories. The phase-tracking receivers align a local clock with the received signal using

a phase-tracking feedback loop in order to synchronize the signal samples with the data

symbols. The blind-sampling receivers, in contrast, sample the signal with a clock that

is free-running, or blind, with respect to the received symbol boundaries. Figure 2.8 lists

four CDR examples corresponding to the four categories of the high-speed receivers. The

figure also annotates the sampling rates for every receiver category. A phase-tracking CDR [35] and an oversampling CDR [36, 37] illustrate the clock and data recovery in the binary-sampling receivers in Section 2.2, while a Mueller-Müller CDR [38] and an interpolating feedback CDR [39, 40] serve as examples for the ADC-based receivers in Section 2.3.

2.1.4 Signal Energy Considerations

The sampling scheme in a receiver determines the amount of signal energy captured in the

samples in every UI. The signal energy per UI depends on two parameters: the number of

samples per UI and the position of the samples with respect to the UI boundaries. The exact

sample energy also depends on the received data pattern. However, a repeating ‘1010...’ sequence, which can be approximated by a sinusoid, typically corresponds to the worst-case signal energy when the signal passes through a band-limited channel. This approximation of the received signal with a sinusoid simplifies the estimation of the worst-case sample energy for a given sampling scheme. The sample energy is proportional to the signal amplitude squared at the sampling instance.

[Figure 2.9: Binary sampling. The received signal, RXIN, is reduced to binary samples, for example 1 1 0 0 0 0 1 1 1.]

With these simplifications, it is possible to compare the sampling schemes based on

the sample energy per UI. As an example, 2x phase-tracking sampling (with one sample

aligned with the UI center and the other sample aligned with the UI boundary) yields the

same sample energy per UI as the baud-rate phase-tracking sampling (with the sample

aligned with the UI center). This energy equivalence stems from the fact that the sample

aligned with the UI boundary yields zero signal energy. Furthermore, it is possible to show

that 2x blind sampling yields the same sample energy per UI as baud-rate and 2x phase-

tracking sampling schemes.
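The energy equivalence stated above can be verified numerically under the sinusoidal approximation. The sketch below assumes a unit-amplitude sinusoid with a period of 2 UI, so that the UI boundaries fall at integer times and the UI center at 0.5 UI; the function name is hypothetical.

```python
import math


def sample_energy_per_ui(offsets_ui):
    """Sum of the squared samples of sin(pi * t) taken within one UI.

    The '1010...' pattern is approximated by a unit-amplitude sinusoid
    with a period of 2 UI: zero crossings (UI boundaries) fall at
    integer t and the UI center falls at t = 0.5, with t in UI.
    """
    return sum(math.sin(math.pi * t) ** 2 for t in offsets_ui)


if __name__ == "__main__":
    print("baud-rate phase-tracking:", sample_energy_per_ui([0.5]))
    print("2x phase-tracking:       ", sample_energy_per_ui([0.0, 0.5]))
    for phase in (0.1, 0.25, 0.4):  # arbitrary blind sampling phases
        energy = sample_energy_per_ui([phase, phase + 0.5])
        print("2x blind, phase", phase, ":", round(energy, 6))
```

For 2x blind sampling, the two samples are a quarter of the sinusoid period apart, so their squared amplitudes always add up to the squared peak amplitude regardless of the sampling phase.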

The signal energy per UI can be used to estimate the signal-to-noise ratio (SNR) in

the samples. SNR is commonly used to characterize analog circuits, such as amplifiers,

filters and analog-to-digital converters. However, this metric is not common in CDR sys-

tems. This thesis will briefly mention the signal energy per UI in the context of fractional

sampling rate CDRs in Chapter 4.

2.2 CDR Architectures for Binary-Sampling Receivers

A binary-sampling receiver takes binary samples of the received signal, preserving the sign

of the signal at the sampling instances and discarding the signal amplitude, as illustrated in

Figure 2.9. A relatively simple circuit, such as a flip-flop, is sufficient for this binary sam-

pling [35], which makes the binary-sampling receivers an attractive option for low-power

applications. Section 2.1.2 demonstrated that the equalization requires access to the amplitude of the received signal. Since the binary sampler preserves only the sign of the signal, the equalizer precedes the sampler in the binary-sampling receivers. As a consequence, all the signal equalization is performed in the analog domain, with the equalizer loading the high-speed input node of the receiver. This input loading, in turn, limits the amount of the ISI compensation that can be practically implemented in an integrated receiver. In addition, the analog circuits of the equalizer scale poorly with the IC technology scaling.

[Figure 2.10: Simplified block-diagram of a phase-tracking CDR. Blocks: EQ, binary sampler, PD, loop filter, and VCO. Signals: RXIN, DTREC, CKREC, φERR, VCTRL.]

The binary nature of sampling requires the receiver to rely only on the signal signs to

recover the clock and data. The remainder of this section reviews the CDR architectures for

the binary-sampling receivers through two examples: a conventional phase-tracking CDR

in Section 2.2.1, and an oversampling CDR in Section 2.2.2.

2.2.1 Phase-Tracking CDR Architecture

Figure 2.10 shows a simplified block-diagram of a phase-tracking CDR [35]. First, an equalizer compensates the received signal, RXIN, for the channel ISI. Then, a recovered clock, CKREC, triggers a binary sampler to take two samples of the signal in every UI such that one of these samples is aligned with the UI center, while the other is aligned with the UI edge. Next, a phase detector (PD) uses these binary samples to estimate the phase error, φERR, which is the difference between the phase of the received data and the phase of the recovered clock. A low-pass loop filter averages φERR to generate a control voltage, VCTRL, for a voltage-controlled oscillator (VCO). Finally, the VCO adjusts the phase of CKREC to align it with the received data phase, thus closing the phase-tracking feedback loop. Since the samples are aligned with the UIs, the sample in the UI center becomes the recovered data, DTREC. The two essential components of this CDR, the phase detector and the phase-tracking loop, are discussed in greater detail next.

[Figure 2.11: Phase detection in the phase-tracking CDR. UI-center samples, Di, and UI-edge samples, Ei, of RXIN are taken with CKREC. (a) Clock early: DiEiDi+1 = 110 or 001. (b) Clock late: DiEiDi+1 = 100 or 011.]

Figure 2.11 demonstrates the phase detection algorithm in a binary-sampling phase-tracking CDR. The PD uses the transitions between the distinct data symbols in RXIN (‘1’ to ‘0’ or ‘0’ to ‘1’) in order to detect if the recovered clock, CKREC, is early or late compared to RXIN. The PD uses the samples taken twice per UI: at the rising and falling edges of CKREC. The samples taken at the rising edge, Di, are close to the UI center, while the samples taken at the falling edge, Ei, are close to the UI edge. When two consecutive UI-center samples are distinct, as in the example of Di and Di+1, the PD compares these samples with the UI-edge sample between them, Ei. If Di and Ei are identical, the PD indicates that CKREC is early compared to RXIN, as illustrated in Figure 2.11(a). Conversely, if Ei and Di+1 are identical, the PD indicates that CKREC is late, as shown in Figure 2.11(b). When two consecutive UI-center samples are identical, as in the example of Di+1 and Di+2, the PD holds its previous output. Since this type of PD detects only the sign of the phase error in a non-linear binary manner (clock early or late), and not the magnitude of the phase error, it is commonly referred to as a bang-bang PD. To align CKREC with RXIN using this bang-bang PD, the CDR relies on a phase-tracking feedback loop, which effectively averages the PD characteristic, allowing the PD and the entire loop to be approximated with a linear model.
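The early/late decision rule described above can be written as a short function. This is a minimal sketch of the decision logic only; the function name and the sign convention (+1 for early, -1 for late) are assumptions of this sketch, not part of the reviewed architecture.

```python
def bang_bang_pd(d_i, e_i, d_next, previous=0):
    """Bang-bang phase detection from three consecutive binary samples.

    d_i, d_next: UI-center samples D[i] and D[i+1] (0 or 1)
    e_i:         UI-edge sample E[i] taken between them (0 or 1)
    previous:    previous PD output, returned when there is no transition

    Returns +1 if the clock is early and -1 if the clock is late
    (the sign convention is an assumption of this sketch).
    """
    if d_i == d_next:
        return previous  # no data transition: hold the previous output
    if e_i == d_i:
        return +1        # patterns 110 and 001: clock early
    return -1            # patterns 100 and 011: clock late


if __name__ == "__main__":
    for pattern in ((1, 1, 0), (0, 0, 1), (1, 0, 0), (0, 1, 1), (1, 1, 1)):
        print(pattern, "->", bang_bang_pd(*pattern))
```

Because the samples are binary, the edge sample always matches exactly one of the two distinct center samples, so the two branches after the transition check cover all cases.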

Figure 2.12 presents a linearized signal flow diagram of the phase-tracking loop [14]. The input to the system is the phase of the received data, φIN(s), and the output is the phase of the recovered clock, φREC(s). In this diagram, a subtracter followed by a gain, KPD, approximates the sampler and the PD; the loop filter has a low-pass transfer function H(s); and KVCO/s models the VCO. Using this negative feedback loop, φREC(s) tracks the low-frequency jitter in φIN(s), and attenuates the high-frequency jitter. First, the subtracter detects the phase error, φERR(s), between the input and output phases. Then, the low-pass filter averages φERR(s) into the control voltage, VCTRL(s), for the VCO. Finally, the VCO adjusts φREC(s) by changing its oscillation frequency, closing the feedback loop. The transfer function of this phase-tracking loop is commonly referred to as the jitter transfer function, and it can be written as:

$$\frac{\phi_{REC}(s)}{\phi_{IN}(s)} = \frac{K_{PD} K_{VCO} H(s)}{s + K_{PD} K_{VCO} H(s)}. \tag{2.1}$$

Typically, this jitter transfer is a low-pass transfer function.

[Figure 2.12: Phase-tracking feedback loop. The sampler and PD are modeled by a subtracter and the gain KPD, followed by the loop filter H(s) and the VCO model KVCO/s. Signals: φIN(s), φERR(s), VCTRL(s), φREC(s).]

Since the recovered clock triggers the sampler to take a sample at the UI center for the data recovery, an error-free data recovery requires φREC(s) to closely follow φIN(s). In fact, the CDR makes a data decision error when φREC(s) deviates from φIN(s) by 0.5 UI or more in either direction. Equivalently, the condition for the error-free data recovery is:

$$|\phi_{REC}(s) - \phi_{IN}(s)| < 0.5\,\mathrm{UI}. \tag{2.2}$$

This condition, in combination with (2.1), makes it possible to evaluate the maximum amplitude of the sinusoidal data jitter that the CDR can tolerate (in UI):

$$|\phi_{IN}(s)| < 0.5 \left| 1 + \frac{K_{PD} K_{VCO} H(s)}{s} \right|. \tag{2.3}$$

Expressed as a peak-to-peak value, UIPP, this limit is commonly referred to as jitter tolerance:

$$\mathrm{JitTol}(s) = \left| 1 + \frac{K_{PD} K_{VCO} H(s)}{s} \right|, \tag{2.4}$$

which is the maximum sinusoidal jitter that the CDR can tolerate at a given frequency without making data decision errors.
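The step from (2.1) and (2.2) to (2.3) can be spelled out as follows, using only the symbols defined above. The phase error is obtained by substituting (2.1) for the recovered phase, and the bound on the input jitter follows from the 0.5 UI limit of (2.2):

```latex
\begin{align*}
\phi_{ERR}(s) &= \phi_{IN}(s) - \phi_{REC}(s)
  = \phi_{IN}(s)\left[1 - \frac{K_{PD}K_{VCO}H(s)}{s + K_{PD}K_{VCO}H(s)}\right]
  = \phi_{IN}(s)\,\frac{s}{s + K_{PD}K_{VCO}H(s)}, \\
|\phi_{ERR}(s)| < 0.5\,\mathrm{UI}
  &\;\Longrightarrow\;
  |\phi_{IN}(s)| < 0.5\left|\frac{s + K_{PD}K_{VCO}H(s)}{s}\right|
  = 0.5\left|1 + \frac{K_{PD}K_{VCO}H(s)}{s}\right|.
\end{align*}
```

Doubling this peak amplitude gives the peak-to-peak limit of (2.4).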


[Figure 2.13: Jitter transfer and tolerance of the phase-tracking CDR. (a) Jitter transfer: φREC(s)/φIN(s) in dB versus jitter frequency, with the 0 dB level and ω0 marked. (b) Jitter tolerance: jitter amplitude in UIPP versus jitter frequency, with the 1 UIPP level and ω0 marked.]

Figure 2.13 symbolically illustrates the jitter transfer and jitter tolerance of the conven-

tional phase-tracking CDR. The jitter transfer in Figure 2.13(a) reflects that the CDR tracks

the low-frequency input jitter as long as the jitter frequency remains below ω0, which is

the bandwidth of (2.1); and that the CDR attenuates the high-frequency jitter exceeding

ω0. The jitter tolerance in Figure 2.13(b) shows that the CDR tolerates over 1 UIPP of low-

frequency jitter below ω0, and up to 1 UIPP of high-frequency jitter above ω0. A number

of standards for high-speed interconnects include specifications for the jitter transfer and

jitter tolerance since they are convenient means for quantifying the CDR’s ability to recover

data with a low BER in the presence of jitter. Unlike the analytical jitter tolerance that re-

flects only the system-level limitations of the CDR, the simulated and measured tolerances

also reveal the amount of the jitter tolerance reduction due to the circuit implementation

of the CDR. The simulated and measured jitter tolerances are used to validate the CDR

architectures proposed in this thesis.
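The analytical shapes of the jitter transfer and jitter tolerance can be reproduced with a minimal numerical sketch of (2.1) and (2.4). It assumes the simplest possible loop filter, H(s) = 1, so that the loop bandwidth ω0 equals the product KPD·KVCO; the value of `K` below is hypothetical.

```python
import math

# Hypothetical loop gain K = K_PD * K_VCO in rad/s. With H(s) = 1 (an
# assumption of this sketch), K is also the loop bandwidth, omega_0.
K = 2 * math.pi * 5e6


def jitter_transfer(omega, k=K):
    """|phi_REC / phi_IN| from (2.1) with H(s) = 1 and s = j*omega."""
    s = complex(0, omega)
    return abs(k / (s + k))


def jitter_tolerance_ui_pp(omega, k=K):
    """Peak-to-peak jitter tolerance from (2.4) with H(s) = 1 (omega > 0)."""
    s = complex(0, omega)
    return abs(1 + k / s)


if __name__ == "__main__":
    for ratio in (0.01, 0.1, 1, 10, 100):
        w = ratio * K
        print("omega/omega_0 =", ratio,
              " transfer =", round(jitter_transfer(w), 4),
              " tolerance =", round(jitter_tolerance_ui_pp(w), 4), "UIpp")
```

Under these assumptions the tolerance grows without bound as the jitter frequency decreases below ω0 and approaches 1 UIPP well above ω0, in agreement with the qualitative behavior described for Figure 2.13.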

The phase-tracking CDR architecture is typically implemented in the analog domain:

high-speed flip-flops followed by a charge pump implement the PD, while an RC network

implements the loop filter. As a consequence, this architecture is challenging to scale between the IC technology nodes, and therefore it takes only partial advantage of the IC technology scaling. The oversampling CDR architecture, in contrast, is implemented entirely in the digital domain, taking full advantage of the technology scaling. The following

subsection takes a closer look at the oversampling CDR.


[Figure 2.14: Simplified block-diagram of an oversampling CDR. Blocks: EQ, binary sampler driven by an N-phase (N > 2) blind sampling clock, PD, and an N:1 selector. Signals: RXIN, φPICK, DTREC.]

[Figure 2.15: Phase detection in the 3x oversampling CDR. RXIN is sampled by the clock phases φ = 0, 1/3 and 2/3; a circular phase diagram marks the data zero-crossing phase range, 1/3 ≤ φ ≤ 2/3, and the data-picking phase, φPICK = 0.]

2.2.2 Oversampling CDR Architecture

Figure 2.14 shows a simplified block-diagram of an oversampling CDR, which is a blind

binary-sampling CDR [36, 37]. First, a multi-phase sampling clock triggers the binary

sampler to oversample the equalized received signal by a factor of N above the data rate,

i.e., to take N samples per UI. Since the sampling instances have no phase relation to the received signal, RXIN, this type of sampling is referred to as blind sampling. Then, out of N clock phases, the PD identifies the data-picking phase, φPICK, which is the closest

phase to the UI center. Finally, an N-to-1 selector takes the sample corresponding to φPICK

as the recovered data, DTREC. Since there is no feedback in this CDR, it is also called a

feed-forward CDR. The key component in this CDR architecture is the PD.

Figure 2.15 demonstrates the phase detection algorithm through a 3x oversampling

example, i.e. N = 3. A three-phase blind sampling clock, CK, with phases 0, 1/3 and

2/3 UI triggers the sampler at the rising edge of each phase to take the total of 3 samples

per UI. The PD then performs the XOR function on each pair of adjacent samples to find

the two clock phases closest to the data symbol boundaries. In this example, the symbol


boundaries occur between 1/3 and 2/3 UI phases of the sampling clock. The binary nature

of sampling allows detecting only the range, rather than the exact value, of the data zero-crossing phase. A circular phase diagram in Figure 2.15 highlights this range with a shaded sector between φ = 1/3 and φ = 2/3. The PD identifies the most distant phase from the zero-crossing range as the data-picking phase, φPICK, since this phase is the closest to the

UI center. In our example, φPICK = 0, and the samples taken at the 0 UI phase of the

clock become the recovered data, Di (highlighted with circles in the figure). This data-

picking scheme results in a correct decision as long as both zero-crossings surrounding

φPICK occur within the shaded sector in the circular diagram of Figure 2.15. A deviation

of two consecutive zero-crossings by the total of 1/3 UI, or cycle-to-cycle jitter of 1/3 UI,

leads to a data-decision error. This observation allows estimating the high-frequency jitter

tolerance of a 3x oversampling CDR.
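The XOR-based phase detection and data picking for N = 3 can be sketched as follows. This is a simplified illustration: it assumes that the samples are stored as a flat list in the phase order 0, 1/3, 2/3 UI, and it selects the zero-crossing range by a majority vote over the whole block, which is an assumption of this sketch rather than the scheme of the cited implementations. The function names are hypothetical.

```python
def pick_phase_3x(samples):
    """Return the index (0, 1 or 2) of the data-picking clock phase.

    samples: binary samples (0 or 1), three per UI, in phase order
             0, 1/3, 2/3 UI.
    """
    # votes[g] counts transitions seen between phase g and the next phase.
    votes = [0, 0, 0]
    for k in range(len(samples) - 1):
        if samples[k] ^ samples[k + 1]:  # XOR of adjacent samples
            votes[k % 3] += 1
    gap = votes.index(max(votes))  # zero-crossing range (first one on a tie)
    return (gap + 2) % 3           # the phase that does not border the range


def recover_data_3x(samples):
    """Keep only the samples taken at the data-picking phase."""
    return samples[pick_phase_3x(samples)::3]


if __name__ == "__main__":
    # Symbol boundaries fall between the 1/3 and 2/3 UI phases, as in the
    # example of Figure 2.15, so the 2/3 UI sample belongs to the next symbol.
    data = [1, 0, 0, 1, 0, 1]
    samples = []
    for n in range(5):
        samples += [data[n], data[n], data[n + 1]]
    print("data-picking phase index:", pick_phase_3x(samples))
    print("recovered data:", recover_data_3x(samples))
```

With the symbol boundaries between the 1/3 and 2/3 UI phases, the sketch selects phase 0 as the data-picking phase, matching the example in the text.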

Since the sampling clock is free-running with respect to the data in the oversampling

CDR, some phase and frequency offsets between the received data and the sampling clock

are inevitable. The blind-sampling CDRs compensate for these phase and frequency offsets

using a FIFO buffer and a data flow-control technique. A FIFO buffer retimes the recov-

ered data from the transmission rate to the consumption rate by the data consumer at the

output of the interconnect. This technique is limited by the FIFO size and it is suitable to

compensate primarily for the phase offsets. The data flow-control technique compensates

for small frequency offsets between the transmitter and receiver by means of negotiating

the data flow rate from the data source to consumer. The source adjusts the rate at which

the information bits are sent over the interconnect by inserting the padding bits into the

data stream. The padding bits carry no information and they only serve the purpose of com-

pensating for the frequency offsets. The data consumer then eliminates these padding bits

(if necessary) from the recovered data. The flow-control is typically implemented at the

level of the data source and consumer, and it does not affect the interconnect or the CDR

architecture.

Figure 2.16 shows the ideal jitter tolerance of a 3x oversampling CDR. The jitter tol-

erance is limited at low and high frequencies by two distinct effects. At low frequencies,

below ω1, the FIFO restricts the maximum jitter tolerance to the FIFO size, while at high

frequencies, above ω2, the phase detection and data decision schemes restrict the jitter tolerance to 1/3 UIPP. The values of ω1 and ω2 as well as the jitter tolerance between these two frequencies depend on the implementation of the phase detector. This dependence is thoroughly analyzed in [41].

[Figure 2.16: Jitter tolerance of the 3x oversampling CDR. Jitter tolerance versus jitter frequency: limited to the FIFO size below ω1 and to 1/3 above ω2.]

[Figure 2.17: Sampling with an ADC. The received signal, RXIN, is converted to ADC samples, for example 1 0.6 0 0 0 0.4 1 1 1.]

The blind oversampling CDR architecture is typically implemented entirely in the dig-

ital domain, which allows this architecture to take advantage of the IC technology scaling.

This blind-sampling CDR architecture comes at the cost of reduced jitter tolerance at high

frequency compared to the phase-tracking CDR. Moreover, the oversampling requires a

multi-phase clock generation and distribution scheme which increases the circuit complex-

ity.

Similar to the conventional phase-tracking CDR, the oversampling CDR equalizes the

received signal in the analog domain prior to sampling. This analog equalization limits

the amount of channel compensation that can be practically implemented in an integrated

receiver. Sampling the received signal with an ADC, instead of a binary sampler, allows

for the equalization in the digital domain after sampling. The following section is devoted

to the CDR architectures for the ADC-based receivers.

2.3 CDR Architectures for ADC-Based Receivers

An ADC-based receiver samples the received signal with an ADC, preserving both the sign

and amplitude of the signal at the sampling instances, as shown in Figure 2.17 [38–40].


[Figure 2.18: Simplified block-diagram of a Mueller-Muller CDR. Blocks: ADC, EQ, PD, loop filter, DAC, VCO / PI, data decision; signals: RXIN, CKREC, φERR, φAVG, VCTRL, DTREC.]

The signal amplitude captured in the samples allows for the integration of extensive digital

signal processing (DSP) into the receiver to compensate for high channel distortion after

the signal is sampled [4, 5]. Furthermore, the equalization in the digital domain simplifies

the circuit design and takes full advantage of IC technology scaling [7]. At the time

of writing, these benefits of the ADC-based receivers come at the cost of higher

power consumption by the ADC sampler compared to the binary-sampling flip-flops.

The digital samples capture more information about the signal at the sampling instances

than the binary samples. This extra information in the samples allows the ADC-based

receivers to recover the clock and data either using the phase-tracking sampling at 1x the

baud rate or using the blind sampling at 2x the baud rate. The remainder of this section

reviews the CDR architectures for the ADC-based receivers through two examples: a phase-

tracking Mueller-Muller CDR in Section 2.3.1, and a blind-sampling interpolating feedback

CDR in Section 2.3.2.

2.3.1 Mueller-Muller CDR Architecture

Figure 2.18 shows a simplified block-diagram of the Mueller-Muller CDR [4–6, 38]. Sim-

ilar to the conventional phase-tracking CDR, the Mueller-Muller architecture relies on a

phase-tracking feedback loop to align the recovered clock, CKREC, with the data symbols

in the received signal, RXIN. First, CKREC triggers the ADC to sample RXIN once per UI,

i.e. at 1x the baud rate, such that the samples are close to the UI centers. Then a digital

equalizer compensates the signal samples for the channel ISI. Next, the PD generates the

phase error, φERR, which indicates the deviation of the sampling instances from the UI cen-

ters. A digital loop filter then averages φERR to obtain the average error between the data


[Figure 2.19: Mueller-Muller timing recovery from an impulse response, showing the cursor h0, pre-cursor h−1 and post-cursor h1 spaced 1 UI apart. (a) τA = h−1 − h1. (b) τB = h−1.]

and sampling phases, φAVG. A digital-to-analog converter (DAC) converts φAVG into the

analog control voltage, VCTRL, for the VCO or the phase interpolator (PI). Finally, the VCO

or PI adjusts the phase of CKREC to align it with the UI centers in RXIN, thus closing the

feedback loop. Since the samples are at the UI centers, the data decision algorithm uses

these samples to output the recovered data, DTREC. This feedback loop can be analyzed at

the system level similar to that of the conventional phase-tracking CDR. The phase detec-

tion approach, however, significantly differs from the bang-bang PD. The Mueller-Muller

phase detection scheme is reviewed next.

The PD extracts the timing information from the digital baud-rate samples of the re-

ceived signal to drive the phase-tracking loop [38]. This timing recovery from the baud-

rate samples is illustrated in two steps. First, the timing recovery scheme from a channel

impulse response is shown; second, this scheme is extended to a continuous data stream.

Figure 2.19 illustrates the timing recovery from an impulse response. Some small amount

of ISI in the form of pre-cursor, h−1, and post-cursor, h1, is vital for the Mueller-Muller tim-

ing recovery scheme. In case of a symmetric ISI around the symbol cursor, h0, shown in

Figure 2.19(a), the sampling phase that makes the pre- and post-cursors equal, i.e. h−1 = h1,

places h0 close to the maximum of the impulse response, which is the desired sampling

phase. Hence, a function

$$\tau_A = h_{-1} - h_1 \tag{2.5}$$

indicates if the sampling phase is early or late for the optimum sampling, and it can guide

the timing recovery. However, replacing this symmetric-ISI channel with an asymmetric-

ISI channel, shown in Figure 2.19(b), prevents the timing function τA in (2.5) from correctly

detecting the data phase, and the phase tracking loop from converging. This asymmetric-


[Figure 2.20: Mueller-Muller timing recovery from continuous data, with samples yn−1, yn and decisions ŷn−1 = 1, ŷn = −1 taken 1 UI apart. (a) Early sampling: yn−1 = 0.75, yn = −0.5, τA = (0.75 × −1) − (−0.5 × 1) = −0.25. (b) Correct sampling: yn−1 = 1, yn = −1, τA = (1 × −1) − (−1 × 1) = 0. (c) Late sampling: yn−1 = 0.5, yn = −0.75, τA = (0.5 × −1) − (−0.75 × 1) = 0.25.]

ISI channel requires a different timing function for a successful timing recovery. In the

example of Figure 2.19(b), a sampling phase that makes h−1 = 0 places the cursor sample,

h0, close to the maximum of the impulse response. Therefore, a function

$$\tau_B = h_{-1} \tag{2.6}$$

allows the signal timing to be recovered in the case of an asymmetric impulse response. These two

functions, τA and τB, illustrate that for a successful timing recovery from the baud-rate

samples the timing function needs to match the channel impulse response.
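The dependence of the timing function on the channel can be illustrated with a short numerical sketch. The two pulse shapes below (a symmetric and an asymmetric triangle) and all function names are hypothetical, chosen only for illustration; they are not taken from the referenced work.

```python
def sym_pulse(t_ui):
    """Hypothetical symmetric triangular response, peak at t = 0 (time in UI)."""
    return max(0.0, 1.0 - abs(t_ui) / 1.5)


def asym_pulse(t_ui):
    """Hypothetical asymmetric response: 1 UI rise, 2 UI fall, peak at t = 0."""
    if -1.0 <= t_ui <= 0.0:
        return 1.0 + t_ui
    if 0.0 < t_ui <= 2.0:
        return 1.0 - t_ui / 2.0
    return 0.0


def cursors(offset_ui, response):
    """Return (h_-1, h_0, h_1): samples 1 UI apart, offset from the peak."""
    return (response(offset_ui - 1.0),
            response(offset_ui),
            response(offset_ui + 1.0))


def tau_a(offset_ui, response=sym_pulse):
    """Timing function (2.5): pre-cursor minus post-cursor."""
    h_pre, _, h_post = cursors(offset_ui, response)
    return h_pre - h_post


def tau_b(offset_ui, response=asym_pulse):
    """Timing function (2.6): pre-cursor only."""
    h_pre, _, _ = cursors(offset_ui, response)
    return h_pre


if __name__ == "__main__":
    # Symmetric ISI: tau_A changes sign around the optimum sampling phase.
    for offset in (-0.2, 0.0, 0.2):
        print("symmetric", offset, tau_a(offset, sym_pulse))
    # Asymmetric ISI: at the peak tau_A is non-zero, while tau_B is zero.
    print("asymmetric", tau_a(0.0, asym_pulse), tau_b(0.0, asym_pulse))
```

In this sketch τA is zero at the peak of the symmetric response but not at the peak of the asymmetric one, where τB is zero instead.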

As the second step of illustrating the Mueller-Muller timing recovery scheme, it can

be shown that a simple operation can estimate the timing function from a continuous data

stream in the average sense [38]. For instance, the timing function τA in (2.5) can be

estimated using

$$\tau_A = (y_{i-1} \cdot \hat{y}_i) - (y_i \cdot \hat{y}_{i-1}), \tag{2.7}$$


[Figure 2.21: Jitter tolerance of the Mueller-Muller CDR. Horizontal axis: jitter frequency, with corner frequency ω0; marked tolerance level: 1 UIPP.]

where yi and yi−1 are the signal samples, while ŷi and ŷi−1 are the corresponding decision

bits. Figure 2.20 illustrates this timing recovery through a simplified example of two con-

secutive pulses. The pulses exceed 1 UI in duration and therefore they interfere with each

other forming some ISI. In this example, the timing function, τA, becomes negative, zero

or positive depending on the alignment between the sampling instances and the UI centers.

In a similar way, other timing functions can be evaluated in the average sense from the

samples of the continuous data stream.
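As a minimal sketch, the estimate of τA from two consecutive samples and their decision bits can be reproduced with the sample values of the three cases in Figure 2.20. The function names and the sign-based slicer are illustrative assumptions.

```python
def slice_bit(y):
    """Hypothetical slicer: hard decision (+1 or -1) on the sample sign."""
    return 1 if y >= 0 else -1


def mm_timing_error(y_prev, y_curr):
    """Estimate tau_A from samples y[i-1], y[i] and their decisions.

    Implements tau_A = y[i-1] * d[i] - y[i] * d[i-1], where d denotes
    the decision bit of the corresponding sample.
    """
    d_prev, d_curr = slice_bit(y_prev), slice_bit(y_curr)
    return y_prev * d_curr - y_curr * d_prev


# Sample pairs (y[i-1], y[i]) of the three cases in Figure 2.20.
CASES = {"early": (0.75, -0.5), "correct": (1.0, -1.0), "late": (0.5, -0.75)}

if __name__ == "__main__":
    for name, (y_prev, y_curr) in CASES.items():
        print(name, mm_timing_error(y_prev, y_curr))
```

The sign of the result separates early from late sampling, which is the information the phase-tracking loop needs; in a receiver the estimate would be averaged over many symbols.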

Since in the Mueller-Muller CDR the samples are aligned with the UI centers and the

CDR adjusts its sampling frequency to the received data rate, the jitter tolerance of the

Mueller-Muller CDR is similar to that of a phase-tracking CDR. For convenience, this

jitter tolerance is repeated in Figure 2.21. At low frequency, the jitter tolerance of the

Mueller-Muller CDR is determined by the bandwidth of the phase-tracking loop, while at

high frequency the jitter tolerance is limited to 1 UIPP.

The Mueller-Muller CDR architecture is well suited for the high-performance intercon-

nects. The digital equalizer implementation satisfies the need for extensive compensation

for the channel ISI at high data rates. Typically, the standards for the high-performance

interconnects have a well defined channel response, which allows using a single timing

function in the Mueller-Muller CDR for the given interconnect standard. The low-cost

interconnects, in contrast, impose fewer restrictions on the channel characteristics. Some

standards, such as USB, SATA and HDMI [9,11,12], allow the end-user to pick the channel,

which can range in length from several centimeters to several meters. Such a wide range

of channel characteristics within a single standard makes it challenging to use the Mueller-

Muller CDR architecture for the low-cost interconnects since a single timing function is

unlikely to suit the entire range of possible channel responses. The high complexity of the


[Figure 2.22: Simplified block-diagram of an interpolating feedback CDR. Blocks: ADC driven by a blind sampling clock, EQ, interpolator, PD, loop filter, interpolation index updater, data decision; signals: RXIN, φERR, φAVG, µ, DTREC.]

[Figure 2.23: Blind and interpolated samples in the interpolating feedback CDR: the blind samples Si of RXIN and the interpolated samples Ii.]

phase-tracking loop that spans across the analog and digital domain boundaries (see Fig-

ure 2.18) contributes to the design and verification costs of the receivers with the Mueller-

Muller CDR. This poses further challenges in adopting the Mueller-Muller architecture for

the low-cost interconnects. The interpolating feedback CDR architecture, which is the sub-

ject of the following subsection, is better suited for the low-cost interconnect since it has a

low sensitivity to the channel response and offers an all-digital circuit implementation.

2.3.2 Interpolating Feedback CDR Architecture

Figure 2.22 shows a simplified block-diagram of the interpolating feedback CDR [39, 40].

This CDR samples the received signal blindly, and then emulates the phase-tracking in the

digital domain using interpolation. The blind nature of sampling rules out the sampling at

1x the baud rate since in the worst case, the baud-rate samples might fall on the UI edges,

which makes the error-free data recovery practically impossible. Hence, the interpolating

feedback CDRs typically sample the received signal at 2x. A digital equalizer then com-

pensates the signal samples for the channel ISI. To recover the clock and data, the CDR


[Figure 2.24: Linear interpolation, Ii = (1 − µ)Si + µSi+1. Example: Si = −3, Si+1 = 5, µ = 0.45, Ii = 0.6.]

interpolates between the blind samples, Si, a new set of samples, Ii, as illustrated in Fig-

ure 2.23. Every UI has two interpolated samples, Ii, such that one sample is close to the UI

center, while the other sample is close to the UI edge. The PD then uses the interpolated

UI-edge samples to detect the phase error, φERR, which is the deviation of the interpolated

samples from the symbol boundaries in RXIN (see Figure 2.22). A digital loop filter then

recovers the average zero-crossing phase φAVG from φERR. An interpolation index updater

converts the recovered φAVG into an interpolation index, µ. This index adjusts the position

of the interpolated samples with respect to the blind samples to align Ii with the UI bound-

aries, thus closing the digital feedback loop. The data decision block uses the interpolated

UI-center samples to generate the recovered data, DTREC. The interpolator in the feedback

loop enables this CDR architecture to emulate the phase-tracking entirely in the digital

domain. The interpolation operation is briefly discussed next.

Figure 2.24 illustrates the function of the interpolator through an example of the first order,

i.e. linear, interpolation [39]. The inputs to the interpolator are two blind signal samples,

Si and Si+1, and the interpolation index, µ, which ranges from 0 to 1. The interpolator

joins the two samples with a line, and outputs the amplitude of this line at the interpolation

point, Ii, defined by the interpolation index, µ, as the proportion of the half-UI time interval

between Si and Si+1. The interpolator computes its output, Ii, according to

$$I_i = (1 - \mu) S_i + \mu S_{i+1}, \tag{2.8}$$

which requires two multipliers and two adders, amounting to a high circuit complexity of

the interpolator.
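The interpolation relation can be checked numerically with the example values of Figure 2.24. The following is a minimal floating-point sketch; the function name is an assumption, and a hardware interpolator would operate on fixed-point values.

```python
def linear_interpolate(s_i, s_next, mu):
    """First-order interpolation between two blind samples.

    mu is the interpolation index in [0, 1], measured from s_i
    towards s_next.
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    return (1.0 - mu) * s_i + mu * s_next


if __name__ == "__main__":
    # Example of Figure 2.24: Si = -3, Si+1 = 5, mu = 0.45.
    print(linear_interpolate(-3, 5, 0.45))  # approximately 0.6
```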

At the cost of significantly increasing the interpolation order beyond linear, this inter-


[Figure 2.25: Jitter tolerance of the interpolating feedback CDR. Horizontal axis: jitter frequency, with corner frequencies ω1 and ω2; marked tolerance levels: FIFO size, 1 UIPP and 0.5 UIPP.]

polating feedback CDR architecture is able to recover the data from the samples taken at

rates between 1x and 2x the baud rate [39]. The high interpolation order, reaching 8th order

or higher in some examples [42], causes high system complexity and power consumption,

which makes this approach impractical for high-speed low-cost interconnects.

The jitter tolerance of the interpolating feedback CDR is shown in Figure 2.25. The

shape of this jitter tolerance curve is determined by four constraints. The first constraint

is due to the blind nature of sampling in the interpolating feedback CDR. With blind sam-

pling, the CDR’s sampling clock may have some frequency offset with respect to the trans-

mitter clock. As a consequence, similar to an oversampling CDR in Section 2.2.2, the

low-frequency jitter tolerance of the interpolating feedback CDR is limited by the size of

the data-retiming FIFO. The second constraint limiting the jitter tolerance is imposed by

the phase-tracking loop. This constraint is similar to that of a phase-tracking CDR in Sec-

tion 2.2.1. The loop properties also define the frequency range ω1 – ω2 in which the loop

determines the jitter tolerance. The third constraint stems from the interpolation with jitter

frequencies exceeding the bandwidth of the phase-recovery loop. The goal of the loop is to

guide the interpolation such that the interpolated samples are aligned with the UI centers.

With an ideal interpolation, this technique becomes equivalent to sampling the signal at the

UI centers and therefore the jitter tolerance above ω2 is 1 UIPP. The non-ideal interpolation

further reduces the jitter tolerance limit of the interpolating feedback CDR. The fourth

constraint limits the jitter tolerance near the maximum jitter frequency, which is half of the

baud rate.

This constraint is related to the sampling rate and the interpolation order. The CDR

fails to recover the data when none of the samples fall into a bit period and the interpolator

fails to calculate the correct signal value at the UI center. In the example of 2x sampling


[Figure 2.26: Simplified block-diagram of a joint-adaptation-based CDR [43]. Blocks: ADC, joint-adaptation filter (EQ + TR), PD, loop filter, digital timing recovery core (filter control), data decision; signals: RXIN, φERR, φAVG, DTREC.]

and linear interpolation, this case occurs when a bit period reduces below 0.5 UI, which

is a time between two adjacent samples. In this case neither sampling itself, nor linear

interpolation recovers the bit, which causes an error. As a consequence, at maximum jitter

frequency, the jitter tolerance is limited to 0.5 UIPP in 2x interpolating feedback CDR with

linear interpolation. In contrast, higher interpolation order may recover a data bit even

when no samples fall in a UI. This improves this jitter tolerance limit at the cost of the

circuit complexity required to implement the higher order interpolation.

In addition to the interpolating-feedback approach, it is possible to use a joint-adaptation

approach in order to recover the clock and data from the blind samples taken at 2x the

baud rate. Figure 2.26 illustrates a simplified block diagram of a joint-adaptation-based

CDR, which combines the equalizer (EQ) and the timing-recovery (TR) interpolator into a

joint-adaptation filter [43]. The joint-adaptation filter simultaneously varies its magnitude

response and phase response in order to perform two actions. First, the filter compensates

the received signal for the channel ISI by adjusting its magnitude response, thus performing

the signal equalization. Second, the filter shifts the received signal in time by adjusting its

phase response, namely the group delay, such that at the filter output the signal is aligned

with the sampling clock. A feedback loop, consisting of a phase detector, loop filter and

digital timing recovery core, controls the joint-adaptation filter. The joint-adaptation-based

CDR successfully recovers the data under the condition that the frequency offset between

the transmitter and receiver is small. With the increasing frequency offset, the performance

of the CDR degrades. This high sensitivity to the frequency offset restricts this CDR to the

applications that guarantee small frequency offsets, such as backplane Ethernet channels.

Compared to the Mueller-Muller CDR architecture [38], the interpolating feedback


and joint-adaptation-based CDRs [39,40,43] replace the analog/digital phase-tracking loop

with an all-digital feedback loop, simplifying the overall receiver design. This comes at

the cost of doubling the sampling rate with the ADC and increasing the complexity of

the digital block due to the interpolator or filter. Unlike the Mueller-Muller CDR, the

interpolating feedback architecture is insensitive to the impulse response of the channel,

which makes it suitable for the low-cost interconnects with little control of the channel

properties. The interpolating feedback CDR only requires that the sampling rate exceeds

the bandwidth of the received signal by 2x or more in order to avoid aliasing during the

interpolation. This condition is typically satisfied by the limited channel bandwidth in the

high-speed interconnects: with high channel attenuation above fB/2, sampling at 2x, i.e. at

2 fB, is sufficient to avoid aliasing. A simple anti-aliasing filter preceding the ADC prevents

the aliasing if the channel bandwidth exceeds the value prescribed by the standard.

The interpolator in the digital feedback loop introduces an error into the set of inter-

polated samples due to the low-order interpolation. This interpolation error, which may

degrade the CDR performance, can be reduced by using a higher order interpolation in-

stead of a linear interpolation. Higher order interpolation, however, comes at the cost of

further increase of the interpolator circuit complexity, which in turn increases the interpo-

lator latency. Since the interpolator is in the feedback loop, this latency can compromise

the stability of the loop [6]. Thus, the interpolation order is a trade-off between the inter-

polation accuracy on the one hand, and the implementation complexity with the resulting

latency on the other hand.

The CDR topologies described in this chapter are widely used in the high-speed re-

ceivers. Table 2.1 summarizes the key characteristics of the recently published receivers.

The table lists the year of publication, the data rate, the power consumption, and the CDR

type in every receiver. The receivers listed in this table use the phase-tracking, Mueller-

Muller and oversampling CDR types. However, the papers presenting interpolating feed-

back CDRs with measured results were published over 10 years ago. The lack of recent

publications on the interpolating feedback CDRs suggests that the circuit complexity and

high sampling rate make the use of this CDR type challenging in the high-speed receivers.

Table 2.1: Recently published high-speed receivers.

Ref.    Year    Data Rate    Power     CDR Type
[23]    2009    10.3 Gb/s    260 mW    Phase-tracking
[24]    2005    10 Gb/s      133 mW    Phase-tracking
[44]    2010    12 Gb/s      130 mW    Phase-tracking
[45]    2006    6.4 Gb/s     310 mW    Mueller-Muller
[6]     2007    12.5 Gb/s    330 mW    Mueller-Muller
[4]     2008    10 Gb/s      4.5 W     Mueller-Muller with Viterbi MLSD
[36]    2007    44 Gb/s      910 mW    Oversampling
[46]    2007    3.5 Gb/s     115 mW    Oversampling

The remainder of this thesis explores new CDR architectures for the blind-sampling ADC-based receivers. First, the thesis proposes a feed-forward CDR architecture that eliminates the interpolating feedback from the digital CDR to reduce the CDR circuit complexity [17, 18]. This architecture recovers the phase and data from the blind ADC samples taken at 2x the baud rate. Then, the thesis proposes a fractional-sampling-rate (FSR) CDR architecture that reduces the sampling rate below 2x in order to reduce the ADC power and area [19]. Both proposed architectures are experimentally validated through the design, fabrication and measurements of the receiver test-chips.

2.4 Summary

Two effects resulting from the non-ideal channels, the ISI and timing uncertainties, neces-

sitate the two blocks in the high-speed receivers: the equalizer and the CDR. The binary-

sampling receivers implement the pre-sampling equalizers in the analog domain, which

limits the amount of equalization that can be integrated in a receiver. Sampling the signal

with an ADC allows for the post-sampling equalization in the digital domain, which in turn

allows more extensive equalization schemes to be implemented. The receivers compensate for

the timing uncertainties using either a phase-tracking or a blind-sampling CDR. The phase-

tracking CDR architectures use a feedback loop that spans across the analog and digital

domains to align the sampling instances with the received data. The blind-sampling CDRs,

in contrast, sample the signal without any phase relation between the sampling and data

phases, which allows the CDRs to be implemented entirely in the digital domain. The ADC-based


blind-sampling receivers implement both the equalizer and the CDR in the digital domain,

which makes these receivers simple to scale with the IC technologies. The blind ADC sam-

pling, however, comes at the cost of a high power consumption by the ADC and a high

complexity of the digital CDR due to the interpolating digital feedback loop.

This thesis first proposes a low-complexity feed-forward CDR architecture for the blind

ADC-based receivers in Chapter 3, which eliminates the digital feedback loop and thus

reduces the power and area of the digital CDR. Then, the thesis proposes an FSR CDR

architecture in Chapter 4, which reduces the sampling rate below 2x to save the ADC

power and area.

Chapter 3

An ADC-Based Feed-Forward CDR Architecture

THE BLIND-SAMPLING ADC-based receivers discussed in Section 2.3.2 implement

the CDR phase-tracking feedback loop entirely in the digital domain thus allowing

for a single interface between the analog and digital domains through the ADC [39, 40].

This class of receivers also allows the received signal to be equalized in the digital domain. In

fact, with the exception of the ADC, the entire receiver can be implemented in the digital

domain. This aspect of the blind-sampling ADC-based receivers makes them highly scal-

able with the technology nodes, robust to process, voltage and temperature variations, and

allows for a short design time due to the automation of the digital design flow. However,

the digital feedback loop in the previously reported CDRs relies on interpolation to recover

the phase and data from the digital signal samples. This interpolating feedback leads to a

high complexity of the digital CDR thus restricting the data rates of this class of receivers.

This chapter proposes a feed-forward CDR architecture for the blind-sampling ADC-

based receivers. This architecture recovers the phase and data directly from the blind digital

samples of the received signal in a feed-forward manner, eliminating the need for the inter-

polating feedback loop. The feed-forward topology reduces the CDR’s circuit complexity,

making this architecture suitable for the high-speed interconnects. To experimentally vali-

date the proposed architecture, a 5 Gb/s 2x ADC-based receiver with the feed-forward CDR

was implemented in 65 nm CMOS [17, 18]. This chapter presents the feed-forward CDR

architecture using a top-down approach: an introduction of the architecture and receiver is

followed by a description of the building blocks essential to this architecture.

The remainder of this chapter is organized as follows. First, Section 3.1 introduces

[Figure 3.1: Proposed feed-forward CDR architecture (simplified block-diagram). Blocks: ADC, phase detector, phase subtracter and filter, data decision; signals: RXIN, φX, φERR, φAVG, DTREC.]

the proposed feed-forward CDR architecture and presents an ADC-based receiver with the

feed-forward CDR. Then, Section 3.2 presents the phase detection scheme that enables

the feed-forward phase and data recovery. Next, Sections 3.3 and 3.4 describe the phase-

recovery filter and the data-decision scheme used in the proposed CDR. Section 3.5 presents

a data retiming scheme used in the feed-forward CDR to assure the error-free data recovery

in the presence of a frequency mismatch between the transmitter and receiver. Section 3.6

validates the proposed CDR architecture through the simulations and measurements of a

receiver test-chip implementing the feed-forward CDR. Finally, Section 3.7 summarizes

this chapter.

3.1 Feed-Forward CDR Architecture

Figure 3.1 presents a simplified block-diagram of the proposed feed-forward CDR architecture. First, a blind sampling clock triggers the ADC to sample the received signal, RXIN, without any phase relation between the sampling instances and the UI boundaries in the signal. Then, a phase detector (PD) estimates the data zero-crossing phase with respect to the sampling clock for every transition in the received signal. This phase is further referred to as the instantaneous data phase, φX. Next, the CDR recovers the average data phase, φAVG, in two steps: a phase subtracter calculates a phase error, φERR, by subtracting φAVG from φX, and then a filter averages φERR to obtain φAVG. Finally, a data decision block picks a sliced sample of RXIN as recovered data, DTREC, based on the values of φX and φAVG.

The proposed architecture estimates the data phase directly from the blind digital sam-

ples of the received signal. Compared to the previously reported blind ADC-based CDRs

(see Figure 2.22), this direct phase estimation eliminates the need for the interpolation and

[Figure 3.2: Receiver with the proposed feed-forward CDR architecture. Blocks: 5-bit 5 GS/s ADCs driven by two phases of the 5 GHz blind sampling clock, 2:32 DeMUX, and the digital CDR comprising EQ, PD, filter, data decision and FIFO with a retiming clock; signals: RXIN (5 Gb/s), S, φX, φERR, φAVG, D, DTREC.]

moves the PD outside the phase-recovery feedback loop. As a result, the phase-recovery

loop in the proposed architecture simplifies to the phase subtracter and the filter, as shown in

Figure 3.1. In fact, this phase-recovery loop can be viewed as an infinite-impulse-response

(IIR) phase-recovery filter with input φX and output φAVG, which is a convenient implementation

of the averaging function. Since the data-decision block uses only the input and

output of the phase-recovery filter, the proposed CDR architecture is referred to as feed-

forward architecture. This feed-forward topology leads to a low circuit complexity of the

CDR as will be shown in the remainder of this chapter through a description of a sample

receiver with the proposed CDR and its building blocks.
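The view of the phase-recovery loop as an IIR averaging filter can be sketched in a few lines. The sketch below is a deliberately simplified first-order version, not the filter implemented in the receiver; the gain value, the function names and the wrapping of the phase error into [−0.5, 0.5) UI are assumptions made for illustration.

```python
def wrap_error(phase_ui):
    """Wrap a phase difference (in UI) into the interval [-0.5, 0.5)."""
    return (phase_ui + 0.5) % 1.0 - 0.5


def recover_average_phase(phi_x_stream, gain=0.125, phi_avg=0.0):
    """First-order IIR averaging of the instantaneous phase phi_X.

    Each step forms phi_ERR = phi_X - phi_AVG (phase subtracter) and
    accumulates a scaled copy of it into phi_AVG (filter). The gain is
    an assumed value; phases are kept modulo 1 UI.
    """
    history = []
    for phi_x in phi_x_stream:
        phi_err = wrap_error(phi_x - phi_avg)
        phi_avg = (phi_avg + gain * phi_err) % 1.0
        history.append(phi_avg)
    return history


if __name__ == "__main__":
    # A constant instantaneous phase of 0.3 UI is tracked by phi_AVG.
    print(recover_average_phase([0.3] * 200)[-1])
    # Starting at 0.05 UI, a phase of 0.95 UI is reached by moving
    # backwards through 0 UI, which is the shorter way around.
    print(recover_average_phase([0.95] * 200, phi_avg=0.05)[-1])
```

Only φX and φAVG leave this loop, which matches the observation that the data-decision block needs just the input and output of the filter.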

Figure 3.2 illustrates a block-diagram of a 5 Gb/s 2x ADC-based receiver with the pro-

posed feed-forward CDR architecture. A 5 GHz two-phase sampling clock triggers two

time-interleaved 5-bit ADCs to sample the 5 Gb/s received signal, RXIN, blindly at 2x the

baud rate for the total sampling rate of 10 GS/s. The two phases of the sampling clock

are further referred to as 0° phase and 180° phase. To reduce the operating speed of the

digital block, a 2:32 DeMUX then feeds 32 samples at every 16-UI interval, or frame, to

the digital CDR. These 32 samples, each represented by 5 bits, correspond to 16 consecu-

tive sampling cycles. A 1:16 clock divider divides the sampling clock to trigger the digital

CDR. An equalizer compensates the DeMUXed signal samples for the channel loss. Next,

the PD uses the equalized samples, S, to estimate φX for every UI with a data transition. A

phase subtracter and a filter average φX to generate φAVG. A data-decision block followed

by a FIFO compose the data recovery path of the CDR. In this path, the data-decision block

picks one sliced sample per UI as a data bit, D, by comparing φX with φAVG for every UI.

[Figure 3.3: Proposed linear phase estimation scheme. Setting (1 − µ)Si + µSi+1 = 0 gives µ = Si / (Si − Si+1). Example: Si = −3, Si+1 = 5, µ = 3/8.]

Then, the FIFO compensates the data bits for the frequency offset between the transmitter

and receiver, and outputs the recovered data, DTREC, by re-timing the data from the blind

sampling clock domain. To simplify the CDR verification, the re-timing clock is the baud-

rate clock (divided by 16), which assures that the data rates at the CDR input and output

are identical.
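The rates quoted for this receiver follow from a few lines of arithmetic, reproduced below as a quick consistency check. The variable names are illustrative; the digital clock frequency and the number of bits per frame are derived here from the stated figures rather than quoted from the text.

```python
BAUD_RATE_HZ = 5e9        # 5 Gb/s received signal
SAMPLING_CLOCK_HZ = 5e9   # 5 GHz blind sampling clock
CLOCK_PHASES = 2          # 0 degree and 180 degree phases
ADC_BITS = 5              # resolution of each ADC sample
CLOCK_DIVIDER = 16        # 1:16 divider for the digital CDR clock

# Total sampling rate of the two time-interleaved ADCs.
sampling_rate_sps = SAMPLING_CLOCK_HZ * CLOCK_PHASES
# Oversampling ratio with respect to the baud rate.
samples_per_ui = sampling_rate_sps / BAUD_RATE_HZ
# Samples delivered to the digital CDR in every 16-UI frame.
samples_per_frame = CLOCK_DIVIDER * CLOCK_PHASES
# Derived values: digital clock frequency and bits per frame.
digital_clock_hz = SAMPLING_CLOCK_HZ / CLOCK_DIVIDER
bits_per_frame = samples_per_frame * ADC_BITS

if __name__ == "__main__":
    print(sampling_rate_sps, samples_per_ui, samples_per_frame)
    print(digital_clock_hz, bits_per_frame)
```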

The remainder of this chapter first presents the implementation details of the CDR’s

building blocks in Sections 3.2–3.5, and then validates the proposed architecture through

the simulations and measurements of the receiver with the feed-forward CDR in Section 3.6.

3.2 Phase-Detection Scheme

Figure 3.3 presents the proposed linear phase estimation scheme used in the feed-forward

CDR architecture. Similar to the operation of linear interpolation, the linear phase estima-

tion joins two samples of opposite polarities, Si and Si+1, with a line. The equation of this

line allows a new sample value, Ii, to be interpolated between Si and Si+1 at the time marked by

the interpolation index, µ:

$$I_i(\mu) = (1 - \mu) S_i + \mu S_{i+1}. \tag{3.1}$$

However, instead of calculating Ii(µ) as in the interpolator (see Figure 2.24), the proposed

phase estimation scheme sets Ii(µ) = 0 and calculates the corresponding interpolating index,

µ, using:

$$\mu = \frac{S_i}{S_i - S_{i+1}}. \tag{3.2}$$


[Figure 3.4: Linear estimation of instantaneous phase, φX. Samples A (0°), B (180°) and C (360°, which is also A of the following cycle); estimated zero crossing X; 3-bit codes 000 to 011 cover [0, 0.5) UI and 100 to 111 cover [0.5, 1) UI. (a) Transition between A and B, φX < 0.5 UI. (b) Transition between B and C, φX > 0.5 UI.]

In fact, this interpolating index estimates the time of the zero-crossing between Si and Si+1

(with respect to Si) directly from the sample values, which enables the phase detection in

the feed-forward manner. Furthermore, limiting the resolution of µ to 2 bits allows for a

low-complexity circuit implementation of the proposed phase estimation scheme as shown

later in this subsection. Since both the interpolation and phase estimation rely on the same

relation of (3.1), these two operations have similar effects on the phase detection accuracy.
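The zero-crossing estimate can be verified with the example values of Figure 3.3. The sketch below uses exact rational arithmetic so that the result can be substituted back into the interpolation relation; the function name is an assumption, and the 2-bit resolution limit is not applied here.

```python
from fractions import Fraction


def zero_crossing_index(s_i, s_next):
    """Interpolation index at which the line through two samples crosses zero.

    Expects integer samples of opposite polarity; returns the exact
    ratio s_i / (s_i - s_next) as a Fraction in (0, 1).
    """
    if s_i * s_next >= 0:
        raise ValueError("samples must have opposite signs")
    return Fraction(s_i, s_i - s_next)


if __name__ == "__main__":
    mu = zero_crossing_index(-3, 5)
    print(mu)                          # 3/8
    print((1 - mu) * -3 + mu * 5)      # 0, the interpolated value at mu
```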

The PD estimates the data zero-crossing phase, φX , from the equalized digital samples

of the received signal. The PD processes the samples from 16 cycles of the sampling

clock in parallel (2 samples per cycle) and outputs φX for every UI with a data transition.

Figure 3.4 illustrates the phase detection scheme based on the linear estimation through

an example of a single cycle of the sampling clock. The PD looks at three consecutive

samples: A, B and C, which correspond to 0°, 180° and 360° phases of the blind sampling

clock. Since sample C corresponds to 360° phase, it is also sample A (0° phase) in the

following cycle of the sampling clock. When two adjacent samples have opposite signs,

the PD linearly estimates the time of zero-crossing between these two samples with respect


[Figure 3.5: Flowchart of 2-bit accurate division for calculating φX, approximating |A| / (|A| + |B|). Decision nodes (Y/N): |A| < |B|; |A| + |B| > 4|A|; |A| + |B| < 4|B|. Outputs: φX = 000, 001, 010, 011.]

to 0° phase of the sampling clock. The estimated zero crossing is marked as point X in

Figure 3.4. Since the adjacent samples are 0.5 UI apart in time, φX is calculated as

$$\phi_X = \frac{0.5\,|A|}{|A| + |B|} \tag{3.3}$$

when the transition occurs between A and B in the example of Figure 3.4(a), and as

$$\phi_X = 0.5 + \frac{0.5\,|B|}{|B| + |C|} \tag{3.4}$$

when the transition occurs between B and C in the example of Figure 3.4(b). The phase

calculation occurs only when the adjacent samples have opposite polarities, which allows

the absolute values of the samples to be used in (3.3) and (3.4). The use of the

absolute values, in turn, allows ‘0 – 1’ and ‘1 – 0’ data transitions to be treated identically, which

simplifies the PD circuit implementation.

To maintain low circuit complexity, the division accuracy in (3.3) and (3.4) is limited

to 2 bits. The flowchart in Figure 3.5 shows that this 2-bit accurate division requires only

simple operations: addition, comparison and left shift by 2 (multiply by 4 in decimal).

Since this division operation covers only 0.5 UI, the total φX resolution is 3 bits per UI.

The third, most significant, bit (MSB) of φX depends on the position of the zero crossing:

a crossing between A and B makes MSB=‘0’, while a crossing between B and C makes

MSB=‘1’. The discussion of the effect of limiting φX accuracy to 3 bits is postponed till

Section 3.4.
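As an illustration of the 2-bit division, the following Python sketch computes the 3-bit φX code (in units of 1/8 UI) using only addition, comparison and a left shift by 2. The function names, the treatment of a zero-valued sample as positive, and the exact branch-to-code mapping are assumptions made for this sketch: the mapping was chosen so that the result equals the truncated quotient of (3.3) and (3.4), and the sketch handles at most one transition per sampling cycle.

```python
def quotient_2bit(near_mag, far_mag):
    """Return floor(4 * near / (near + far)) in 0..3 without a divider.

    near_mag and far_mag are non-negative integer magnitudes of the two
    samples around the zero crossing (|A| and |B|, or |B| and |C|).
    """
    total = near_mag + far_mag
    if near_mag < far_mag:
        # Crossing lies in the first half of the interval.
        return 0 if total > (near_mag << 2) else 1
    # Crossing lies in the second half of the interval.
    return 2 if total < (far_mag << 2) else 3


def phase_code(a, b, c):
    """Return phi_X as a 3-bit code (units of 1/8 UI), or None if no transition.

    a, b, c are signed integer samples at 0, 0.5 and 1 UI of the sampling cycle.
    """
    if (a < 0) != (b < 0):   # transition between A and B: MSB = 0
        return quotient_2bit(abs(a), abs(b))
    if (b < 0) != (c < 0):   # transition between B and C: MSB = 1
        return 4 + quotient_2bit(abs(b), abs(c))
    return None
```

For example, `phase_code(3, -3, -5)` places the crossing midway between A and B and returns the code 2, i.e., 0.25 UI.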

Nominally, there is at most one data transition in every cycle of the sampling clock:

either between A and B or between B and C. However, duty-cycle distortion (DCD) might

cause two transitions per sampling cycle: between A and B as well as between B and C.

Figure 3.6: Phase recovery filter. (Left inset: phase subtracter; right inset: signal flow of the 3rd order low-pass filter.)

When two such transitions occur, the PD calculates φX as a modulo-1 sum of both zero-

crossing phases so that both transitions contribute to the average phase recovery. This

allows the phase detection scheme to estimate the data phase in the presence of DCD.

The following subsection describes the phase recovery filter implemented in the CDR.

3.3 Phase-Recovery Filter

The phase recovery filter averages the instantaneous zero-crossing phase, φX, to recover φAVG, which tracks the data phase in the average sense. For this phase tracking, the CDR

uses a discrete-time IIR filter shown in Figure 3.6. The filter consists of a phase subtracter

and a 3rd order low-pass filter in a feedback loop.

The phase subtracter, shown in the left inset of Figure 3.6, calculates the phase difference between φX and φAVG for 16 UIs at a time and outputs the combined phase error, φERR, for these 16 UIs. To assure that the phase recovery converges for any phase offset between the data and sampling phases, φERR is calculated in a modulo manner such that φERR[i] is in the range [–0.5, 0.5) UI. The subtracter excludes the UIs without data transitions from contributing to φERR. φERR in the feed-forward CDR architecture plays the same role as the PD output in a conventional phase-tracking CDR.

Figure 3.7: Discrete-time integrator with programmable gain. (KPROG options: 0, 0.25, 0.5, 0.75, 1, 1.5, 2 and 3, selected by gain[2:0].)
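The modulo phase subtraction described above can be sketched in Python as follows. This is a minimal floating-point illustration, not the fixed-point circuit: the function names are hypothetical, a UI without a transition is represented by `None`, and the combination of the 16 per-UI errors as a sum scaled by 1/16 is an assumption based on the 1/16 block in Figure 3.6.

```python
def wrap_half(x):
    """Wrap a phase difference (in UI) into the range [-0.5, 0.5)."""
    return (x + 0.5) % 1.0 - 0.5


def combined_phase_error(phi_x_frame, phi_avg):
    """Combine the per-UI phase errors of one 16-UI frame.

    phi_x_frame holds up to 16 entries: phi_X in UI for a UI with a data
    transition, or None for a UI without one (excluded from the sum).
    """
    errors = [wrap_half(p - phi_avg) for p in phi_x_frame if p is not None]
    return sum(errors) / 16.0
```

Because the difference is wrapped, a zero crossing at 0.125 UI is seen as 0.25 UI ahead of an average phase of 0.875 UI rather than 0.75 UI behind it, which is what lets the loop converge from any initial phase offset.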

The phase error is fed into the low-pass filter (LPF), which consists of three cascaded

discrete-time delaying integrators with programmable gains K1, K2 and K3. These pro-

grammable gains allow adjusting the CDR’s jitter-tracking bandwidth. Figure 3.7 presents

the implementation of the integrators used in the LPF. First, the input signal, IN, is scaled

by the product of a constant gain, KCONST, with a programmable gain, KPROG. Then, the scaled signal is accumulated using an adder and a flip-flop (FF) in a feedback configuration to generate the output, OUT. A control signal, gain[2:0], sets the value of KPROG through an 8-to-1 selector. The eight possible KPROG values are chosen such that they can be calculated using only simple-to-implement operations: left/right shift (multiply/divide by 2 in decimal) and addition. The resulting KPROG values range from 1/4 to 3, while KPROG = 0 is used for debugging purposes. The resolution of the intermediate phase values in the integrators is 16 bits: 10 least significant bits represent the fractional part of the phase (1 UI long period), while 6 most significant bits represent the integer part. To tolerate the jitter exceeding 64 UIs (2⁶ UIs), ‘roll-over’ rather than ‘saturating’ counters are used in the

integrators.
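A behavioral sketch of such an integrator is given below in Python. It is only an illustration under stated assumptions: the assignment of gain codes to KPROG values (ascending order here), the modelling of KCONST as a right shift, and the class and attribute names are all hypothetical; only the set of KPROG values, the 16-bit width with 10 fractional bits, and the roll-over behaviour come from the text.

```python
WIDTH = 16                  # total accumulator width in bits
FRAC_BITS = 10              # fractional bits: 1 UI corresponds to 1 << FRAC_BITS
MASK = (1 << WIDTH) - 1     # roll-over (modulo 2**16) instead of saturation

# KPROG built from shifts and additions only; code-to-value order is assumed.
KPROG_OPS = [
    lambda x: 0,                    # 0 (debug)
    lambda x: x >> 2,               # 0.25
    lambda x: x >> 1,               # 0.5
    lambda x: (x >> 1) + (x >> 2),  # 0.75
    lambda x: x,                    # 1
    lambda x: x + (x >> 1),         # 1.5
    lambda x: x << 1,               # 2
    lambda x: (x << 1) + x,         # 3
]


class DelayingIntegrator:
    """Models K * z^-1 / (1 - z^-1) with a roll-over accumulator."""

    def __init__(self, gain_code, kconst_shift=0):
        self.gain_code = gain_code
        self.kconst_shift = kconst_shift  # KCONST = 2 ** -kconst_shift (assumed form)
        self.acc = 0

    def step(self, value):
        out = self.acc                    # output lags the input by one cycle
        scaled = KPROG_OPS[self.gain_code](value) >> self.kconst_shift
        self.acc = (self.acc + scaled) & MASK
        return out
```

With `gain_code=7`, an input of 1024 (1 UI) produces an output of 0 in the same cycle and 3072 (3 UI) in the next one; an accumulator at 0xFFFF rolls over to 0 when 1 is added.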

Three of these integrators form three forward paths in the LPF: 1st, 2nd and 3rd order

paths, as shown in the signal flow diagram in the right inset of Figure 3.6. These paths

add up to the average (recovered) phase, φAVG. The transfer function of the entire phase recovery filter is:

$$ \frac{\phi_{AVG}}{\phi_X} = \frac{A_{FW}}{1 + A_{FW}}, \tag{3.5} $$

where AFW is the forward gain of the LPF:

$$ A_{FW} = \frac{K_1 z^{-1}}{1 - z^{-1}} + \frac{K_1 K_2 z^{-2}}{(1 - z^{-1})^2} + \frac{K_1 K_2 K_3 z^{-3}}{(1 - z^{-1})^3}. \tag{3.6} $$

Figure 3.8: Jitter tolerance dependence on the LPF order (simulated, BER ≤ 5·10⁻⁶). (Axes: jitter amplitude, UIPP, versus jitter frequency, Hz; curves: 1st, 2nd and 3rd order LPF.)

Three criteria determine the filter gain values: the desired jitter-tracking bandwidth

of the CDR, the absence of gain peaking in the jitter-transfer function of (3.5), and the

low-circuit-complexity filter implementation. First, the CDR jitter-tracking bandwidth was

selected (approximately 5 MHz in the proposed receiver). Then, through simulations the

gain values were determined to achieve this bandwidth while minimizing the gain peaking

in the jitter transfer function. Finally, the gain values were rounded-off to the nearest easy-

to-implement values in binary. This procedure leads to K1 = 3/64, K2 = 7/2048, and K3 =

5/2048. To illustrate the low complexity gain implementation, K1 = 3/64 is implemented

as K1 = 1/32+1/64, where gains of 1/32 and 1/64 are obtained through right-shifting the

input value by 5 and 6 bits. In a similar manner, K2 and K3 are composed of right-shift and

addition operations to maintain the low circuit complexity. These gain values are used in

the simulations and measurements presented in Section 3.6.
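To make the loop of (3.5) and (3.6) concrete, the following Python sketch steps the three cascaded delaying integrators in the time domain with the gain values quoted above. It is a simplified floating-point model under stated assumptions: the phase is left unwrapped, one filter update is performed per input sample, and the function name and state variables are hypothetical.

```python
def recovered_phase(phi_x, k1=3 / 64, k2=7 / 2048, k3=5 / 2048):
    """Return phi_AVG for each input phase sample phi_x[n] (unwrapped, in UI).

    u1, u2 and u3 are the outputs of the 1st, 2nd and 3rd order forward
    paths, i.e. the three terms of A_FW applied to the phase error.
    """
    u1 = u2 = u3 = 0.0
    out = []
    for x in phi_x:
        avg = u1 + u2 + u3          # sum of the three forward paths
        out.append(avg)
        err = x - avg               # phase subtracter
        # Delaying integrators: each update uses the previous-stage value
        # from the current cycle, so the result appears one cycle later.
        u3 += k3 * u2
        u2 += k2 * u1
        u1 += k1 * err
    return out
```

In this model a constant phase offset and a constant phase ramp (a frequency offset) are both tracked with a vanishing steady-state error, which is the behaviour expected from three integrators in the forward path.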

To explore the effect of the order of the phase-recovery filter on the CDR performance,

the filter order was reduced from 3rd to 2nd and 1st, and the CDR’s jitter tolerance was

simulated, as illustrated in Figure 3.8. Note that in all three cases, the CDR’s jitter-tracking

bandwidth remains constant. As the order changes from 1st to 2nd, the high-frequency


jitter tolerance improves by approximately 0.2 UIPP. Furthermore, with the use of the 2nd

order filter, the jitter tolerance roll-off slope increases allowing for a higher tolerance at

low frequencies. The 3rd order filter shows a small improvement of the jitter tolerance

compared to the 2nd order filter: the high-frequency jitter tolerance remains unchanged, but

the low-frequency jitter tolerance increases by up to 3× (at 32 kHz for instance). For a

safe design with a high tolerance to low-frequency jitter, the 3rd order filter was used in the

proposed feed-forward CDR.

The CDR uses the recovered φAVG along with φX for the data recovery according to the

scheme presented in the following section.

3.4 Data-Decision Scheme

The proposed feed-forward clock recovery eliminates the interpolator from the CDR thus

reducing the circuit complexity. As a consequence of this interpolator elimination, the

value of the signal at the UI center is not interpolated and therefore is unknown. To enable

error-free data recovery along with the feed-forward clock recovery, a data decision scheme

is essential to the proposed feed-forward CDR. The role of the data decision block is to

estimate the sign of the received signal near the maximum eye opening, i.e., near the UI

center. Since φAVG indicates the average position of the UI boundaries, the average position of the UI centers is calculated by adding 0.5 UI to φAVG using modulo-1 addition. This UI-center phase is referred to as the data-picking phase, φPICK. The data decision block takes the signs of the samples from 16 sampling cycles and picks one decision sample for every UI by comparing φX and φPICK.

Figure 3.9 illustrates the proposed data-picking scheme through an example of a single

sampling cycle. The data decision block takes three consecutive sliced samples (A, B and

C) and picks one of these samples as the decision bit. This decision bit is picked such that

it belongs to the UI marked by the average UI-center phase φPICK. Three sample cases

demonstrate the data decision scheme: a jitter-free case and two cases of jitter.

In a jitter-free case shown in Figure 3.9(a), φX coincides with φAVG, and hence φX is 0.5 UI away from φPICK. φPICK marks the UI from which the data is recovered (shaded in the figure). The decision scheme thus picks one of the two samples adjacent to φPICK: either A or B in this example. In a jitter-free case, both samples adjacent to φPICK have the same sign and hence the decision is trivial.

Figure 3.9: Data decision scheme. (a) Jitter-free case. (b) Jitter example 1. (c) Jitter example 2.

In the presence of jitter, φX deviates from φAVG, and the separation between φX and φPICK differs from 0.5 UI. As a consequence, the two samples adjacent to φPICK might

belong to different UIs, and these samples might have opposite signs, as illustrated in

Figs. 3.9(b) and 3.9(c). In this case, the data decision scheme picks the sample that be-

longs to the UI marked by φPICK (shaded UI in the figure). For instance, in Figure 3.9(b)

the jitter causes φX to shift left compared to Figure 3.9(a) and A is picked as the decision

data. In the example of Figure 3.9(c), φX shifts right compared to Figure 3.9(a) and sample

B is picked. This scheme requires a single comparison between φX and φPICK for every UI

with a data transition.
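The comparison between φX and φPICK can be sketched in Python as below. This is one possible reading of the scheme, not the implemented logic: it assumes phases expressed in UI relative to sample A (A at 0, B at 0.5, C at 1), at most one transition per cycle, and hypothetical function and argument names.

```python
def pick_decision(a_bit, b_bit, c_bit, phi_x, phi_avg):
    """Pick the decision bit for one sampling cycle.

    a_bit, b_bit, c_bit: sliced samples (0 or 1) at 0, 0.5 and 1 UI.
    phi_x: zero-crossing phase in [0, 1) UI, or None if there is no transition.
    phi_avg: recovered average phase (UI boundary) in [0, 1) UI.
    """
    phi_pick = (phi_avg + 0.5) % 1.0          # average UI-center phase
    if phi_pick < 0.5:
        left, right, lo, hi = a_bit, b_bit, 0.0, 0.5
    else:
        left, right, lo, hi = b_bit, c_bit, 0.5, 1.0
    if phi_x is None or not (lo <= phi_x < hi):
        # No crossing between the two samples adjacent to phi_pick:
        # both have the same sign, so either one is a valid decision.
        return left
    # A crossing before phi_pick means the left sample belongs to the
    # previous UI; a crossing after it means the right sample belongs
    # to the next UI.
    return right if phi_x < phi_pick else left
```

For instance, with φPICK = 0.125 UI and a crossing at 0.375 UI, the sample A is returned, while with φPICK = 0.375 UI and a crossing at 0.125 UI, the sample B is returned.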

Limited channel bandwidth and DCD reduce the width of UI-long data pulses and thus cause two transitions per sampling cycle. This case is referred to as an isolated pulse. Figure 3.10 illustrates nominal and isolated UI-long pulses. In the nominal case of Figure 3.10(a), samples B and C are equidistant from φPICK, which makes both samples equally correct decisions. However, in the presence of isolated pulses (Figs. 3.10(b) and 3.10(c)), two transitions per UI prohibit defining a single instantaneous phase value, φX. As a consequence, a comparison between φX and φPICK proves insufficient for a correct data decision. The data decision scheme detects these isolated pulses using an XOR operation on every pair of consecutive samples. It then disregards the phase information and picks the sample at the center of the pulse, i.e., farthest from both transitions, as shown in Figure 3.10. When two transitions occur in the same sampling cycle between A and B, and between B and C (see Figure 3.10(b)), B is picked as the decision data. In a similar manner the decision block checks for an isolated pulse at the boundary between two consecutive sampling cycles. As Figure 3.10(c) illustrates, a transition between B_i and C_i in sampling cycle i followed by a transition between A_{i+1} and B_{i+1} in cycle i+1 causes the decision scheme to pick C_i (A_{i+1}) as the decision bit.

Figure 3.10: Data decision with isolated pulses. (a) Nominal pulse width, jitter-free case. (b) Two transitions in the same cycle. (c) Two transitions in adjacent cycles.
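The XOR-based detection of isolated pulses can be illustrated with a short Python sketch. It assumes the sliced samples are available as one flat list at half-UI spacing (A, B, A of the next cycle, B, ...); the function name and this data layout are hypothetical.

```python
def isolated_pulse_centers(bits):
    """Return the indices of samples that sit in the middle of an isolated pulse.

    bits: sliced samples (0 or 1) at half-UI spacing. A sample is the center
    of an isolated pulse when it differs from both of its neighbours.
    """
    # XOR of every pair of consecutive samples flags the transitions.
    transitions = [bits[k] ^ bits[k + 1] for k in range(len(bits) - 1)]
    # Two back-to-back transitions enclose a single sample: the pulse center.
    return [k + 1 for k in range(len(transitions) - 1)
            if transitions[k] and transitions[k + 1]]
```

For the sequence `[0, 0, 1, 0, 0]` the function returns `[2]`, i.e., the lone 1 is picked as the decision sample regardless of the phase information.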

The proposed data decision scheme based on the comparison between φX and φPICK

recovers the data correctly when φX deviates from φAVG by up to 0.5 UI in either direction.

Hence the CDR has the theoretical maximum jitter tolerance of 1 UIPP at jitter frequencies

exceeding the bandwidth of the phase recovery filter. Estimating φX with 3-bit accuracy

(instead of infinite accuracy) results in the reduction of the high frequency tolerance by

only 1/8 UI. Further discussion of the jitter tolerance is postponed till Section 3.6.

The data decision schemes in the proposed feed-forward CDR and the interpolating feedback CDR are functionally equivalent to each other. Figure 3.11 illustrates this equivalence by comparing the decision schemes in the two CDRs for a jitter-free case and a jitter example in a tabular form. The interpolating feedback CDR first interpolates a new sample, IAB, between A and B at time φPICK and then takes the sign of this interpolated sample, i.e., slices IAB, to get the decision bit. In fact, the decision bit, sign(IAB), inherits the sign of either A or B (the two samples adjacent to φPICK), depending on the values of A, B and φPICK. In the jitter-free case, A and B belong to the same UI and therefore they have the same sign, making the interpolation and slicing redundant. Jitter or, equivalently, voltage noise may cause a sign inversion of either A or B. In the example of Figure 3.11, this sign inversion occurs at sample B. In this case, the interpolated sample, IAB, is a weighted average of A and B at time φPICK, which is the best estimate of the signal value at the UI center using linear interpolation.

Figure 3.11: Data decision in the interpolating feedback and feed-forward CDRs. (Panels compare the interpolating feedback CDR and the feed-forward CDR for the jitter-free case and the jitter example.)

The feed-forward CDR, in contrast, simply takes the sign of A or B as the decision bit.

In the jitter-free case, when A and B have identical signs, the feed-forward CDR assigns

sign(A) or sign(B) to be the decision bit. This yields the same result as the interpolating

feedback CDR. When jitter leads to a sign difference between A and B, the feed-forward

CDR marks with φX the time at which the interpolation line changes the sign (from positive


to negative in the example of Figure 3.11). The feed-forward CDR picks the sign of the

sample that is on the side of φPICK (that is, sign(A) in this example). Again, this yields the

same result as the interpolating feedback CDR.

In fact, in the interpolating feedback CDR, IAB carries the weighted average information in voltage domain, while in the feed-forward CDR φX carries the same information in time

domain. Since the decision is represented with only one bit, both schemes lead to identical

decisions while subjected to identical jitter or voltage noise conditions.
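The equivalence argued above can be checked numerically. The Python sketch below compares the sign of the linearly interpolated sample at the picking time with the sign chosen by comparing the picking time against the zero-crossing phase of (3.3). The function names are hypothetical, the picking time t is measured in UI from sample A, and exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction

HALF = Fraction(1, 2)   # spacing between samples A and B in UI


def sign_by_interpolation(a, b, t):
    """True if the linear interpolation between A (t = 0) and B (t = 0.5) is positive at t."""
    value = a + (b - a) * (t / HALF)
    return value > 0


def sign_by_phase_compare(a, b, t):
    """True if the sample on the same side of the zero crossing as t is positive."""
    phi_x = HALF * abs(a) / (abs(a) + abs(b))   # zero-crossing phase, as in (3.3)
    return (a > 0) if t < phi_x else (b > 0)
```

For every pair of opposite-sign samples and every picking time other than the zero crossing itself (where the interpolated value is exactly zero), the two functions return the same decision.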

The feed-forward and the interpolating feedback CDRs differ from the point of view

of decision feedback equalization (DFE), which requires a sample value at the UI center

to cancel ISI. The interpolating feedback CDR interpolates a sample at the UI center, IAB

in Figure 3.11, which allows using conventional DFE schemes. The feed-forward CDR,

in contrast, avoids interpolating a sample at the UI center, which requires adjustments to

the DFE approach. The authors in [47] present a DFE scheme for the feed-forward CDRs

that modifies the sample values to cancel ISI instead of modifying the interpolated UI

center value. Hence, [47] demonstrates that the proposed feed-forward CDR can be used

in conjunction with the DFE.

The following section discusses the frequency offset compensation scheme that pre-

vents the data errors due to a frequency mismatch between the transmitter and receiver.

3.5 Data Retiming Scheme

The transmitter clock determines the data rate at the input of the CDR, while the blind

sampling clock determines the data rate at the output of the data decision block. Since

these two clocks are free-running with respect to each other, a frequency offset between

them is inevitable. This frequency offset, in turn, leads to a mismatch between the data

rates at the CDR input and at the data decision output. A FIFO absorbs this data rate

mismatch by retiming the decision bits from the blind sampling clock domain to a retiming

clock domain as shown in Figure 3.2.

Figure 3.12 illustrates three possible data retiming schemes. Since the average phase, φAVG, indicates the data phase with respect to the blind sampling clock, a phase interpolator (PI) in a feed-forward path controlled by φAVG can generate the recovered clock, CKREC, from the sampling clock as shown in Figure 3.12(a). The recovered clock then retimes the decision bits, D, through a FIFO such that the rate of the recovered data, DTREC, matches the rate of the received data. Since the recovered clock is available in this configuration, the feed-forward CDR with the PI retiming scheme becomes similar to a phase-tracking CDR in the ability to track frequency offsets. This PI retiming method requires an analog component (a PI running at fB/16) to generate CKREC, thus increasing the system complexity.

Figure 3.12: Data retiming schemes. (a) Phase interpolator (PI) generates the recovered clock. (b) Data consumer retimes the recovered data. (c) Clock-forwarded system.

Typically the recovered data is retimed one more time to the clock domain of the data consumer. In fact, the feed-forward CDR architecture allows retiming the decision bits from the blind sampling clock directly to the data consumer clock, CKCONS, as illustrated in Figure 3.12(b). Since CKCONS is free-running with respect to the transmitter clock, this retiming method requires the transmitter and receiver to negotiate the data flow rate using a flow control technique mentioned in Section 2.2.2. Retiming the data directly to the consumer clock eliminates the need for a PI in the receiver, and it allows for a fully digital implementation of the CDR and FIFO.

Figure 3.13: Simplified FIFO diagram.

The third data retiming scheme, shown in Figure 3.12(c), is applicable to clock-forwarded

interconnects. In these interconnects, a divided baud-rate clock, fB/16, is transmitted along

with data such that this divided clock is available at the receiver side. The received clock

has no phase relation to the received data, however its frequency matches the transmitter

data rate (divided), and therefore the clock can be used to retime the decision bits. This

retiming scheme is also convenient for characterizing blind-sampling receivers in labora-

tory conditions, and therefore it was used in the measurements of the receiver test-chip

presented in Section 3.6.

Figure 3.13 illustrates a simplified diagram of the FIFO. The FIFO is a circular register

with two ports: a write port (shaded in the figure) and a read port (unshaded in the figure).

The write port, which is synchronized to the sampling clock, places 15, 16 or 17 decision

bits at a time to the register. This variable number of bits at the write port allows the FIFO

to compensate for the frequency offset between the sampling and the recovered clocks. The

read port, which is synchronized to the retiming clock, removes 16 retimed bits at a time.

A write pointer, WrPtri, and a read pointer, RdPtri, define the positions of the ports in the

register. The write and read take place on the opposite sides of the circular register to avoid

metastability. After each write/read access, the pointers are updated to their new positions

in the counter-clockwise direction, WrPtri+1 and RdPtri+1.


A small frequency offset between the transmitter and receiver clocks causes the recovered average phase, φAVG, to constantly shift in one direction. For instance, if the transmitter clock has a higher frequency than the receiver clock, φAVG constantly reduces, indicating that the received UI is shorter than the period of the sampling clock. Conversely, φAVG constantly increases if the transmitter clock has a lower frequency than the receiver clock. To assure that the CDR can tolerate frequency offsets, φAVG is generated using roll-over (instead of saturating) registers, which allows tracking continuous phase shifts in one direction, i.e., frequency offsets. Since φPICK is a 0.5 UI shifted version of φAVG, both φPICK and φAVG can be used as the frequency offset indicators. The FIFO uses φPICK for this purpose. Once

φPICK crosses the UI boundary as it reduces from one frame of 16 UIs to another, the write

port places 17 bits into the FIFO instead of nominal 16 bits, thus compensating for a higher

data rate at the CDR input. Conversely, the write port places 15 bits into the FIFO once

φPICK crosses the UI boundary in the opposite direction as it increases from one frame to

another, thus compensating for a lower data rate at the CDR input. The write port places

the nominal 16 bits into the FIFO as long as φPICK remains within the UI boundaries going

from frame to frame. This scheme allows the CDR to sustain up to 1 UIPP jitter in every

16 UI frame.
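The choice between writing 15, 16 or 17 bits can be sketched in Python as follows. The sketch assumes that φPICK is available as a fraction of a UI in [0, 1) for two consecutive 16-UI frames and that it moves by less than 0.5 UI between frames, so that the direction of a boundary crossing is unambiguous; the function names are hypothetical.

```python
def wrap_half(x):
    """Wrap a phase difference (in UI) into the range [-0.5, 0.5)."""
    return (x + 0.5) % 1.0 - 0.5


def bits_to_write(phi_pick_prev, phi_pick_now):
    """Return the number of decision bits the write port places into the FIFO."""
    step = wrap_half(phi_pick_now - phi_pick_prev)   # signed frame-to-frame change
    unwrapped = phi_pick_prev + step
    if unwrapped < 0.0:
        return 17    # phi_PICK decreased across the UI boundary: extra bit
    if unwrapped >= 1.0:
        return 15    # phi_PICK increased across the UI boundary: one bit fewer
    return 16        # phi_PICK stayed within the UI
```

For example, a change of φPICK from 0.0625 to 0.9375 UI is interpreted as a decrease of 0.125 UI across the boundary and results in 17 bits being written.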

The FIFO retimes data error-free as long as the write and read ports do not overlap. The

FIFO consists of 64 registers, which makes the nominal separation between the ports 16

registers in either direction. Every time φPICK crosses the UI boundaries, the read and write

ports become one register closer to each other. Thus, the FIFO can compensate for up to

32 UIs of jitter to allow the retiming clock to catch up with the transmitted data rate. Once

the transmitter and retiming frequencies are close to each other, the CDR can sustain jitter

exceeding 32 UIPP. If the write and read ports overlap, the FIFO resets the position of the

read port to the opposite side of the circular buffer from the write port. This reset of the

port position is also performed at the receiver start-up.

The results of the CDR simulations and test-chip measurements, presented in the fol-

lowing section, confirm that the retiming scheme successfully compensates for frequency

offsets between the transmitter and receiver.


Figure 3.14: Simulated jitter tolerance (BER ≤ 5·10⁻⁶). (Axes: jitter amplitude, UIPP, versus jitter frequency, Hz; curves: with and without SSC, for DJ = 0.05 UIPP and DJ = 0.15 UIPP.)

Table 3.1: Jitter tolerance simulation conditions (in Figure 3.14).

  Input                      5 Gb/s, 2³¹ − 1 PRBS
  Channel loss               13 dB at 2.5 GHz
  TSIM (fJIT > 250 kHz)      2·10⁵ UIs
  TSIM (fJIT ≤ 250 kHz)      1 jitter period
  BER                        ≤ 5·10⁻⁶
  TX pre-emphasis            3 dB
  Tx-Rx ΔfCLK                600 ppm (nominal)
  SSC freq. modulation       0...−5000 ppm at 32 kHz
  Tx-Rx ΔfCLK (with SSC)     10600 ppm, 1.06 % (max)
  Tx RJ                      0.17 UIPP (Gaussian)
  Tx DJ                      0.19 UIPP (dual-Dirac)
  Rx RJ                      0.23 UIPP (Gaussian)
  Rx DJ                      legend in Fig. 3.14 (dual-Dirac)

3.6 Simulation and Measurement Results

To validate the proposed CDR architecture, a receiver with the feed-forward CDR shown

in Figure 3.2 was first simulated on a behavioral level, and then it was fabricated and

characterized in a test-chip in 65 nm CMOS.


The CDR was simulated using an event-driven behavioral model [48] in Simulink. This

model accounts for a limited channel BW, supports asynchronous clock domains, and al-

lows adding multiple jitter sources into the simulation. Figure 3.14 presents the simulated

jitter tolerance and Table 3.1 summarizes the simulation conditions. In these simulations,

the sinusoidal jitter was superimposed on random and deterministic jitter (RJ and DJ) and a frequency offset between the transmitter and receiver. The CDR was simulated with a 5 Gb/s 2³¹ − 1 PRBS sequence passed through a channel with 13 dB attenuation at 2.5 GHz. The transmitter pre-emphasis is 3 dB. To maintain reasonable simulation time, the simulations were performed for 2·10⁵ UIs for jitter frequencies, fJIT, above 250 kHz, and for one full jitter period for fJIT below 250 kHz, which corresponds to BER ≤ 5·10⁻⁶. The nominal frequency offset between transmitter and receiver clocks, ΔfCLK, was set to 600 ppm. In addition to this nominal ΔfCLK, an offset of up to 5000 ppm was introduced to emulate a spread-spectrum clocking (SSC) at 32 kHz. This SSC-induced offset was added both at the transmitter and at the receiver for a total ΔfCLK of up to 10600 ppm or 1.06 %. The variance

of RJ was adjusted to reach the reported peak-to-peak values within each simulation run.

These simulations confirm that the proposed CDR recovers error-free data at 5 Gb/s in the

presence of jitter, frequency offset and channel attenuation. The simulated jitter tolerance

is below the maximum theoretical tolerance limit of 1 UIPP at high frequencies due to the

random and deterministic jitter, and the frequency offset in the simulation setup. As the

jitter frequency approaches half the baud rate, the jitter tolerance slightly reduces, which is

the expected effect introduced in Section 2.3.2.
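Two of the numbers in this simulation setup follow from simple arithmetic, which the Python sketch below reproduces. It is an illustration only: the helper names are hypothetical, the BER bound is read as one error in an otherwise error-free run, and the 16-UI frame length is taken from the CDR description.

```python
def ber_bound(num_uis):
    """Smallest BER that an error-free run of num_uis bits can demonstrate."""
    return 1.0 / num_uis


def uis_per_slip(offset_ppm):
    """Number of UIs over which a frequency offset accumulates one full UI of phase."""
    return 1e6 / offset_ppm


def frames_per_slip(offset_ppm, frame_uis=16):
    """Number of 16-UI frames between consecutive UI-boundary crossings."""
    return uis_per_slip(offset_ppm) / frame_uis
```

An error-free run of 2·10⁵ UIs gives the quoted bound of 5·10⁻⁶. At the nominal 600 ppm offset the phase slips by one UI roughly every 1667 UIs (about 104 frames), while at 10600 ppm it slips roughly every 94 UIs (about 6 frames).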

A receiver test-chip with the feed-forward CDR architecture was fabricated in 65 nm

standard-logic CMOS. Figure 3.15 illustrates a simplified design flow of the proposed feed-

forward CDR. First, the event-driven model of the CDR is implemented in Simulink and

simulated at behavioral level. This simulation generates two sets of test-vectors: the input

vectors that excite the CDR, and the output vectors that the CDR produces in response

to the input vectors. Then, the CDR is implemented in RTL, and this RTL is simulated

at behavioral level in Verilog. The Verilog simulation excites the CDR using the input

test-vectors generated in Simulink, and produces the output test-vectors. Finally, the output

test-vectors from Verilog and Simulink are compared with each other. If some discrepancies

are found between the two sets of output vectors, the CDR RTL is passed through a cycle

of corrections and further verifications. When the two sets of output vectors are identical, the CDR is ready for the circuit-level implementation. Figure 3.16 presents the die photo of the fabricated receiver. The ADC, frequency divider and the 2:8 portion of the 2:32 DeMUX are analog custom-designed blocks. The 8:32 portion of the 2:32 DeMUX, the FFE, CDR and the test structures (PRBS comparator and test register) are all synthesized. Table 3.2 summarizes the test-chip parameters.

Figure 3.15: Simplified design flow of the proposed feed-forward CDR.

Table 3.2: Test-chip parameters.

  Process          65 nm CMOS
  Data rate        5 Gb/s
  Supply           1.2 V
  Power            178.4 mW
  Receiver area    0.51 mm²

Figure 3.16: Test-chip die photograph.

The receiver test-chip uses the ADC and FFE similar to those presented in [18]. The ADC consists of two time-interleaved 5 GS/s 5-bit interpolating flash ADCs to achieve the total sampling rate of 10 GS/s. To reduce the receiver input loading, the ADC evaluates four most significant bits (MSBs) using 17 comparators at the front end, and resistively interpolates the least significant bit (LSB) to achieve the 5-bit resolution. The ADC has a measured ENOB of 4.2 bits and a power consumption of 110 mW. After the signal samples

are DeMUXed, the samples are compensated for the channel loss using a half-UI-spaced

2-tap FIR filter as an FFE. The filter tap coefficients are programmable through a serial

shift register. The FFE compensates for up to 15 dB of channel attenuation at 2.5 GHz. It

was experimentally shown in [18] that the feed-forward CDR architecture can be used with

an adaptive FFE using a constant modulus algorithm (CMA) for adjusting the tap weights.

Since both unequalized and equalized samples are available in the digital domain, the CMA

adaptation circuits in [18] were implemented entirely in the digital domain.

Figure 3.17 presents the measured jitter tolerance of the fabricated receiver. In these

measurements, a 2⁷ − 1 PRBS sequence running at 5 Gb/s was used as the data source. The channel attenuation is 10 dB at 2.5 GHz. The transmitter pre-emphasis is 3 dB with the launch amplitude of 750 mVPP. The receiver is triggered with a blind sampling clock that has 760 ps random jitter (RMS) and -128 dBc/Hz phase noise at 1 MHz offset. The measured jitter tolerance was recorded at BER ≤ 10⁻¹². For a comparison between the

simulated and measured results, Figure 3.17 includes a simulated jitter tolerance with the

same data source and channel loss. The measured and simulated jitter tolerances closely

match each other. Table 3.3 lists the jitter tolerance measurement and simulation conditions.


Figure 3.17: Measured jitter tolerance. (Vertical axis: jitter amplitude, UIPP; curves: simulated, BER ≤ 5·10⁻⁶, and measured, BER ≤ 10⁻¹².)

Table 3.3: Jitter tolerance measurement and simulation conditions (in Figure 3.17).

  Input                      5 Gb/s, 2⁷ − 1 PRBS
  Channel loss               10 dB at 2.5 GHz
  TSIM (sim)                 2·10⁵ UIs
  BER (sim)                  ≤ 5·10⁻⁶
  BER (meas)                 ≤ 10⁻¹²
  TX pre-emphasis            3 dB
  Tx-Rx ΔfCLK                0
  SSC freq. modulation       0
  Tx-Rx ΔfCLK (with SSC)     0
  Tx RJ (sim)                0 UIPP
  Tx DJ (sim)                0.05 UIPP (dual-Dirac)
  Rx RJ (sim)                0.25 UIPP (Gaussian)
  Rx DJ (sim)                0.05 UIPP (dual-Dirac)


The receiver consumes 178.4 mW at 5 Gb/s, including the ADC. The entire receiver occupies the chip area of 0.51 mm² (test structures excluded).

3.7 Summary

This chapter presented the proposed blind-sampling ADC-based feed-forward CDR archi-

tecture. In this architecture, the ADC samples the received signal blindly at twice the

baud-rate. The blind sampling allows removing the phase-tracking feedback loop from the

CDR, thus simplifying the receiver architecture. The CDR recovers the data phase directly

from digital signal samples in a feed-forward manner, hence eliminating the need for a

digital interpolating feedback loop. This feed-forward topology reduces the CDR circuit

complexity compared to the previously reported blind-sampling interpolating CDRs.

The feed-forward CDR architecture was fabricated in a test-chip receiver in 65 nm

CMOS. The test-chip successfully recovers data at 5 Gb/s in the presence of channel attenuation of 10 dB at 2.5 GHz. The receiver occupies 0.51 mm² of die area and consumes

178.4 mW of power. The CDR simulations and test-chip measurements confirm that the

proposed architecture is suitable for high-speed serial links.

In the presented receiver, the sampling ADC consumes a significant portion of the

receiver power. To reduce the ADC power, the following chapter proposes a fractional sam-

pling rate (FSR) CDR architecture, which reduces the ADC conversion rate by sampling

the received signal at 1.45x the baud rate (instead of 2x). The CDR then recovers the data

from the signal samples using a feed-forward topology.

Chapter 4

A Fractional-Sampling-Rate CDR Architecture

THE FEED-FORWARD CDR architecture presented in the previous chapter reduces

the circuit complexity of the ADC-based receivers at the cost of high sampling rate.

In this architecture, the ADC samples the received signal blindly at 2x the baud rate [17,18],

which is 2x higher compared to the sampling rate in the phase-tracking ADC-based receivers with the Mueller-Müller CDR architecture [4, 5, 38]. This high sampling rate leads

to high ADC power consumption and area in the feed-forward ADC-based receivers. To

reduce the ADC power and area, this chapter proposes a fractional-sampling-rate (FSR)

feed-forward CDR architecture for the blind-sampling ADC-based receivers. In this archi-

tecture, the ADC samples the received signal blindly at a fractional rate between 2x and

1x, thus reducing the ADC conversion rate below 2x while maintaining a low circuit com-

plexity of the receiver. The proposed CDR then recovers the phase and data from the blind

fractionally-spaced samples using a feed-forward topology. To validate the proposed CDR

architecture, a 6.875 Gb/s 1.45x ADC-based receiver with FSR CDR was implemented and

characterized in 65 nm CMOS [19]. This chapter first introduces the proposed FSR CDR

architecture, and then presents the implementation of the CDR building blocks.

The remainder of this chapter is organized into eight sections. First, Section 4.1 intro-

duces the concept of fractional sampling rate for the feed-forward CDRs and presents an

ADC-based receiver with FSR CDR. Then, Sections 4.2 – 4.6 present the implementation

details of the FSR CDR. Section 4.2 proposes two phase detection schemes for the blind

sampling at fractional rates. Sections 4.3 and 4.4 describe the phase-recovery filter and the

data-decision scheme used in the FSR CDR. Section 4.5 presents two vector compaction

schemes that restore the correspondence between the number of samples and the number

of data bits with fractional sampling rates. Section 4.6 describes the data retiming scheme

used in the proposed CDR. Next, Section 4.7 validates the FSR CDR architecture through

the simulations and measurements of a receiver test-chip. Finally, Section 4.8 concludes

this chapter with a summary.

Figure 4.1: Sampling rates in feed-forward CDR architectures: (a) sampling at 2x; (b) sampling at 1.45x.

4.1 Fractional-Sampling-Rate CDR Architecture

To introduce the concept of fractional sampling rate, Figure 4.1 illustrates sampling the

received signal at two different rates: at 2x the baud rate and at 1.45x the baud rate. With

sampling at an integer rate of 2x, shown in Figure 4.1(a), every UI is sampled twice. This

sampling rate is typical for the blind-sampling ADC-based receivers [39, 40]. Sampling at

2x requires the ADC conversion rate to be twice the baud rate, which leads to

high ADC power and area. Figure 4.1(b) illustrates the concept of a fractional sampling

rate through an example of sampling at 1.45x the baud rate. With this rate, some UIs are

sampled once while other UIs are sampled twice for an average rate of 1.45x. This reduction

of the sampling rate allows the ADC power and area to be reduced by 27.3 % compared to

sampling at 2x. The remainder of this section proposes a CDR architecture that recovers the

data from the ADC samples taken blindly at a fractional sampling rate. The rate of 1.45x is

used as an example in the presentation of the proposed FSR CDR architecture, while this

architecture is applicable to a wide range of sampling rates between 2x and 1x.
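
The sampling pattern described above can be checked with a short sketch. It assumes the exact rate of 16/11 used later in this chapter, places the first sample at the start of a UI (an arbitrary choice), and takes the ADC power and area to scale in proportion to the conversion rate, which is the premise behind the 27.3 % figure. The function name and its parameters are illustrative only.

```python
from fractions import Fraction

def samples_per_ui(num_samples=16, num_uis=11, first_phase=Fraction(0)):
    """Return the number of blind samples that land in each UI of one frame."""
    si = Fraction(num_uis, num_samples)      # sampling interval, in UI
    counts = [0] * num_uis
    for k in range(num_samples):
        t = first_phase + k * si             # sample time, in UI
        counts[int(t // 1) % num_uis] += 1   # UI index of this sample
    return counts

counts = samples_per_ui()
rate = Fraction(16, 11)                      # samples per UI
saving = 1 - rate / 2                        # relative to sampling at 2x
print(counts)                                # [2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1]
print(f"{float(rate):.3f}x, saving {float(saving):.1%}")
```

With these assumptions, 5 of the 11 UIs receive two samples and 6 receive one, and the rate reduction relative to 2x is 3/11.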

Figure 4.2: Receiver with the proposed fractional-sampling-rate CDR architecture. [Block diagram: the 6.875 Gb/s input RXIN drives a 4-way time-interleaved 5-bit ADC (2.5 GS/s per channel, 10 GS/s in total) clocked by four phases derived from the 5 GHz blind sampling clock; a 4:16 DeMUX feeds the digital CDR, which comprises the PD, filter, data decision, data compactor and FIFO, with signals φX, φERR and φAVG, and the output DTREC on the retiming clock.]

Figure 4.2 presents a block diagram of a 6.875 Gb/s 1.45x ADC-based receiver with the

proposed FSR CDR architecture. A 4-way time-interleaved ADC samples the 6.875 Gb/s

received signal, RXIN, blindly at 10 GS/s for the sampling rate of 1.45x. A 4:16 DeMUX

then feeds 16 samples at a time into the digital CDR for the data recovery. These 16 sam-

ples correspond to an 11-UI interval, or frame. The ADC and the DeMUX are triggered

by a 5 GHz blind sampling clock divided by 2. This clock is further divided by 4 to trigger

the digital CDR at 625 MHz. The digital CDR uses the feed-forward topology presented

in Chapter 3. The CDR consists of a phase-recovery path and a data-recovery path. In the

phase path, first, a phase detector, PD, uses the digital samples, S, to estimate the

instantaneous zero-crossing phase, φX. Then, a phase subtracter and a filter recover the average

zero-crossing phase, φAVG, from φX. The data-recovery path consists of three blocks: a data

decision, a data compactor and a FIFO. The data path uses the recovered phase values φX

and φAVG to recover the data from the digital samples of the received signal. Since the 16

samples at the input of the data path correspond to 11 UIs, 5 of these 11 UIs are sampled

twice, or equivalently, they have duplicate samples. First, the data decision block picks

11 data bits among the 16 samples and marks the remaining 5 samples as duplicates. To

mark the samples as decision bits or duplicates, the decision block assigns a binary flag

to every sample, and hence the block outputs 16 sliced samples along with 16 flags. The

data compactor then removes the duplicate samples to reduce the recovered data to 11 bits.

Finally, the FIFO retimes the decision bits from the sampling clock domain to a retiming

clock and outputs 16 recovered data bits, DTREC, at a time. The FIFO also compensates the

data bits for frequency offsets between the transmitter and receiver. To simplify the CDR

simulations and measurements, the baud-rate clock (divided by 16) is used as a retiming

clock to assure that the data rates at the CDR input and output are identical.
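
The clock and rate figures quoted in this section are mutually consistent, as the following arithmetic sketch shows. All numeric values are taken from the text; the variable names are illustrative only.

```python
from fractions import Fraction

BAUD_RATE_GBPS = Fraction(6875, 1000)   # received data rate
BLIND_CLOCK_GHZ = Fraction(5)           # blind sampling clock
ADC_WAYS = 4                            # time-interleaved ADC channels
DEMUX_OUTPUTS = 16                      # samples delivered to the CDR per cycle

adc_channel_rate = BLIND_CLOCK_GHZ / 2              # GS/s per ADC channel
aggregate_rate = adc_channel_rate * ADC_WAYS        # total GS/s
cdr_clock_mhz = adc_channel_rate / 4 * 1000         # digital CDR clock
sampling_ratio = aggregate_rate / BAUD_RATE_GBPS    # samples per UI
uis_per_frame = DEMUX_OUTPUTS / sampling_ratio      # UIs per 16-sample frame
retiming_clock_mhz = BAUD_RATE_GBPS / 16 * 1000     # baud-rate clock divided by 16

print(aggregate_rate, cdr_clock_mhz, sampling_ratio, uis_per_frame)
print(float(retiming_clock_mhz))
```

The sampling ratio evaluates to exactly 16/11, so a frame of 16 samples spans 11 UIs, and the retiming clock is close to 430 MHz.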

In the remainder of this chapter, first Sections 4.2 – 4.6 present the implementation of

the CDR’s blocks, and then Section 4.7 validates the proposed CDR architecture through

the simulations and measurements of a test-chip receiver with the FSR CDR.

4.2 Phase-Detection Schemes

The goal of the phase-detection scheme is to estimate the instantaneous zero-crossing phase

from the digital samples of the received signal. In the blind-sampling receivers, this instan-

taneous data phase is typically expressed with respect to one of the sampling phases of the

blind clock in terms of unit intervals. With an integer sampling rate, the sampling phases

repeat every UI. In the example of sampling at 2x the baud rate, presented in Chapter 3, the

sampling phases are referred to as 0° and 180°, and all the phase values are expressed with

respect to the 0° phase of the sampling clock. With a fractional sampling rate, in contrast,

the sampling phase changes from one UI to another. This variable sampling phase poses

the primary challenge in the phase detection in the FSR receivers.

Sections 4.2.1 and 4.2.2 propose two alternative phase detection schemes for the FSR

CDRs: an eye-based scheme and a transition-based scheme. The eye-based phase detection scheme

first uses the variable sampling phase to accumulate the eye diagram of the received signal,

and then estimates the data phase from the accumulated eye. The transition-based phase de-

tection scheme takes an alternative approach: it linearly estimates the data phase for every

transition, and then adjusts the estimated phase values to account for the variable sampling

phase.

4.2.1 Eye-Based Phase Detector

The eye-based PD extracts the instantaneous data phase from the eye diagram of the re-

ceived signal. To accumulate the eye diagram, the PD needs to keep track of the sampling

phases for every sample. In general, it is possible to calculate the sampling phases knowing

the sampling rate or, equivalently, the sampling interval — the time between two adjacent

samples in terms of UI. This calculation of the sampling phases, however, requires circuit

resources to perform the calculations, increasing the power and area of the PD. In order to

reduce the power and area, instead of calculating the sampling phases, the proposed FSR

CDR restricts the sampling rates such that the sampling phases become periodic, which

causes a repetition of the sampling phases after a known number of UIs.

Figure 4.3: Eye diagram accumulation with fractional sampling rate. [Samples S1 to S16 taken over UIs 1 to 11 (16 samples / 11 UIs ≈ 1.45x) are folded into a single UI.]

Figure 4.3 illustrates the proposed sampling at a fractional rate, and the accumulation of

the eye diagram for the phase detection. The received signal is sampled such that an integer

number of samples fall into an integer number of UIs. In the example of Figure 4.3, 16

samples fall into 11 UIs for the sampling rate of 16/11≈1.45x. Since the sampling interval,

SI, is 11/16 = 0.6875 UI long, the sampling phase changes from one UI to another. In fact,

the sampling phase sweeps the UI, and folding the samples into a single UI reveals the eye

diagram of the received signal, as shown at the bottom of Figure 4.3.

The sampling rate of 16/11 causes the sampling phase to repeat every 16 samples or,

equivalently, every 11 UIs. In fact, the sampling phase for every sample can be calculated

using a simple relation:

$$\text{Sampling Phase (UI)} = \frac{1}{16}\,\operatorname{mod}_{16}\bigl[\,11\cdot(\text{Sample Number}-1)\,\bigr]. \tag{4.1}$$

Table 4.1 lists the sampling phases for the 16 samples that follow from (4.1). The sampling

phases are expressed with respect to the first sample, S1, whose phase is 0. Since the frac-

tion 16/11 cannot be simplified, all 16 samples in Table 4.1 have unique sampling phases

that span the entire UI. The sampling phase repetition period of 16 samples matches the

number of DeMUX channels in the receiver (see Figure 4.2), and therefore every DeMUX

channel has a constant sampling phase, or time stamp, associated with it. This correspondence

between the DeMUX channels and the sampling phases allows accumulating the eye

diagram with no phase calculations. In fact, every DeMUX channel maps to a vertical slice

of the eye diagram at a constant position, and hence the eye diagram can be constructed by

routing the samples from every DeMUX channel to its corresponding eye diagram slice.

Table 4.1: Sampling phases for the sampling rate of 16/11 ≈ 1.45x.

Sample   Phase, UI      Sample   Phase, UI
1        0/16           9        8/16
2        11/16          10       3/16
3        6/16           11       14/16
4        1/16           12       9/16
5        12/16          13       4/16
6        7/16           14       15/16
7        2/16           15       10/16
8        13/16          16       5/16

Figure 4.4: Phase detection from the eye diagram.
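
The channel-to-slice mapping can be illustrated with a minimal sketch. It reproduces the sampling phases of Table 4.1 in units of 1/16 UI and accumulates an eye diagram by routing each DeMUX channel to a fixed slice. The function names and the dictionary used to hold the eye are assumptions made for illustration; they do not describe the hardware implementation.

```python
def sampling_phase_sixteenths(sample_number, num_samples=16, num_uis=11):
    """Sampling phase of a sample (numbered from 1) in units of 1/16 UI."""
    return (num_uis * (sample_number - 1)) % num_samples

# Fixed time stamp of every DeMUX channel; channel 0 carries sample S1.
PHASES = [sampling_phase_sixteenths(n) for n in range(1, 17)]

def accumulate_eye(frames):
    """Route the samples of each frame to the eye slice of their channel."""
    eye = {phase: [] for phase in range(16)}
    for frame in frames:                      # each frame holds 16 samples
        for channel, sample in enumerate(frame):
            eye[PHASES[channel]].append(sample)
    return eye

print(PHASES)   # [0, 11, 6, 1, 12, 7, 2, 13, 8, 3, 14, 9, 4, 15, 10, 5]
```

Because 16 and 11 share no common factor, the 16 channels cover all 16 slices exactly once, so no phase arithmetic is needed at run time.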

To estimate the instantaneous data phase from the accumulated eye diagram, the PD

uses fictitious transitions extracted from the eye diagram. Figure 4.4 illustrates the process

of forming these fictitious transitions through four sample transitions. First, the PD divides

the samples into those belonging to ‘1–0’ transitions, and those belonging to ‘0–1’ transi-

tions. Then, the PD joins all the positive samples with all the negative samples belonging to

the same branch, thus forming the fictitious transitions from the samples accumulated in the

eye diagram. Figure 4.4 shows two examples of ‘1–0’ transitions and two examples of

‘0–1’ transitions, while the total number of fictitious transitions is larger. To assure that these

transitions approximate the actual zero crossing in the eye diagram, the samples forming

these fictitious transitions are restricted to be at most 0.5 UI apart in the eye diagram. Next,

the PD linearly estimates the zero-crossing phase for all the fictitious transitions with a 2-bit

accuracy using the method presented in Chapter 3. Finally, the PD averages the resulting

phase values to output a single instantaneous phase value for the frame of 16 samples.

Figure 4.5: Simplified block-diagram of the transition-based phase detector. [An average-transition-slope calculator and a data-phase calculator take the digital samples S[1:16] and produce (|Si| + |Si+1|)AVG, φX[1:16] and Transition Flag[1:16].]

Behavioral simulations confirm that the proposed eye-based phase detection scheme

leads to a successful phase recovery in an FSR CDR. The implementation of the eye-

based PD, however, requires a large number of phase calculators since the number of fic-

titious transitions exceeds the number of actual transitions and these fictitious transitions

are spread across the eye diagram. This high complexity of the eye-based phase detection

scheme motivates the low-complexity transition-based phase detection scheme proposed in

the following section.

4.2.2 Transition-Based Phase Detector

The transition-based phase detection scheme estimates the instantaneous data phase from

the actual, rather than fictitious, transitions in the received signal. The scheme then adjusts

the estimated phase values such that they are expressed in terms of UIs with respect to a

common reference.

Figure 4.5 presents a simplified block-diagram of the transition-based PD, which con-

sists of an average-transition-slope calculator and a data-phase calculator. The PD takes 16

digital samples, S, as the input and generates 16 instantaneous data phase values, φX, as the

output. With the fractional sampling rate of 1.45x, the 16 samples correspond to 11 UIs,

and there are fewer than 16 transitions among the 16 samples. To mark the actual, or valid,

transitions, the PD generates 16 transition flags. The transition flags corresponding to the

valid transitions are set to ‘1’, while the flags corresponding to no transitions are set to

‘0’. The data-phase calculator linearly estimates φX for every pair of adjacent samples with

opposite polarities using the slopes of the transitions between the samples. However, due

to the FSR, the time between two adjacent samples exceeds 0.5 UI, and the sampling phase

changes from one UI to another. As a consequence, some transition slopes lead to more

accurate estimates of φX, while other slopes lead to less accurate estimates of φX. To improve

the phase detection accuracy, the average-transition-slope calculator recovers the average

value of the slopes leading to the more accurate estimates of φX. The data-phase calculator

then uses the recovered average slope to improve the accuracy of the phase estimation.

The remainder of this section first presents the average-transition-slope calculator and then

describes the data-phase calculator.

Figure 4.6: Selection of transitions leading to low-error phase detection: (a) Case 1: compare Si with Si+2; (b) Case 2: compare samples with VTH.

Figure 4.6 uses two cases to illustrate how the transition slopes are divided into two

categories: those leading to more accurate phase detection (shown as solid bold lines),

and those leading to less accurate phase detection (shown as dashed lines). The average-

transition-slope calculator makes the distinction between the slopes using the amplitudes of

the samples. In the first case, shown in Figure 4.6(a), two transitions are adjacent to sample

Si+1. Since the slope of the phase-estimation line cannot exceed the actual transition slope,

the transition with the larger slope leads to a more accurate phase estimation (solid bold

line), while the other transition leads to a lower accuracy. Since both transitions share

sample Si+1 and span one sampling interval each, to select the larger of the two

slopes, it is sufficient to compare the amplitude of Si with the amplitude of Si+2. In the

second case, shown in Figure 4.6(b), this comparison of two adjacent slopes is not possible

since the two slopes are several samples apart from each other. In this case, the average

slope calculator compares the samples’ magnitudes with a threshold level, VTH, which is

also recovered from the samples’ magnitudes. A transition leads to a more accurate phase

estimation when both its samples exceed the threshold in magnitude (solid bold line). In

contrast, when one of the samples has a magnitude below the threshold level, the transition

is likely to lead to a less accurate phase estimation (dashed line). The average-slope

calculator then recovers the average value of the slopes shown with the solid bold lines in

Figure 4.6.

Figure 4.7: Average-slope-recovery filter. [Inputs |S1|+|S2| to |S16|+|S17| gated by SumFlag1 to SumFlag16; gain A; output (|Si|+|Si+1|)AVG.]
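
The two selection cases can be sketched in software as follows. This is an illustration under stated assumptions rather than the implemented logic: a transition is taken to be any adjacent pair with opposite signs, a transition with a neighbouring transition is kept when its sum of magnitudes is not smaller than that of its neighbours (which is equivalent to comparing Si with Si+2, since the shared sample cancels), and an isolated transition is kept when both samples exceed VTH. The function name, the tie-breaking rule and the treatment of runs of three adjacent transitions are hypothetical.

```python
def select_low_error_transitions(samples, vth):
    """Flag the adjacent-sample transitions expected to give low phase error."""
    mags = [abs(s) for s in samples]
    n = len(samples) - 1                                   # number of adjacent pairs
    is_transition = [samples[i] * samples[i + 1] < 0 for i in range(n)]
    sums = [mags[i] + mags[i + 1] for i in range(n)]       # proportional to slope
    flags = []
    for i in range(n):
        if not is_transition[i]:
            flags.append(False)
            continue
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n and is_transition[j]]
        if neighbours:
            # Case 1: adjacent transitions share a sample; keep the steeper one.
            flags.append(all(sums[i] >= sums[j] for j in neighbours))
        else:
            # Case 2: isolated transition; both samples must exceed the threshold.
            flags.append(mags[i] > vth and mags[i + 1] > vth)
    return flags

print(select_low_error_transitions([5, -2, 7], vth=2))           # [False, True]
print(select_low_error_transitions([6, -5, -6, -1, 3], vth=2))   # [True, False, False, False]
```

In the first call the transition between −2 and 7 is steeper than the one between 5 and −2, so only the former is flagged. In the second call the transition between −1 and 3 is rejected because one of its samples lies below the threshold.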

Figure 4.7 demonstrates the implementation of the average-slope-recovery filter. Since

the time between two adjacent samples is constant (it equals one sampling interval), the

sum of magnitudes of two samples carries the same information as the actual slope of the

line between these two samples. Therefore, the filter takes the sums of magnitudes of the

samples at the input, |Si| + |Si+1|, along with the binary sum flags, SumFlagi, that mark

the transitions contributing to the average. The filter outputs the average value of the sums,

(|Si| + |Si+1|)AVG. The filter is of feedback type, and it resembles a simple phase-recovery

loop. First, the subtracters calculate the errors between the instantaneous sums and the

average sum. The binary sum flags then multiply the error values such that only the errors

with their flags set to ‘1’ contribute to the average. These multipliers are implemented as

bit-wise AND between the flags and error values. Next, the sum errors are combined into

a single value that is scaled by gain A. Finally, a discrete-time integrator, consisting of an

adder and a register, uses the error values to recover the average sum, (|Si| + |Si+1|)AVG,

thus closing the feedback loop. Similar to the sample values, the recovered average slope is

a 5-bit value. The data-phase calculator then uses this average transition slope to improve

the phase-detection accuracy according to the scheme presented next.

Figure 4.8: Reduction of phase-detection error using average transition slope.

Figure 4.9: Linear estimation of instantaneous zero-crossing phase, φZC. [A line through samples Si and Si+1 crosses zero at point X; the sampling interval SI = 11/16 UI is divided into four quantization bins 00, 01, 10 and 11.]
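
The feedback filter described above can be sketched behaviourally as one update per frame. The sketch assumes that the gated errors are combined by summation and uses floating-point values instead of the 5-bit arithmetic of the hardware; the function name, the gain of 1/64 and the input values are hypothetical.

```python
def update_average_sum(avg_sum, sums, sum_flags, gain):
    """One frame update of the recovered average of |Si| + |Si+1|."""
    # Subtracters followed by flag gating (a bit-wise AND in hardware).
    gated_errors = [(s - avg_sum) if flag else 0 for s, flag in zip(sums, sum_flags)]
    # Combine the errors, scale by the gain A, and integrate.
    return avg_sum + gain * sum(gated_errors)

avg = 0.0
frame_sums = [10, 10, 31, 10, 10]     # hypothetical sums of magnitudes
frame_flags = [1, 1, 0, 1, 1]         # the third sum is excluded from the average
for _ in range(200):
    avg = update_average_sum(avg, frame_sums, frame_flags, gain=1 / 64)
print(round(avg, 3))
```

With constant inputs the recovered average settles at the value of the flagged sums, and the unflagged sum has no influence on the result.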

Figure 4.8 illustrates a method of improving the phase-estimation accuracy for the high-

error transitions. For this type of transition, shown as a dashed line in the figure, instead of

the actual slope, the phase calculator uses the average transition slope, shown as a solid line.

This phase-estimation line is defined by the average slope and the sample that is closer to

the zero crossing in magnitude, Si+2 in the example of Figure 4.8.

The data-phase calculator uses either the actual or the average transition slope to define

the line for the phase estimation. The linear phase-estimation scheme, shown in Figure 4.9,

is similar in principle to the scheme used in the 2x feed-forward CDR architecture presented

in Chapter 3. Figure 4.9 illustrates an example of linear phase detection using a line defined

by two samples, Si and Si+1. The PD calculates the phase of the zero-crossing point, X,

using:

$$\phi_{ZC,i} = \frac{|S_i|}{|S_i| + |S_{i+1}|}. \tag{4.2}$$

Since the time between two adjacent samples is 1 SI = 0.6875 UI, φZC,i expresses the data

phase as a proportion of SI with respect to sample Si. The changing sampling phase due

to the FSR makes φZC,i from different sampling intervals inconsistent with each other. In

order to use the instantaneous phase values, φZC,i needs to be expressed in terms of UIs with

respect to a common reference. To obtain this converted instantaneous phase, φX,i, φZC,i is

scaled by SI and offset by the sampling phase, or the time stamp, TSi, of every sample:

$$\phi_{X,i} = TS_i + SI \cdot \phi_{ZC,i}. \tag{4.3}$$

Since the sampling phases are defined with respect to the first sample for every 16 samples

(see Table 4.1), φX,i is referenced to the same first sample. A modulo-1 addition in (4.3)

confines φX,i to the range [0, 1) UI.

Figure 4.10: Selector converting phase values from sampling intervals to unit intervals. [The 2-bit φZC,i (in SI) selects one of TSi + SI·1/8, TSi + SI·3/8, TSi + SI·5/8 and TSi + SI·7/8 as φX,i (in UI).]
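
Relations (4.2) and (4.3) can be evaluated directly, as in the sketch below. It uses exact fractions and unquantized phase values, whereas the PD itself works with 2-bit accuracy; the function names and the sample values 5 and −2 are chosen for illustration only, and the samples are assumed to be non-zero integers of opposite polarity.

```python
from fractions import Fraction

SI = Fraction(11, 16)   # sampling interval in UI for the 16/11 rate

def phi_zc(s_i, s_next):
    """Zero-crossing phase as a fraction of SI, measured from sample Si (4.2)."""
    return Fraction(abs(s_i), abs(s_i) + abs(s_next))

def phi_x(ts_i, s_i, s_next):
    """Zero-crossing phase in UI referenced to the first sample (4.3), modulo 1."""
    return (ts_i + SI * phi_zc(s_i, s_next)) % 1

print(phi_zc(5, -2))                       # 5/7
print(phi_x(Fraction(11, 16), 5, -2))      # 5/28
```

For the second sample of a frame, whose time stamp is 11/16 UI, the crossing at 5/7 of a sampling interval lands at 33/28 UI, which the modulo-1 addition folds to 5/28 UI.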

To maintain a low circuit complexity of the PD, φZC,i is calculated with 2-bit accuracy,

which allows the division operation to be replaced with a few additions and subtractions (see

Figure 3.5). Note that replacing the instantaneous sum in the denominator with the average

sum (slope) allows the same phase calculator to estimate the data phase for the

high-phase-error transitions that require the slope substitution.

The conversion of the phase values from SI to UI in (4.3) requires one multiplier and

one adder per sample. To reduce the complexity of the conversion operation, the PD makes

use of the low accuracy of φZC,i, which is a 2-bit value. Since φZC,i takes one of four

values, its corresponding φX,i also has four possible values. Hence, instead of calculating

φX,i, the PD uses a 4-to-1 selector to perform the conversion as shown in Figure 4.10. The

selector takes φZC,i as the control input and picks one of four values as the output φX,i. The

four possible values of φX,i are constants and they are pre-calculated. Once the

sampling rate is selected for the receiver, the sampling phases, TSi, are constant for every

sample (Table 4.1 lists the sampling phases for sampling at 1.45x). The sampling interval,

SI, is also a constant. The four possible values for φZC,i are 1/8, 3/8, 5/8 and 7/8, which

correspond to the middles of the four quantization bins shown at the bottom of Figure 4.9.

Since SI = 0.6875 UI and the linear phase estimation is 2-bit accurate, the effective PD

resolution is log2(4/0.6875) ≈ 2.54 bits/UI.

Figure 4.11: Phase recovery filter. [A phase subtracter, gated by the transition flags, produces φERR from the 16 values of φX and from φAVG; a 3rd-order low-pass filter with terms Kn z^-1 / (1 − z^-1), n = 1, 2, 3, recovers φAVG.]
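
The selector-based conversion amounts to a small table of constants per DeMUX channel. The sketch below builds such a table for the 16/11 rate and evaluates the effective resolution; the table layout and the names `build_phase_table` and `phi_x_from_code` are assumptions made for illustration.

```python
from fractions import Fraction
from math import log2

BIN_CENTRES = [Fraction(1, 8), Fraction(3, 8), Fraction(5, 8), Fraction(7, 8)]

def build_phase_table(num_samples=16, num_uis=11):
    """Pre-calculate the four possible phase values (in UI) for every channel."""
    si = Fraction(num_uis, num_samples)                    # sampling interval, UI
    table = []
    for n in range(1, num_samples + 1):
        ts = Fraction((num_uis * (n - 1)) % num_samples, num_samples)  # time stamp
        table.append([(ts + si * centre) % 1 for centre in BIN_CENTRES])
    return table

PHASE_TABLE = build_phase_table()

def phi_x_from_code(channel, code):
    """4-to-1 selection: the 2-bit code picks a pre-calculated constant."""
    return PHASE_TABLE[channel][code]

resolution = log2(len(BIN_CENTRES) / (11 / 16))            # bits per UI
print(PHASE_TABLE[0])
print(round(resolution, 2))                                # 2.54
```

Each channel needs only four stored constants, so the multiplier and adder of (4.3) are replaced by a selection.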

With the instantaneous phase values, φX, expressed in UI with a common reference for

all sampling intervals, the CDR recovers the average data phase using a filter presented in

the following section.


4.3 Phase-Recovery Filter

The filter averages the instantaneous phase, φX, into φAVG that tracks the data phase in the

average sense. Figure 4.11 shows a simplified diagram of the phase-recovery filter used in

the proposed FSR CDR. The filter consists of a phase subtracter and a discrete-time low-

pass filter in a feedback configuration. The filter topology is similar to the topology used in

the 2x feed-forward CDR architecture of Chapter 3. The filter is presented in Section 3.3,

while this section highlights the filter details specific to the FSR CDR.

Due to sampling at 1.45x, among the 16 samples, there are at most 11 valid zero cross-

ings. The binary transition flags corresponding to these valid transitions are set to ‘1’. To

assure that only the actual crossings contribute to the recovery of φAVG, the phase subtracter

uses the transition flags to pass only the valid phase error values, φERR, to the low-pass filter.

Figure 4.12: Phase subtracter.

Figure 4.12 illustrates the operation of the phase subtracter in the FSR CDR. First,

the phase subtracter calculates the difference between the instantaneous and average data

phases, φX,i and φAVG. Then the phase differences are multiplied by their corresponding

binary transition flags such that only the valid transitions contribute to the average phase

recovery. In the binary domain, this multiplication by the transition flag is implemented as a

bit-wise AND operation of φX,i and its corresponding flag. Next, modulo blocks confine the

values of the phase errors, φERR,i, to the range of [-0.5, 0.5) UI. Finally, the 16 phase errors,

φERR,i, are combined to output a single phase error, φERR, for the frame of 16 samples.

The low-pass filter averages φERR to calculate φAVG, as shown in Figure 4.11. The

instantaneous and average phase values allow the CDR to recover the data bits from the

samples of the received signal according to the scheme presented in the next section.
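
The phase subtracter can be sketched behaviourally as follows. The sketch assumes that the 16 gated errors are combined by summation, which the text does not specify, and uses floating-point phases in UI; the function names are hypothetical. The low-pass filter itself follows Section 3.3 and is not reproduced here.

```python
def wrap_half_ui(phase_difference):
    """Confine a phase difference to the range [-0.5, 0.5) UI."""
    return (phase_difference + 0.5) % 1.0 - 0.5

def frame_phase_error(phi_x, transition_flags, phi_avg):
    """Combine the phase errors of the valid transitions of one frame."""
    return sum(
        wrap_half_ui(px - phi_avg)
        for px, flag in zip(phi_x, transition_flags)
        if flag                       # invalid entries contribute nothing
    )

print(frame_phase_error([0.9, 0.3], [1, 0], phi_avg=0.1))
```

Here the first phase value lies 0.8 UI above the average, which wraps to an error of about −0.2 UI, while the second value is ignored because its flag is ‘0’.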

4.4 Data-Decision Scheme

The data-decision scheme in the FSR CDR first detects the number of samples in every

UI, and then for the UIs with two samples the scheme picks one sample as the decision

bit while marking the other sample as the duplicate. Due to the fractional sampling rate

the number of samples exceeds the number of UIs: some UIs are sampled twice, while

others once. With the blind sampling, it is unknown a priori which UIs are sampled twice,

and therefore the decision block needs to find the UIs with duplicate samples. For the UIs

sampled once, the decision is trivial. In contrast, for the UIs sampled twice, the decision

block picks the sample that is closer to the UI center as the decision. The remainder of

this section first presents the method of detecting the number of samples per UI, and then

describes the selection of the decision bit for the UIs with two samples.

Figure 4.13: Detecting number of samples per UI (jitter-free case): (a) one sample per UI; (b) two samples per UI.

Figure 4.13 illustrates the detection of the number of samples per UI through a jitter-

free example. For this detection, the decision scheme relies on φAVG, which coincides

with φX in the jitter-free case. The scheme counts the number of samples within a 1-UI

window from φAVG using the values of the sampling phases for every sample. The figure

highlights this 1-UI window with a grey outline, and it shows the UI of interest as a shaded

UI. When a UI is sampled once, as shown in Figure 4.13(a), the decision scheme takes the

sign of the sample as the decision bit. In contrast, when a UI is sampled twice, as shown in

Figure 4.13(b), the decision scheme picks one of the samples as the decision and marks the

other sample as a duplicate according to the algorithm described next.

Figure 4.14 presents the data decision scheme for the case of two samples per UI. The

goal of the decision block is to pick the sample that is closer to the center of the UI. To

find the UI center, the average UI-center phase, φPICK, is calculated by adding 0.5 UI to

the average zero-crossing phase, φAVG, using modulo-1 addition. In the jitter-free case of

Figure 4.14(a), the two samples adjacent to φPICK, samples A and B, have identical signs and

therefore either sample can be selected as the decision, while the other sample is marked as

a duplicate. Jitter causes the instantaneous phase to deviate from the average phase, and in

fact may cause a transition between the two samples that nominally belong to the same UI,

as shown in Figures 4.14(b) and 4.14(c). In this case, the decision block picks the sample

belonging to the same shaded UI to which φPICK points. The data-picking then reduces to

a comparison between φPICK and φX. If φX is larger than φPICK, as shown in Figure 4.14(b),

then the scheme chooses the sign of A as the decision bit. Conversely, if φX is smaller than

φPICK, as shown in Figure 4.14(c), the scheme chooses the sign of B. The remaining sample

in both cases is marked as a duplicate for the subsequent removal from the data vector.

Figure 4.14: Data decision in the presence of jitter: (a) jitter-free case; (b) jitter example 1; (c) jitter example 2.
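
The decision rule for a UI with two samples can be summarized in a short sketch. It assumes that φX and φPICK are expressed in the same 1-UI window, so that a plain comparison is meaningful, and it picks sample A arbitrarily when both samples agree; the handling of phase wrap-around is not specified in the text and is not modelled. The function name and return format are hypothetical.

```python
def decide_two_samples(sample_a, sample_b, phi_x, phi_avg):
    """Return (decision_bit, duplicate) for a UI that holds samples A and B."""
    phi_pick = (phi_avg + 0.5) % 1.0          # average UI-center phase
    bit_a = 1 if sample_a > 0 else 0
    bit_b = 1 if sample_b > 0 else 0
    if bit_a == bit_b:
        return bit_a, "B"                     # no transition: either sample works
    if phi_x > phi_pick:
        return bit_a, "B"                     # crossing after the center: keep A
    return bit_b, "A"                         # crossing before the center: keep B

print(decide_two_samples(3, 4, phi_x=0.1, phi_avg=0.1))    # (1, 'B')
print(decide_two_samples(3, -2, phi_x=0.8, phi_avg=0.1))   # (1, 'B')
print(decide_two_samples(3, -2, phi_x=0.4, phi_avg=0.1))   # (0, 'A')
```

The three calls correspond to the jitter-free case and to the two jitter examples: the sample on the same side of the instantaneous crossing as φPICK supplies the decision bit, and the other sample is marked as the duplicate.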

The proposed data-decision scheme recovers error-free data as long as the cycle-to-

cycle jitter remains within the scheme’s limit, which depends on the sampling rate. The

worst case jitter conditions occur when two samples fall in a UI and the samples are cen-

tered in the UI, i.e., they are equidistant from the UI edges (see Figure 4.14(a)). In the

worst case, a cycle-to-cycle jitter of (1 UI − 1 SI) causes the UI edges to move towards each

other such that none of the samples fall in the UI. In the example of sampling at 1.45x, this

maximum tolerance is (1 UI − 0.6875 UI) = 0.3125 UIpp. This estimated tolerance to the

cycle-to-cycle jitter is confirmed through simulations in Section 4.7.
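
The worst-case estimate above depends only on the sampling interval, as the following sketch shows. Applying the same expression to rates other than 16/11 is an extrapolation of the argument in the text and is not a verified result; the function name is hypothetical.

```python
from fractions import Fraction

def jitter_tolerance_ui(num_samples, num_uis):
    """Estimated cycle-to-cycle jitter tolerance, 1 UI - 1 SI, in UI peak-to-peak."""
    sampling_interval = Fraction(num_uis, num_samples)   # SI in UI
    return 1 - sampling_interval

print(float(jitter_tolerance_ui(16, 11)))   # 0.3125
```

The estimate shrinks as the sampling rate approaches 1x, since the sampling interval then approaches 1 UI.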

To assure that the recovered data contains only the valid data bits, the CDR removes

the duplicates marked by the decision block using a data compactor, which is presented in

the following section.


4.5 Data Compaction Schemes

In contrast with an integer sampling rate, the fractional sampling rate leads to a variable

number of samples per UI, causing duplicate samples in some UIs. Since the location of

the duplicates among the recovered data bits is unknown a priori, the CDR requires a

mechanism for removing the duplicates in order to avoid data recovery errors. Two data

compaction schemes are proposed here for the FSR CDR: a shift-register scheme in Sec-

tion 4.5.1 and a selector-array scheme in Section 4.5.2.

4.5.1 Shift-Register Data Compactor

Figure 4.15 presents a simplified diagram and a signal-flow graph of a sample 5-to-3 shift-

register data compactor. The compactor takes 5 sliced samples, Si, along with their valid

flags, VFi, at the input on the left, removes two duplicate samples, S2 and S4, and outputs

on the right 3 data bits free of duplicates, Di, as shown in Figure 4.15(a). The compactor

consists of three registers and two sets of 2-to-1 selectors between them. As the data bits

and their flags are shifted from left to right, the selectors eliminate the duplicates (shaded

in the figure) one at a time. The valid flags, VFi, are set to ‘1’ for the valid bits and to ‘0’ for

the duplicates. These flags guide the conditional selectors to remove the duplicates from

the data vector. The bold lines in the diagram highlight the path of the valid data bits.

Behavioral simulations confirm that the shift-register data compactor removes the du-

plicate samples leading to an error-free data recovery. This method of duplicate removal,

however, comes at the cost of large latency and multiple registers. In fact, the signal-flow

graph in Figure 4.15(b) shows that the number of latency cycles in this compactor equals

the number of duplicates to be removed. With sampling at 1.45x, among the 16 samples,

there are 11 valid bits and 5 duplicates, which requires 5 cycles. Every cycle in the pro-

posed CDR corresponds to 11 UIs, leading to the effective latency of 55 UIs. Every latency

cycle corresponds to a register stage in the data compactor. With 5 duplicates to remove,

the compactor requires at least 5 registers, with each register containing between 11 and 16

positions for the data bit and its flag. This large number of registers causes high power

consumption and large area in the shift-register compactor. In an attempt to reduce the power,

area and latency of the data compaction, the following section proposes an alternative data

compaction scheme.

Figure 4.15: Shift-register data compactor: (a) simplified block-diagram; (b) signal-flow graph. [Inputs S1 to S5 with valid flags VF1 to VF5, outputs D1 to D3; the vector shrinks from 5 bits (3 valid, 2 duplicates) to 4 bits (3 valid, 1 duplicate) after Stage 1 and to 3 bits (3 valid, 0 duplicates) after Stage 2.]
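
The stage-by-stage behaviour of the shift-register compactor can be modelled with a few lines of code. The sketch assumes that each stage removes the first remaining duplicate, which is one possible ordering; the function names and the example vector are hypothetical.

```python
def compactor_stage(bits, valid_flags):
    """One register stage: drop the first entry whose valid flag is 0."""
    for i, flag in enumerate(valid_flags):
        if not flag:
            return bits[:i] + bits[i + 1:], valid_flags[:i] + valid_flags[i + 1:]
    return bits, valid_flags                  # nothing left to remove

def shift_register_compactor(bits, valid_flags, num_stages):
    """Pass the vector through a fixed number of stages, one cycle per stage."""
    for _ in range(num_stages):
        bits, valid_flags = compactor_stage(bits, valid_flags)
    return bits

# 5-to-3 example: S2 and S4 are duplicates.
print(shift_register_compactor([1, 1, 0, 0, 1], [1, 0, 1, 0, 1], num_stages=2))  # [1, 0, 1]

UIS_PER_CYCLE = 11
print(5 * UIS_PER_CYCLE)   # latency in UI for 5 duplicates: 55
```

Since one stage removes one duplicate, the number of stages, and hence the latency in cycles, equals the number of duplicates per frame.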

4.5.2 Selector-Array Data Compactor

Figure 4.16 presents a simplified diagram of the selector-array data compactor. The com-

pactor takes the inputs on the left and it outputs the duplicate-free data vector at the bottom.

To route valid bits from the input to the output, the compactor uses an array of

conditional selectors, which pass the bits in one of two directions: either from the left to bottom,

or from the top to bottom. The valid flags, VFi, control the direction of passing the data

bits such that the duplicates (shaded in the figure) are eliminated. In the diagram, the bold

lines highlight the paths of the valid bits.

Figure 4.16: Selector-array data compactor. [Inputs S1 to S16 with valid flags VF1 to VF16 on the left, outputs D1 to D11 at the bottom; inset: conditional selector cell with ports DTIN TOP, ENIN TOP, DTIN LEFT, ENIN LEFT, DTOUT BOT, ENOUT BOT and ENOUT RIGHT.]

The inset in Figure 4.16 depicts the circuit diagram of a conditional selector cell used

in the vector compactor. The cell consists of a data selector and three logic gates. The

selector passes either DTIN TOP or DTIN LEFT to DTOUT BOT according to the values of the

enable flags ENIN TOP and ENIN LEFT. The cell also updates the enable flags, ENOUT RIGHT

and ENOUT BOT, for its surrounding cells according to the truth table shown in Table 4.2.

The cell passes the data from the left to bottom only when both ENIN TOP and ENIN LEFT

are set to ‘1’, which also forces the cells in the remainder of the row and column to pass

the data only from the top to bottom. As a result, the bits with their VFi set to ‘0’ never

propagate to the output. With this method of data compaction, it takes one cycle to remove

the duplicates regardless of the number of duplicates.
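The routing rule of the selector array can be illustrated with a short behavioral model. The sketch below is not the synthesized circuit; it only encodes the truth table of Table 4.2 in a hypothetical function `selector_cell` and chains the cells in a hypothetical function `compact`. It assumes that the enable inputs at the top of every column are tied to '1', that the data inputs at the top are tied to '0', and that each input bit is broadcast along its row. These boundary conditions are assumptions made for illustration only.

```python
from typing import List, Tuple


def selector_cell(dt_in_top: int, en_in_top: int,
                  dt_in_left: int, en_in_left: int) -> Tuple[int, int, int]:
    """One conditional selector cell, following the truth table of Table 4.2.

    Returns (dt_out_bot, en_out_bot, en_out_right).
    """
    take_left = en_in_top & en_in_left           # both enables set: route left -> bottom
    dt_out_bot = dt_in_left if take_left else dt_in_top
    en_out_bot = en_in_top & (1 - en_in_left)    # column still waiting for a valid bit
    en_out_right = en_in_left & (1 - en_in_top)  # valid bit still waiting for a column
    return dt_out_bot, en_out_bot, en_out_right


def compact(samples: List[int], valid_flags: List[int], n_out: int = 11) -> List[int]:
    """Behavioral model of the full (unpruned) array: one row per input, one column per output."""
    dt_col = [0] * n_out  # data entering each column from the top (assumed tied to 0)
    en_col = [1] * n_out  # enables entering each column from the top (assumed tied to 1)
    for bit, flag in zip(samples, valid_flags):
        en_left = flag    # the valid flag enters the row from the left
        for col in range(n_out):
            dt_col[col], en_col[col], en_left = selector_cell(
                dt_col[col], en_col[col], bit, en_left)
    return dt_col         # values leaving the bottom of the array


if __name__ == "__main__":
    bits = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
    flags = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1]  # 11 valid, 5 duplicates
    print(compact(bits, flags))
```

In this model the j-th column captures the j-th valid input bit, so the bits whose flag is '0' never reach the bottom of the array, in line with the description above. The model evaluates the cells sequentially, whereas the hardware resolves the whole array combinationally within one cycle.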

Nominally, the array consists of 16 rows and 11 columns for the total of 176 cells.

The observation that the first few inputs must correspond to the first few outputs allows some

cells at the top right part of the array to be eliminated. In a similar manner, some of the cells

at the bottom left part of the array can be replaced with wires, since these cells always pass

the data bits from the top to bottom. Following this reasoning, the compactor is reduced to


Table 4.2: Conditional selector truth table.

EN_IN_TOP   EN_IN_LEFT   EN_OUT_BOT   EN_OUT_RIGHT   DT_OUT_BOT
0           0            0            0              DT_IN_TOP
0           1            0            1              DT_IN_TOP
1           0            1            0              DT_IN_TOP
1           1            0            0              DT_IN_LEFT

[Figure 4.17: Simplified FIFO diagram. Write port: 10...12 bits per cycle at sampling rate / 16 = 625 MHz. Read port: 16 bits per cycle at data rate / 16 ≈ 430 MHz.]

33 cells located close to the diagonal of the array. In addition to reducing the data compactor power and area, this small number of cells allows the duplicates to be removed in a single cycle, thus reducing the compactor latency to 11 UIs, which is 5x smaller than the shift-register data compactor latency.

The compacted data vector, like most of the digital CDR, is in the sampling clock domain,

which has a fractional rate with respect to the baud rate. To simplify the verification of

the proposed CDR architecture, the recovered data is retimed according to the scheme

presented next.

4.6 Data Retiming Scheme

The data retiming scheme in the FSR CDR is similar to the retiming scheme in the 2x feed-

forward CDR architecture presented in Section 3.5. This section only highlights the aspects

of the retiming scheme that are specific to the FSR CDR architecture. Figure 4.17 shows a

simplified diagram of the FIFO that retimes the data bits from the fractional sampling clock


domain to the integer baud-rate clock domain. The FIFO is a circular register with a write

port and a read port. The write port (shaded in the diagram) places 10 to 12 data bits into

the FIFO and it is synchronized to the divided sampling clock running at 625 MHz. The

variable number of bits at the write port allows the FIFO to compensate for the frequency

offset between the transmitter and receiver, which is inevitable in blind-sampling receivers.

The read port is triggered by a divided baud-rate clock at approximately 430 MHz to remove

16 data bits at a time from the FIFO. In contrast with the vector compactor that removes

the duplicate samples, the FIFO only re-arranges the data for the purpose of retiming it to

a convenient clock domain.
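The clock frequencies and word widths at the two FIFO ports can be cross-checked with a few lines of arithmetic. The sketch below assumes the 10 GS/s sampling rate, the 6.875 Gb/s data rate and the 16-channel DeMUX reported for the test-chip in Section 4.7; the function name `fifo_rates` is hypothetical.

```python
from fractions import Fraction

SAMPLING_RATE = Fraction(10_000_000_000)  # 10 GS/s (Section 4.7)
DATA_RATE = Fraction(6_875_000_000)       # 6.875 Gb/s (Section 4.7)
DEMUX_WIDTH = 16                          # number of DeMUX channels


def fifo_rates():
    """Return (write clock, read clock, average bits per write cycle)."""
    write_clock = SAMPLING_RATE / DEMUX_WIDTH      # divided sampling clock
    read_clock = DATA_RATE / DEMUX_WIDTH           # divided baud-rate clock
    avg_bits_per_write = DATA_RATE / write_clock   # bits that must enter per write cycle
    return write_clock, read_clock, avg_bits_per_write


if __name__ == "__main__":
    wr, rd, avg = fifo_rates()
    print(f"write clock: {float(wr) / 1e6} MHz")
    print(f"read clock:  {float(rd) / 1e6} MHz")
    print(f"average bits per write cycle: {float(avg)}")
```

With no frequency offset the write port has to deliver 11 bits per cycle on average, which sits in the middle of the 10 to 12 bit range; the spare range on either side is what absorbs the offset between the transmitter and receiver.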

The simulation and measurement results of a receiver with the FSR CDR, presented in

the following section, confirm that the FIFO successfully retimes the data from a fractional

to an integer rate clock domain, and it compensates for the frequency offsets between the

transmitter and receiver.

4.7 Simulation and Measurement Results

In order to validate the proposed FSR CDR architecture, the CDR was first simulated at the

behavioral level with various sampling rates. Then, a receiver test-chip with a 1.45x FSR

CDR, shown in Figure 4.2, was fabricated and characterized in 65 nm CMOS. The FSR

CDR uses the transition-based phase detector and the selector-array vector compactor.

An event-driven approach [48] was used to build a behavioral model of the FSR CDR

in Simulink. To explore the effect of the sampling rate on the jitter tolerance, the CDR

was simulated with four different sampling rates. Figure 4.18 presents the simulated jitter

tolerance of the FSR CDR with the sampling rates annotated in the legend. The sampling

rates were chosen such that they are a ratio of 16 samples, which is the number of DeMUX

channels (see Figure 4.2), and an integer number of UIs ranging from 9 to 15. In these

simulations, the sampling rate is 10 GS/s while the baud rate is adjusted to achieve the

desired sampling rates. The simulations were performed with a $2^{31}-1$ PRBS sequence

and BER ≤ $5 \cdot 10^{-6}$.
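The simulated operating points follow directly from the ratio described above. The sketch below lists, for each of the four sampling rates in Figure 4.18, the baud rate that corresponds to the fixed 10 GS/s sampling rate; the function name `simulated_points` is hypothetical.

```python
from fractions import Fraction

DEMUX_CHANNELS = 16               # samples per DeMUX frame
SAMPLING_RATE_GSPS = Fraction(10) # fixed sampling rate in the simulations


def simulated_points(ui_counts=(15, 13, 11, 9)):
    """Return (UIs per frame, sampling rate relative to baud rate, baud rate in Gb/s)."""
    points = []
    for n_ui in ui_counts:
        relative_rate = Fraction(DEMUX_CHANNELS, n_ui)  # e.g. 16/11 for 1.45x
        baud_rate = SAMPLING_RATE_GSPS / relative_rate  # baud rate adjusted to hit the ratio
        points.append((n_ui, relative_rate, baud_rate))
    return points


if __name__ == "__main__":
    for n_ui, rate, baud in simulated_points():
        print(f"16/{n_ui} = {float(rate):.2f}x -> {float(baud)} Gb/s")
```

For the 16/11 case the resulting baud rate is 6.875 Gb/s, which is the data rate of the fabricated test-chip.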

The low-frequency jitter tolerance is only weakly affected by the sampling rate. However,

the high-frequency jitter tolerance shows a higher dependence on the sampling rate. As

the sampling rate increases from 1.07x, the high-frequency jitter tolerance improves. With


[Figure 4.18: Simulated jitter tolerance (BER ≤ $5 \cdot 10^{-6}$). Vertical axis: jitter amplitude (UI_PP), 0.01 to 1000; horizontal axis ticks from $10^{3}$ to $10^{10}$. Curves for sampling rates 16/15 ≈ 1.07, 16/13 ≈ 1.23, 16/11 ≈ 1.45 and 16/9 ≈ 1.78.]

the further increase of the sampling rate, the improvements diminish. This diminishing

improvement in the jitter tolerance guided the choice of the 1.45x sampling rate for the

fabricated receiver test-chip. The variation of the high-frequency jitter tolerance with the

sampling rate is consistent with the sample energy per UI. As the sampling rate approaches

1x, with blind sampling the worst-case sample energy approaches zero when the samples

occur close to the UI boundaries. This worst-case sample energy per UI increases with the

increasing sampling rate.

The simulations also confirm the expected tolerance to cycle-to-cycle jitter. With sampling at 1.45x, the time between two adjacent samples is 0.6875 UI, and the expected jitter tolerance at half the data rate is 0.3125 UI_PP (see Section 2.3.2 for details). In the simulations, the jitter tolerance at 2 GHz reduces to approximately 0.3 UI_PP, which is close to the expected jitter tolerance. The high-frequency jitter tolerance is 0.65 UI_PP, which is below the expected 1 UI_PP due to the linear nature of the phase estimation in the proposed CDR. The behavioral simulations indicate that the proposed CDR architecture is functional, which provides grounds for the experimental verification of the FSR CDR. The design flow of the proposed FSR CDR is the same as the flow illustrated in Figure 3.15 and outlined in Section 3.6.
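The sample spacing and the cycle-to-cycle tolerance quoted in this section can be related by a short calculation. The following is only a sketch under the assumption that the expected tolerance at half the data rate equals one UI minus the sample spacing; this reading reproduces the quoted values, but the actual derivation is the one given in Section 2.3.2. The symbol $T_s$ is introduced here for illustration.

```latex
\begin{align}
T_s &= \frac{1\,\mathrm{UI}}{16/11} = \frac{11}{16}\,\mathrm{UI} = 0.6875\,\mathrm{UI}, \\
1\,\mathrm{UI} - T_s &= \frac{5}{16}\,\mathrm{UI} = 0.3125\,\mathrm{UI}.
\end{align}
```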

Figure 4.19 shows the die photograph of the test-chip receiver that implements the 1.45x

FSR CDR in 65 nm standard-logic CMOS. In the receiver, the input buffers, ADC, clock


[Figure 4.19: Test-chip die photograph. Annotated blocks: 4-channel 2.5 GS/s ADCs (400x490 µm²), 4:16 DeMUX (60x490 µm²), input buffers (50x60 µm²), CDR synthesized logic (430x270 µm²), bias generator and clock divider (170x140 µm²), test structures and output buffers. Annotated die dimension: 1900 µm.]

Table 4.3: Test-chip parameters.

Process         65 nm CMOS
Data rate       6.875 Gb/s
Sampling rate   10 GS/s
Supply          1.2 V
Power           175.2 mW
Receiver area   0.37 mm²

divider and the DeMUX are custom blocks, while the digital CDR and the test-structures

are all synthesized. The test-chip has an integrated PRBS comparator for the receiver verifi-

cation. Table 4.3 summarizes the test-chip parameters. The receiver samples the 6.875 Gb/s

signal at 10 GS/s for the sampling rate of 1.45x. It consumes 175.2 mW of power from a

1.2 V supply, and it occupies 0.37 mm² of die area.

To verify the operation of the ADC and DeMUX, Figure 4.20 presents an eye diagram

that was reconstructed from the samples measured at the DeMUX output (see Figure 4.2).

To reconstruct this eye, a 6.875 Gb/s $2^{7}-1$ PRBS sequence was applied at the receiver

input, and then 0.5 million DeMUXed samples were captured with a logic analyzer. Since

every DeMUX channel, i, has a constant sampling phase, or time stamp, TS_i, associated

with it, the DeMUX channels were arranged along the horizontal axis in the ascending

order of their sampling phases. The figure also annotates the DeMUX channel numbers


[Figure 4.20: Measured eye diagram at the DeMUX output. Vertical axis: ADC sample value (ticks 2 to 30); horizontal axis: time (UI), 0 to 1. DeMUX channels i in ascending order of time stamp TS_i: 1, 4, 7, 10, 13, 16, 3, 6, 9, 12, 15, 2, 5, 8, 11, 14, with TS_i = 0/16, 1/16, ..., 15/16.]

and their corresponding time stamps under the eye diagram. The sampling rate of 16/11

leads to the total of 16 unique sampling phases, which quantizes the eye diagram to 16 bins

in the horizontal direction. The 5-bit ADC resolution quantizes the diagram to 32 bins in

the vertical direction. The open eye at the ADC output indicates that both the ADC and

the DeMUX are functional, and that the error-free data recovery is possible with the FSR

CDR.
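The channel ordering annotated under Figure 4.20 can be reproduced with modular arithmetic. The sketch below assumes that channel 1 has a time stamp of zero and that each successive channel samples 11/16 UI later, taken modulo one UI; the function names `time_stamp` and `channels_by_phase` are hypothetical.

```python
from fractions import Fraction

N_CHANNELS = 16     # DeMUX channels per frame
UIS_PER_FRAME = 11  # UIs spanned by 16 samples at the 16/11 sampling rate


def time_stamp(channel: int) -> Fraction:
    """Sampling phase of a DeMUX channel (1-based), as a fraction of one UI."""
    return Fraction(((channel - 1) * UIS_PER_FRAME) % N_CHANNELS, N_CHANNELS)


def channels_by_phase():
    """DeMUX channels sorted in ascending order of their sampling phase."""
    return sorted(range(1, N_CHANNELS + 1), key=time_stamp)


if __name__ == "__main__":
    for ch in channels_by_phase():
        print(f"channel {ch:2d}: TS = {time_stamp(ch)} UI")
```

Because 11 and 16 share no common factor, the 16 channels land on 16 distinct phases, which is why the eye diagram is quantized to 16 horizontal bins.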

Figure 4.21 shows the measured jitter tolerance of the fabricated receiver test-chip. This jitter tolerance was measured with a 6.875 Gb/s $2^{7}-1$ PRBS input and the sampling rate of 10 GS/s. The blind sampling clock has 760 ps random jitter (RMS) and -128 dBc/Hz phase noise at 1 MHz offset. The jitter tolerance was recorded at BER ≤ $10^{-12}$. For a convenient comparison between the measured and simulated results, Figure 4.21 also shows a simulated jitter tolerance with the same input sequence. To maintain a reasonable simulation time, the simulated jitter tolerance was recorded at BER ≤ $5 \cdot 10^{-6}$. A close match between the simulated and measured jitter tolerances experimentally confirms the functionality of the proposed FSR CDR architecture.

The measurements show that the FSR CDR tolerates up to 0.98 % (9800 ppm) of the


[Figure 4.21: Measured jitter tolerance. Vertical axis: jitter amplitude (UI_PP), 0.1 to 1000; horizontal axis ticks from $10^{3}$ to $10^{8}$. Curves: simulated (BER ≤ $5 \cdot 10^{-6}$) and measured (BER ≤ $10^{-12}$).]

frequency offset between the transmitter and receiver with BER ≤ $10^{-12}$. In these

measurements, the sampling clock frequency was shifted from the nominal 5 GHz to 4.951 GHz

and 5.049 GHz while the data rate remained at the nominal value of 6.875 Gb/s.
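The quoted offset can be checked against the two shifted clock frequencies, and related to the 10 to 12 bit range of the FIFO write port described in Section 4.6. The sketch below assumes that the divided write clock scales in proportion to the sampling clock and that the nominal load is 11 bits per write cycle (6.875 Gb/s divided by 625 MHz); the function names are hypothetical.

```python
NOMINAL_CLOCK_GHZ = 5.0
NOMINAL_BITS_PER_WRITE = 11.0  # assumed nominal load: 6.875 Gb/s / 625 MHz


def offset_ppm(clock_ghz: float, nominal_ghz: float = NOMINAL_CLOCK_GHZ) -> float:
    """Frequency offset of the sampling clock relative to nominal, in ppm."""
    return (clock_ghz - nominal_ghz) / nominal_ghz * 1e6


def average_bits_per_write(clock_ghz: float,
                           nominal_ghz: float = NOMINAL_CLOCK_GHZ) -> float:
    """Average number of data bits per write cycle when the data rate stays fixed."""
    return NOMINAL_BITS_PER_WRITE * nominal_ghz / clock_ghz


if __name__ == "__main__":
    for clock in (4.951, 5.049):
        print(f"{clock} GHz: {offset_ppm(clock):+.0f} ppm, "
              f"{average_bits_per_write(clock):.3f} bits per write cycle")
```

Under these assumptions the average load stays within roughly 0.11 bit of the nominal 11 bits per cycle at either extreme, which lies inside the 10 to 12 bit range that the write port supports.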

By reducing the sampling rate from 2x to 1.45x, the proposed FSR CDR reduces the

ADC power per Gb/s of data rate by 27.3 % compared to a 2x ADC-based receiver. This

reduction of the ADC power comes at the cost of doubling the gate count in the digital

CDR in comparison with the 2x feed-forward CDR presented in Chapter 3; however, the

power per Gb/s of data rate and the total receiver area are reduced by 12.5 %.
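The 27.3 % figure is consistent with a simple scaling argument. The sketch below assumes that the ADC power is set by the sampling rate alone, so that the ADC power per Gb/s is proportional to the sampling rate relative to the baud rate; this is an illustrative assumption, not a measured breakdown, and the function name is hypothetical.

```python
from fractions import Fraction


def adc_power_per_gbps_saving(rate_new: Fraction,
                              rate_ref: Fraction = Fraction(2)) -> Fraction:
    """Relative reduction in ADC power per Gb/s when the relative sampling rate drops.

    Assumes ADC power is proportional to the sampling rate (illustrative assumption).
    """
    return 1 - rate_new / rate_ref


if __name__ == "__main__":
    saving = adc_power_per_gbps_saving(Fraction(16, 11))  # 1.45x versus 2x
    print(f"saving = {saving} = {float(saving) * 100:.1f} %")
```

Equivalently, the same 10 GS/s ADC serves 6.875 Gb/s instead of 5 Gb/s, so its power per Gb/s scales by 5/6.875 = 8/11.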

4.8 Summary

This chapter presented the blind-sampling fractional-sampling-rate ADC-based CDR ar-

chitecture. This architecture reduces the ADC sampling rate from 2x to a fractional rate

between 2x and 1x thus saving the ADC power and area per Gb/s of data rate. The dig-

ital CDR then recovers the data from the fractionally-spaced digital samples of the sig-

nal using a feed-forward topology similar to that proposed in Chapter 3. The FSR CDR

accommodates the fractional sampling rate using the phase-detection, data-decision and

vector-compaction schemes presented in this chapter.

The proposed CDR was implemented in a receiver test-chip that samples a 6.875 Gb/s


signal at 10 GS/s for the sampling rate of 1.45x. The CDR then successfully recovers

the data in the digital domain, which is confirmed through the measured jitter tolerance.

The receiver test-chip consumes 175.2 mW from a 1.2 V supply and occupies an area of 0.37 mm². Sampling at 1.45x reduces the ADC area and power per Gb/s of data rate by 27.3 % compared to sampling at 2x. The simulation and measurement results show that the proposed FSR CDR architecture is functional and applicable to high-speed serial interconnects.

Chapter 5

Conclusions

THIS THESIS has explored the blind-sampling ADC-based receivers for high-speed

signaling applications. As a result of this exploration, the thesis has proposed two

new CDR architectures for the ADC-based receivers.

First, the proposed feed-forward CDR architecture recovers the phase and data directly

from the blind digital samples of the received signal in a feed-forward manner, eliminating

the need for an interpolating feedback loop used in the previously reported blind ADC-

based CDRs. The feed-forward topology reduces the CDR’s circuit complexity, making

this architecture suitable for high-speed interconnects. To experimentally validate the pro-

posed architecture, a 5 Gb/s 2x ADC-based receiver with the feed-forward CDR was imple-

mented in 65 nm CMOS. The measurements of the receiver test-chip show that the CDR

successfully recovers the data, which validates the proposed architecture.

Second, to reduce the ADC power and area, the proposed FSR CDR architecture re-

duces the sampling rate from an integer rate of 2x to a fractional rate between 2x and 1x

the baud rate. The CDR then relies on the feed-forward topology to recover the phase and

data from the fractionally-spaced samples of the received signal. The feed-forward topology

enables the FSR CDR to maintain a sufficiently low circuit complexity, which leads to

the overall receiver power and area savings. To verify the proposed FSR CDR architecture,

a 1.45x ADC-based receiver was implemented in 65 nm CMOS. The receiver successfully

recovers 6.875 Gb/s data from the samples taken at 10 GS/s. Reducing the sampling rate to

1.45x reduces the ADC power and area per Gb/s of data rate by 27.3 % compared to the 2x

receiver with the feed-forward CDR, while the overall receiver power and area reduce by


12.5 %. These measurement results confirm that the FSR CDR architecture is applicable

to the high-speed interconnects, and that it reduces the area and power compared to the 2x

CDR architecture.

5.1 Thesis Contributions

The contributions of this thesis are two new CDR architectures for the high-speed blind-

sampling ADC-based receivers. These architectures are:

• A 2x feed-forward CDR architecture that reduces the circuit complexity of the blind-

sampling ADC-based receivers, making this type of receivers suitable for high-speed

interconnects. The architecture was implemented and characterized in a 5 Gb/s re-

ceiver test-chip. This work was accepted for publication in IEEE Journal of Solid-

State Circuits (JSSC), to appear in June 2010 issue [17]. This architecture was also

used in an ADC-based receiver presented at IEEE International Solid-State Circuits

Conference (ISSCC), 2010 [18].

• A fractional-sampling-rate (FSR) CDR architecture that reduces the sampling rate

below 2x in order to save the ADC power and area in the blind-sampling ADC-

based receivers. The architecture was implemented and characterized in a 6.875 Gb/s

1.45x receiver test-chip. This work was presented at IEEE International Solid-State

Circuits Conference (ISSCC), 2010 [19].

5.2 Future Directions

The contributions of this thesis have shown that clock and data recovery is possible in

the blind-sampling ADC-based receivers at high data rates. However, to make the ADC-

based receivers competitive with the binary-sampling phase-tracking receivers in terms of

power efficiency and resilience to limited channel bandwidth, further work is required. This

section outlines potential future directions towards making the blind-sampling ADC-based

receivers feasible for practical high-speed interconnects.

In the ADC-based receivers, the ADC typically consumes a significant portion of the

receiver power. In the examples of the receiver test-chips presented in this thesis, the ADC

consumes approximately 2/3, while the digital CDR consumes about 1/3 of the power. This


high ADC power consumption makes it challenging to use the ADC-based receivers in the

low-power interconnects. Typically, the high-speed receivers use flash ADCs for their low-

latency conversion. One way to reduce the receiver power is to replace the low-latency high-

power flash ADCs with high-latency low-power successive approximation (SAR) ADCs.

In contrast with the feedback topologies, the feed-forward topology of the proposed CDR

architectures is insensitive to the ADC latency, which makes SAR ADCs feasible for the

blind-sampling ADC-based receivers. Further work is required, however, to assure that the

SAR ADCs for the high-speed signaling applications have a sufficiently small sampling

window compared to the bit-interval, and a sufficiently high conversion rate.

One of the primary advantages of the ADC-based receivers is the digital representation

of the signal samples, which allows for the signal equalization in the digital domain after

sampling. In fact, an adaptive feed-forward equalizer (FFE) for the 2x feed-forward CDR

is presented in [18]. This linear FFE, however, enhances the quantization noise of the ADC.

To mitigate the noise enhancement, a 1-tap speculative decision-feedback equalizer (DFE)

with a programmable tap weight is presented in [47]. To improve the resilience of the blind-

sampling ADC-based receivers to the limited channel bandwidth, it is desirable to extend

the 1-tap DFE to a multi-tap DFE as well as to develop an adaptation algorithm suitable for

this DFE scheme. A combination of an adaptive FFE with an adaptive DFE would allow

the ADC-based receivers to operate in the presence of high channel attenuation, which is

required by the high-speed signaling standards.

The equalization schemes presented in the recent publications focus on blind-sampling

ADC-based receivers with an integer sampling rate of 2x [18, 47]. However, to date, no

equalization schemes are reported for the fractional sampling rates. Extending the FFE

and DFE schemes to the fractional sampling rates would make the FSR CDR architecture

suitable for practical high-speed interconnects.

Through simulations, this thesis has shown that the FSR CDR architecture is able to

recover the data with various sampling rates. Reducing the sampling rate reduces

the ADC power while maintaining the data rate. The reduction of the sampling rate, however,

comes at the cost of reducing the high-frequency jitter tolerance. The jitter tolerance

is also related to the quality of the channel in an interconnect. Typically, the high channel

attenuation degrades the jitter tolerance. It would be desirable to make the sampling rate

adaptable to the channel quality in the FSR CDR architecture. This adaptability can use a


trade-off between the sampling rate and the jitter tolerance in order to save the ADC power

when the channel quality allows it. As an example, to maintain a desired jitter tolerance, a

low-attenuation channel allows for a lower sampling rate to reduce the ADC power, while

a high-attenuation channel requires a nominal sampling rate. The sampling rate adaptation

scheme might be similar to an equalizer adaptation scheme since they both depend on the

amount of channel attenuation.

References

[1] C. Combs, Printed Circuits Handbook, 5th ed. McGraw Hill, New York, 2001.

[2] J. Bulzacchelli, M. Meghelli, S. Rylov, W. Rhee, A. Rylyakov, H. Ainspan, B. Parker,

M. Beakes, A. Chung, T. Beukema, P. Pepeljugoski, L. Shan, Y. Kwark, S. Gowda,

and D. Friedman, “A 10-Gb/s 5-tap DFE/4-tap FFE transceiver in 90-nm CMOS tech-

nology,” IEEE Journal of Solid-State Circuits, vol. 41, no. 12, pp. 2885–2900, De-

cember 2006.

[3] H. Sugita, K. Sunaga, K. Yamaguchi, and M. Mizuno, “A 16Gb/s 1st-tap FFE and

3-tap DFE in 90nm CMOS,” in IEEE International Solid-State Circuits Conference

Technical Digest, vol. 53, February 2010, pp. 162–163.

[4] O. Agazzi, M. Hueda, D. Crivelli, H. Carrer, A. Nazemi, G. Luna, F. Ramos, R. Lopez,

C. Grace, B. Kobeissy, C. Abidin, M. Kazemi, M. Kargar, C. Marquez, S. Ramprasad,

F. Bollo, V. Posse, S. Wang, G. Asmanis, G. Eaton, N. Swenson, T. Lindsay, and

P. Voois, “A 90 nm CMOS DSP MLSD transceiver with integrated AFE for electronic

dispersion compensation of multimode optical fibers at 10 Gb/s,” IEEE Journal of

Solid-State Circuits, vol. 43, no. 12, pp. 2939–2957, December 2008.

[5] H.-M. Bae, J. Ashbrook, J. Park, N. Shanbhag, A. Singer, and S. Chopra, “An MLSE

receiver for electronic dispersion compensation of OC-192 fiber links,” IEEE Journal

of Solid-State Circuits, vol. 41, no. 11, pp. 2541–2554, November 2006.

[6] M. Harwood, N. Warke, R. Simpson, T. Leslie, A. Amerasekera, S. Batty, D. Col-

man, E. Carr, V. Gopinathan, S. Hubbins, P. Hunt, A. Joy, P. Khandelwal, B. Kil-

lips, T. Krause, S. Lytollis, A. Pickering, M. Saxton, D. Sebastio, G. Swanson,

A. Szczepanek, T. Ward, J. Williams, R. Williams, and T. Willwerth, “A 12.5Gb/s


SerDes in 65nm CMOS using a baud-rate ADC with digital receiver equalization and

clock recovery,” in IEEE International Solid-State Circuits Conference Technical Di-

gest, February 2007, pp. 436–437, 591.

[7] J. Cao, B. Zhang, U. Singh, D. Cui, A. Vasani, A. Garg, W. Zhang, N. Kocaman, D. Pi,

B. Raghavan, H. Pan, I. Fujimori, and A. Momtaz, “A 500mW digitally calibrated

AFE in 65nm CMOS for 10Gb/s serial links over backplane and multimode fiber,” in

IEEE International Solid-State Circuits Conference Technical Digest, February 2009,

pp. 370–371.

[8] “Assembly and Packaging,” The International Technology Roadmap for

Semiconductors (ITRS), pp. 4–7, December 2007. [Online]. Available:

http://www.itrs.net/Links/2007ITRS/Home2007.htm

[9] “HDMI Specification Ver. 1.3a,” HDMI Licensing, LLC, Sunnyvale, CA,

USA, November 2006. [Online]. Available: http://www.hdmi.org/manufacturer/

specification.aspx

[10] “PCI Express Base 2.1 Specification,” PCI-SIG, Beaverton, OR, USA, March 2009.

[Online]. Available: http://www.pcisig.com/specifications/pciexpress/

[11] “Serial ATA Revision 3.0 Specification,” SATA-IO Administration, Beaverton,

OR, USA, June 2009. [Online]. Available: https://www.sata-io.org/developers/

purchase spec.asp

[12] “Universal Serial Bus Revision 3.0 Specification,” USB Implementers Forum, Inc.,

Beaverton, OR, USA, November 2008. [Online]. Available: http://www.usb.org/

developers/docs/

[13] J. Buckwalter, M. Meghelli, D. Friedman, and A. Hajimiri, “Phase and amplitude pre-

emphasis techniques for low-power serial links,” IEEE Journal of Solid-State Circuits,

vol. 41, no. 6, pp. 1391–1399, June 2006.

[14] B. Razavi, Design of Integrated Circuits for Optical Communications. McGraw Hill,

2003.


[15] I. Mehr and D. Dalton, “A 500-MSample/s, 6-bit Nyquist-rate ADC for disk-drive

read-channel applications,” IEEE Journal of Solid-State Circuits, vol. 34, no. 7, pp.

912–920, July 1999.

[16] B. Razavi, Phase-Locking in High-Performance Systems:From Devices to Architec-

tures. IEEE Press, 2003.

[17] O. Tyshchenko, A. Sheikholeslami, H. Tamura, M. Kibune, H. Yamaguchi, and

J. Ogawa, “A 5-Gb/s ADC-based feedforward CDR in 65 nm CMOS,” IEEE Jour-

nal of Solid-State Circuits, vol. 45, no. 6, pp. 1091–1098, June 2010.

[18] H. Yamaguchi, H. Tamura, Y. Doi, Y. Tomita, T. Hamada, M. Kibune, S. Ohmoto,

K. Tateishi, O. Tyshchenko, A. Sheikholeslami, T. Higuchi, J. Ogawa, T. Saito,

H. Ishida, and K. Gotoh, “A 5Gb/s transceiver with an ADC-based feedforward CDR

and CMA adaptive equalizer in 65nm CMOS,” in IEEE International Solid-State Cir-

cuits Conference Technical Digest, vol. 53, February 2010, pp. 168–169.

[19] O. Tyshchenko, A. Sheikholeslami, H. Tamura, Y. Tomita, H. Yamaguchi, M. Kibune,

and T. Yamamoto, “A fractional-sampling-rate ADC-based CDR with feedforward

architecture in 65nm CMOS,” in IEEE International Solid-State Circuits Conference


[20] M. Horowitz, K. Y. Chih-Kong, and S. Sidiropoulos, “High-speed electrical signaling:

overview and limitations,” IEEE Micro, vol. 18, no. 1, pp. 12–24, January/February

1998.

[21] J. R. Barry, E. A. Lee, and D. G. Messerschmitt, Digital Communication. Springer,

2004.

[22] E. Sackinger, Broadband Circuits for Optical Fiber Communication. John Wiley,

2005.

[23] Y. Hidaka, G. Weixin, T. Horie, H. J. Jian, Y. Koyanagi, and H. Osone, “A 4-channel

1.25–10.3 Gb/s backplane transceiver macro with 35 dB equalizer and sign-based zero-

forcing adaptive control,” IEEE Journal of Solid-State Circuits, vol. 44, no. 12, pp.

3547–3559, December 2009.


[24] Y. Tomita, M. Kibune, J. Ogawa, W. Walker, H. Tamura, and T. Kuroda, “A 10-Gb/s

receiver with series equalizer and on-chip ISI monitor in 0.11-um CMOS,” IEEE Jour-

nal of Solid-State Circuits, vol. 40, no. 4, pp. 986–993, April 2005.

[25] S. Gondi, J. Lee, D. Takeuchi, and B. Razavi, “A 10Gb/s CMOS adaptive equalizer

for backplane applications,” in IEEE International Solid-State Circuits Conference


[26] A. Fiedler, R. Mactaggart, J. Welch, and S. Krishnan, “A 1.0625 Gbps transceiver

with 2x-oversampling and transmit signal pre-emphasis,” in IEEE International Solid-

State Circuits Conference Technical Digest, vol. 43, February 1997, pp. 238–239.

[27] R. Farjad-Rad, C.-K. Yang, M. Horowitz, and T. Lee, “A 0.4-um CMOS 10-Gb/s

4-PAM pre-emphasis serial link transmitter,” IEEE Journal of Solid-State Circuits,

vol. 34, no. 5, pp. 580–585, May 1999.

[28] V. Stojanovic, G. Ginis, and M. Horowitz, “Transmit pre-emphasis for high-speed

time-division-multiplexed serial-link transceiver,” in IEEE International Conference

on Communications, vol. 3, August 2002, pp. 1934–1939.

[29] H. Higashi, S. Masaki, M. Kibune, S. Matsubara, T. Chiba, Y. Doi, H. Yam-

aguchi, H. Takauchi, H. Ishida, K. Gotoh, and H. Tamura, “A 5-6.4-Gb/s 12-channel

transceiver with pre-emphasis and equalization,” IEEE Journal of Solid-State Circuits,

vol. 40, no. 4, pp. 978–985, April 2005.

[30] T. Beukema, M. Sorna, K. Selander, S. Zier, B. Ji, P. Murfet, J. Mason, W. Rhee,

H. Ainspan, B. Parker, and M. Beakes, “A 6.4-Gb/s CMOS SerDes core with feed-

forward and decision-feedback equalization,” IEEE Journal of Solid-State Circuits,

vol. 40, no. 12, pp. 2633–2645, December 2005.

[31] J. Zerbe, C. Werner, V. Stojanovic, F. Chen, J. Wei, G. Tsang, D. Kim, W. Stonecypher,

A. Ho, T. Thrush, R. Kollipara, M. Horowitz, and K. Donnelly, “Equalization and

clock recovery for a 2.5-10-Gb/s 2-PAM/4-PAM backplane transceiver cell,” IEEE

Journal of Solid-State Circuits, vol. 38, no. 12, pp. 2121–2130, December 2003.


[32] S. Shekhar, J. Walling, and D. Allstot, “Bandwidth extension techniques for CMOS

amplifiers,” IEEE Journal of Solid-State Circuits, vol. 41, no. 11, pp. 2424–2439,

November 2006.

[33] A. Momtaz and M. Green, “An 80mW 40Gb/s 7-tap T/2-spaced FFE in 65nm CMOS,”

in IEEE International Solid-State Circuits Conference Technical Digest, vol. 52,

February 2009, pp. 364–365.

[34] J. Jaussi, G. Balamurugan, D. Johnson, B. Casper, A. Martin, J. Kennedy,

N. Shanbhag, and R. Mooney, “8-Gb/s source-synchronous I/O link with adaptive

receiver equalization, offset cancellation, and clock de-skew,” IEEE Journal of Solid-

State Circuits, vol. 40, no. 1, pp. 80–88, January 2005.

[35] B. Razavi, Monolithic Phase-Locked Loops and Clock Recovery Circuits. IEEE

Press, 1996.

[36] N. Nedovic, N. Tzartzanis, H. Tamura, F. Rotella, M. Wiklund, Y. Mizutani,

Y. Okaniwa, T. Kuroda, J. Ogawa, and W. Walker, “A 40–44 Gb/s 3x oversampling

CMOS CDR/1:16 DEMUX,” IEEE Journal of Solid-State Circuits, vol. 42, no. 12,

pp. 2726–2735, December 2007.

[37] J. Kim and D.-K. Jeong, “Multi-gigabit-rate clock and data recovery based on blind

oversampling,” IEEE Communications Magazine, vol. 41, no. 12, pp. 68–74, Decem-

ber 2003.

[38] K. Mueller and M. Muller, “Timing recovery in digital synchronous data receivers,”

IEEE Transactions on Communications, vol. 24, no. 5, pp. 516–531, May 1976.

[39] F. Gardner, “Interpolation in digital modems – Part I: Fundamentals,” IEEE Transac-

tions on Communications, vol. 41, no. 3, pp. 501–507, March 1993.

[40] M. Spurbeck and R. Behrens, “Interpolated timing recovery for hard disk drive read

channels,” in IEEE International Conference on Communications, vol. 3, June 1997,

pp. 1618–1624.

[41] M. van Ierssel, “Circuit techniques for high-speed chip-to-chip signaling,” Ph.D. dis-

sertation, University of Toronto, 2006.


[42] G. D. Vishakhadatta, R. Croman, M. Goldenberg, J. Hein, P. Katikaneni, D. Kuai,

C. Lee, I. C. Tesu, R. Trujillo, L. Zhang, K. Anderson, R. Behrens, W. Bliss, L. Du,

T. Dudley, G. Feyh, W. Foland, M. Kastner, Q. Li, J. Mitchem, D. Reed, S. She,

M. Spurbeck, L. Sundell, H. Tran, M. Wei, and C. Zook, “An EPR4 read/write channel

with digital timing recovery,” IEEE Journal of Solid-State Circuits, vol. 33, no. 11, pp.

1851–1857, November 1998.

[43] W. Zhang and R. Spencer, “Timing recovery for backplane ethernet,” IEEE Transac-

tions on Circuits and Systems I, vol. 54, no. 8, pp. 1711–1723, August 2007.

[44] M. Pozzoni, S. Erba, D. Sanzogni, M. Ganzerli, P. Viola, D. Baldi, M. Repossi,

G. Spelgatti, and F. Svelto, “A 12Gb/s 39dB loss-recovery unclocked-DFE receiver

with bi-dimensional equalization,” in IEEE International Solid-State Circuits Confer-

ence Technical Digest, vol. 53, February 2010, pp. 164–165.

[45] V. Balan, J. Caroselli, J.-G. Chern, C. Chow, R. Dadi, C. Desai, L. Fang, D. Hsu,

P. Joshi, H. Kimura, C. Liu, T.-W. Pan, R. Park, C. You, Y. Zeng, E. Zhang, and

F. Zhong, “A 4.8-6.4-Gb/s serial link for backplane applications using decision feed-

back equalization,” IEEE Journal of Solid-State Circuits, vol. 40, no. 9, pp. 1957–

1967, September 2005.

[46] M. van Ierssel, A. Sheikholeslami, H. Tamura, and W. Walker, “A 3.2 Gb/s CDR using

semi-blind oversampling to achieve high jitter tolerance,” IEEE Journal of Solid-State

Circuits, vol. 42, no. 10, pp. 2224–2234, October 2007.

[47] S. Sarvari, T. Tahmoureszadeh, A. Sheikholeslami, H. Tamura, and M. Kibune, “A

5Gb/s speculative DFE for 2x blind ADC-based receivers in 65-nm CMOS,” in IEEE

Symposium on VLSI Circuits Digest of Technical Papers, June 2010, pp. 69–70.

[48] M. van Ierssel, H. Yamaguchi, A. Sheikholeslami, H. Tamura, and W. Walker, “Event-

driven modeling of CDR jitter induced by power-supply noise, finite decision-circuit

bandwidth, and channel ISI,” IEEE Transactions on Circuits and Systems I, vol. 55,

no. 5, pp. 1306–1315, June 2008.
