
446 • 2007 IEEE International Solid-State Circuits Conference

ISSCC 2007 / SESSION 24 / MULTI-GB/s TRANSCEIVERS / 24.6

24.6 A 16Gb/s Source-Series Terminated Transmitter in 65nm CMOS SOI

Christian Menolfi, Thomas Toifl, Peter Buchmann, Marcel Kossel, Thomas Morf, Jonas Weiss, Martin Schmatz

IBM, Rueschlikon, Switzerland

The quest for high data rates at low power consumption and area has renewed interest in the source-series terminated (SST) driver. While SST drivers may not necessarily boast better performance than their counterparts using CML output stages, their advantage lies in their potential for lower power operation [1] and their ability to cope with a large range of termination voltages, which makes them a prime candidate for multi-standard TX. Given the increasing challenge in achieving acceptable analog performance in advanced digital CMOS technologies, the SST driver principle is attractive because it is based entirely on digital switching devices that are optimized for high-speed operation and continue to scale with technology. This paper presents the architecture and the design of key components of a half-rate SST TX that implements a versatile, power- and area-efficient equalization and impedance-tuning scheme. Furthermore, the TX achieves low jitter and negligible duty-cycle distortion (DCD), thanks to a clock duty-cycle cleanup circuit.

Figure 24.6.1 shows a block diagram of the implemented differential half-rate SST TX. All local clocks are derived from a global half-rate CML clock (ck2cml), which is converted to a CMOS half-rate clock (ck2) and a quarter-rate clock (ck4). The 4b quarter-rate input data d[0:3] is transformed in a first multiplexer stage into half-rate interleaved even and odd data streams. Both even and odd data streams then pass through a 4b shift register that consists of 4 latches driven on opposing clock phases of the half-rate clock (ck2) and implements the delayed data taps of a 4-tap FIR pre-emphasis filter (tap[0:3][even, odd]). In order to set the sign of any pre-emphasis filter tap, each shift-register latch output is followed by an XOR gate that selectively (sign[0:3]) inverts the corresponding filter tap. The resulting 4 even/odd (=8) half-rate tap data streams, along with the half-rate CMOS clock ck2, are then globally distributed to 44 identical differential half-rate SST driver slices, each of which can be configured to select one even/odd tap stream out of the available 4×2 tap data streams. Consequently, each SST driver slice can be assigned to any of the 4 FIR taps, which adds versatile power- and area-efficient equalization capabilities to the TX.
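The slice-to-tap assignment can be illustrated with a small behavioral model. This is a minimal sketch and not the authors' implementation: it assumes ideal, equally weighted slices, treats tap 0 as the main cursor and tap 1 as the first post-cursor, maps bits to ±1, and uses hypothetical names (`sst_fir_output`, `slices_per_tap`, `signs`). Under these assumptions, a tap's weight is simply the number of enabled slices assigned to it, and the XOR sign inversion becomes a multiplication by ±1.

```python
def sst_fir_output(bits, slices_per_tap, signs):
    """Normalized output level per bit for a behavioral 4-tap SST FIR model.

    bits           -- sequence of 0/1 data bits in full-rate order
    slices_per_tap -- slices assigned to taps 0..3 (hypothetical split)
    signs          -- +1 or -1 per tap, modeling the XOR sign inversion
    """
    total = sum(slices_per_tap)
    symbols = [1 if b else -1 for b in bits]          # map 0/1 to -1/+1
    out = []
    for n in range(len(symbols)):
        acc = 0
        for k, (count, sign) in enumerate(zip(slices_per_tap, signs)):
            # Shift register is assumed to be preloaded with 0 bits (-1 symbols).
            delayed = symbols[n - k] if n - k >= 0 else -1
            acc += sign * count * delayed
        out.append(acc / total)                       # all enabled slices act in parallel
    return out


# Hypothetical setting: 30 enabled slices, 24 on the main tap and
# 6 on an inverted first post-cursor tap.
levels = sst_fir_output([0, 1, 1, 1, 0], [24, 6, 0, 0], [+1, -1, +1, +1])
print(levels)
```

In this model a transition bit reaches full swing, while repeated bits settle to (24 − 6)/30 of full swing, which is the usual de-emphasis behavior. Reassigning slices between taps changes the filter coefficients without adding any driver hardware.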

Figure 24.6.2 depicts a schematic of a half-rate differential SST slice. Each slice is composed of 2 single-ended SST drivers that form a pseudo-differential output stage. One single-ended slice contains 2 pull-up/pull-down branches with the corresponding even and odd data and select transistors, along with a common pull-up/pull-down polysilicon linearization resistor. Even and odd data for both single-ended SST slices are derived from two 4:1 input multiplexers that select the data stream corresponding to the assigned FIR position. In order to guarantee stable data and the appropriate timing at the multiplexing output SST stage, re-timing latches are introduced. Note that the output impedance of the driver is equal to the parallel combination of all the SST slices, and it does not matter whether a particular slice is pulling up or down. Termination-impedance tuning is obtained by some additional logic that disables a certain number of SST slices and sets them into high impedance (enable=0). A nominal 50Ω impedance is achieved when 30 SST slices are enabled, leaving a tuning range of ±14 slices to comfortably cope with process tolerances. A MOS-to-poly resistance ratio of 1:3 is chosen in this implementation for an optimum accuracy/area trade-off.
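The impedance-tuning arithmetic can be sketched as follows. The sketch assumes that the 50Ω figure refers to one single-ended output, that all slices are identical, and that process variation scales the whole slice resistance uniformly; the corner factors and the helper names (`output_impedance`, `slices_for_target`) are hypothetical. Only the 50Ω target, the 30 nominal slices, the 44 total slices, and the 1:3 MOS-to-poly ratio come from the text.

```python
NOMINAL_OHMS = 50.0      # target output impedance (from the text)
NOMINAL_SLICES = 30      # slices enabled at nominal process (from the text)
TOTAL_SLICES = 44        # slices available (from the text)

# n equal slices in parallel give R_slice / n, so the implied nominal
# resistance of one slice is 50 ohm * 30 = 1500 ohm.
R_SLICE_NOMINAL = NOMINAL_OHMS * NOMINAL_SLICES

# 1:3 MOS-to-poly split of the nominal slice resistance.
R_MOS = R_SLICE_NOMINAL * 1 / 4
R_POLY = R_SLICE_NOMINAL * 3 / 4


def output_impedance(r_slice, enabled):
    """Parallel combination of `enabled` identical slices."""
    return r_slice / enabled


def slices_for_target(r_slice, target=NOMINAL_OHMS, total=TOTAL_SLICES):
    """Enabled-slice count whose impedance is closest to the target."""
    return min(range(1, total + 1),
               key=lambda n: abs(output_impedance(r_slice, n) - target))


# Hypothetical process corners: slice resistance 20% low, nominal, 20% high.
settings = {}
for corner in (0.8, 1.0, 1.2):
    r_slice = R_SLICE_NOMINAL * corner
    n = slices_for_target(r_slice)
    settings[corner] = n
    print(corner, n, round(output_impedance(r_slice, n), 2))
```

Under the same assumptions, the ±14-slice range (16 to 44 enabled slices) can bring any slice resistance between 800Ω and 2200Ω back to 50Ω.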

A particular concern in a half-rate SST TX lies in the fact that it operates on both phases of the half-rate CMOS clock ck2, so any imbalance or DCD has a direct impact on the jitter performance. Special measures are taken in the CML-to-CMOS clock converter, shown in Fig. 24.6.3(a), to clean up DCD. The first CML buffer stage uses DC suppression and acts as a first DCD-cleanup stage [2]. In addition, it provides some gain to maximize the CML output signal swing (out, outb). An AC-coupled inverter with resistive feedback then follows the first CML buffer and acts as a CML-to-CMOS conversion stage [3]. AC-coupling is a simple and efficient way to remove any DC component and to perform a voltage level shift to the input of the inverter, which is DC biased to its trip point by means of the feedback resistor. Three tapered CMOS inverters then follow to provide enough drive strength to globally distribute the half-rate CMOS clock ck2/ck2b to the 44 unit SST slices. Care is taken to minimize delay (~43ps) in the CMOS clock path for minimum power-supply-noise-induced jitter. Furthermore, special attention is paid in the circuit layout to keep the fully differential clock nets ck2/ck2b symmetrical.
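The link between clock duty cycle and data jitter in a half-rate transmitter can be made concrete with a little arithmetic. The sketch below assumes that even bits are sent during the clock-high phase and odd bits during the clock-low phase (the polarity is an assumption, as is the function name `bit_widths_ps`), and that duty-cycle error is the only impairment. The 1% duty-cycle error is a hypothetical input, not a measured value.

```python
def bit_widths_ps(data_rate_gbps, duty_cycle):
    """Even/odd bit durations in ps for a half-rate transmitter.

    data_rate_gbps -- serial data rate in Gb/s
    duty_cycle     -- fraction of the clock period spent high (0.5 = ideal)
    """
    # One half-rate clock period spans two unit intervals.
    clock_period_ps = 2 * 1000.0 / data_rate_gbps
    even = duty_cycle * clock_period_ps           # bit sent while clock is high
    odd = (1.0 - duty_cycle) * clock_period_ps    # bit sent while clock is low
    return even, odd


# Hypothetical 51% duty cycle at 16Gb/s (8GHz half-rate clock, 125ps period).
even_ps, odd_ps = bit_widths_ps(16.0, 0.51)
print(even_ps, odd_ps, even_ps - odd_ps)
```

With these inputs the even and odd bits differ in width by about 2.5ps out of a 62.5ps unit interval, which shows why even small clock duty-cycle errors appear directly as deterministic jitter at the output.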

A circuit layout of the implemented SST TX is shown in Fig. 24.6.3(b). The macro occupies an area of 230×56µm² including ESD protection. In order to characterize the performance of the SST TX, a wafer-probable test chip is implemented (Fig. 24.6.4). The test chip consists of 2 SST TX lanes that share a common external differential half-rate CML clock input. A serial 3-wire interface allows control of the TX settings and of the 2 bit-pattern generators that generate independently programmable quarter-rate data for both TX lanes. Furthermore, supply decoupling capacitors on the order of 50pF/lane are added close to each TX lane. A chip micrograph is shown in Fig. 24.6.7.

Figure 24.6.5 shows 3 measured PRBS15 differential data eyes at a 16Gb/s data rate and at 3 different termination voltages Vtt, along with the measured jitter numbers. Sub-ps RJ is measured, which is essentially at the resolution limit of the measurement equipment, while the measured DJ is ~8.5ps and dominated by ISI. In split-wafer-lot measurements, DCD at data rates between 5.2 and 12.5Gb/s remains below 600fs. A DC jitter supply sensitivity of -4.6ps/100mV (@Vdd=1V) could be observed. Only a slight degradation in DJ of <1ps could be observed when switching the neighboring lane in operation. No perceivable difference in either the jitter performance or the eye opening could be observed for any termination voltage Vtt from 0 to 1V, which proves the versatility of the SST TX to cope with different termination standards. The quality of the SST TX clocking path is confirmed not only by the sub-ps RJ, but also by the DCD-cleanup performance. Figure 24.6.6 shows the measured output peak-to-peak DCD versus incoming-clock DCD at a data rate of 5Gb/s (2.5GHz CML clock) and different termination voltages Vtt. The measured output DCD at all termination voltages remains below 0.5%pp at an input DCD of ±10%. The measured supply current of one SST TX lane including bit-pattern generator at 16Gb/s and differential 100Ω termination is 57.5mA at a nominal Vdd of 1V, corresponding to a power-dissipation efficiency of 3.6mW/Gb/s.
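The quoted power efficiency follows directly from the measured supply current, the supply voltage, and the data rate. The short calculation below reproduces it; the variable names are illustrative only.

```python
supply_current_a = 57.5e-3   # measured lane current, incl. bit-pattern generator
vdd_v = 1.0                  # nominal supply voltage
data_rate_gbps = 16.0        # serial data rate

power_mw = supply_current_a * vdd_v * 1e3          # P = I * V, in mW
efficiency_mw_per_gbps = power_mw / data_rate_gbps

print(f"{power_mw:.1f} mW, {efficiency_mw_per_gbps:.2f} mW/Gb/s")
```

The result, 57.5mW / 16Gb/s ≈ 3.59mW/Gb/s, rounds to the 3.6mW/Gb/s stated in the text. Since 1mW/(Gb/s) equals 1pJ/bit, this corresponds to roughly 3.6pJ per transmitted bit.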

Acknowledgements:
The authors would like to thank Bhavna Agrawal, Michael Beakes, Nick Perez, Steve Walker, and Carl Wermer for analog design enablement and SOI modeling support, and the IBM foundry team for chip manufacturing.

References:
[1] H. Hatamkhani, K. J. Wong, R. Drost, et al., “A 10mW 3.6Gbps I/O Transmitter,” Symp. VLSI Circuits, pp. 97-98, Jun., 2003.
[2] C. Menolfi, T. Toifl, R. Reutemann, et al., “A 25Gb/s PAM4 Transmitter in 90nm CMOS SOI,” ISSCC Dig. Tech. Papers, pp. 72-73, Feb., 2005.
[3] J. Savoj, B. Razavi, “A CMOS Interface Circuit for Detection of 1.2Gb/s RZ Data,” ISSCC Dig. Tech. Papers, pp. 278-279, Feb., 1999.

1-4244-0852-0/07/$25.00 ©2007 IEEE.


ISSCC 2007 / February 14, 2007 / 11:15 AM

Figure 24.6.1: Half-rate SST TX block diagram.

Figure 24.6.2: SST driver half-rate slice architecture.

Figure 24.6.3: (a) CML to CMOS converter with DCD cleanup, (b) Layout of one SST TX core.

Figure 24.6.4: Test-chip block diagram.

Figure 24.6.5: Measured PRBS15 data eye at 16Gb/s and at different termination voltages Vtt.

Figure 24.6.6: Measured output DCD versus input-clock DCD at 5Gb/s and different termination voltages.

[Figure 24.6.1 diagram labels: quarter-rate data d0-d3 clocked by ck4; MUX 4:2 (2:1 detail with inputs d0, d2 and clock ck4); even and odd FIR shift registers (4 taps each) with tap sign control; taps 0-3 (even/odd) and clock, globally distributed to 44 differential half-rate unit SST drivers with outputs out+/out-; CML-to-CMOS clock generator (input ck2cml; outputs ck2/ck2b, ck4/ck4b); configuration register (sign[0:3], enable[0:28], config[0,1][0:43]).]

[Figure 24.6.2 schematic labels: even data mux (4:1; inputs de_0, do_1, de_2, do_3) and odd data mux (4:1; inputs do_0, de_1, do_2, de_3), selected by config (tap0, tap1, tap2, tap3); re-timing latches clocked by ck2/ck2b; two single-ended half-rate SST slices with data signals de/deb and do/dob, branch signals dep, dop, den, don, enable input en (0 = off, high impedance; 1 = on), supply Vdd, and outputs out+/out-.]

[Figure 24.6.3 labels: (a) converter schematic with M1/M1B and C1/C1B, inputs ck2cml/ck2cmlb, intermediate outputs Out/Outb, and 4x FO2.5 stages (~43ps) driving ck2/ck2b; (b) layout, 230µm × 56µm, showing ESD+ and ESD-, two groups of 22 differential SST slices, the CMOS clock generator, and the MUX 4:2 and FIR block.]

[Figure 24.6.4 diagram labels: test chip with TX lane 0 and TX lane 1, each with a pattern generator (PRBS[7|15|31], 8b programmable pattern) supplying d0:d3 @ ck4, a 4-tap FIR, and a half-rate SST driver with 44 unit slices; shared differential ck2cml clock input; differential outputs out_lane0 and out_lane1; 3-wire serial interface (data, clk, load) with configuration register.]

[Figure 24.6.5: measured eye diagrams at Vtt=0V, Vtt=0.5V, and Vtt=1.0V; jitter table for 16Gb/s, PRBS15:]

Vtt     RJ [ps] rms   TJ [ps] (BER 1e-12)   DJ(d-d) [ps]   DCD [ps]
1.0V    0.176         10.79                 8.35           0.250
0.5V    0.176         11.16                 8.72           0.170
0V      0.177         10.99                 8.53           0.120
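The jitter numbers reported with Fig. 24.6.5 can be cross-checked against each other. The paper does not state how TJ was derived; the sketch below assumes the common dual-Dirac model, TJ = DJ(d-d) + 2·Q·RJ with Q ≈ 7.034 at a BER of 1e-12, and assumes that the DJ values pair with the termination voltages in the order shown in the dictionary. Both are assumptions, as are the names `total_jitter_ps` and `MEASURED`.

```python
Q_BER_1E12 = 7.034   # Gaussian Q-factor for BER = 1e-12 (dual-Dirac convention)


def total_jitter_ps(dj_ps, rj_rms_ps, q=Q_BER_1E12):
    """Dual-Dirac total-jitter estimate: TJ = DJ(d-d) + 2 * Q * RJ_rms."""
    return dj_ps + 2.0 * q * rj_rms_ps


# Vtt -> (RJ rms, DJ(d-d), reported TJ), all in ps; the pairing is assumed.
MEASURED = {
    "1.0V": (0.176, 8.35, 10.79),
    "0.5V": (0.176, 8.72, 11.16),
    "0V":   (0.177, 8.53, 10.99),
}

errors = {}
for vtt, (rj, dj, tj_reported) in MEASURED.items():
    tj_model = total_jitter_ps(dj, rj)
    errors[vtt] = tj_model - tj_reported
    print(f"Vtt={vtt}: model {tj_model:.2f} ps, reported {tj_reported:.2f} ps")
```

Under these assumptions the model lands within a few hundredths of a picosecond of each reported TJ value, which suggests the three jitter columns are mutually consistent.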

[Figure 24.6.6 plot: Duty-Cycle Distortion (DCD) @ 5Gb/s vs. input DCD. x-axis: DCD in %, -11 to 11; y-axis: DCD out % pk-pk, 0 to 0.6; curves for Vtt=0.0, Vtt=0.5, and Vtt=1.0. Inset: applied input clock signal @ 2.5GHz (400ps period), 50% duty cycle varied by -10% to +10% (20% p-p).]


Figure 24.6.7: Test-chip micrograph.

[Figure 24.6.7 micrograph labels: 1mm x 1mm test chip; clock input (ck2cml, ck2cmlb); SST TX lane 0 and SST TX lane 1, each with out+/out-; data pattern generators; 3-wire interface and config register; decoupling caps.]
