

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 45, NO. 4, APRIL 2010 909

A 21-Gb/s 87-mW Transceiver With FFE/DFE/Analog Equalizer in 65-nm CMOS Technology

Huaide Wang and Jri Lee, Member, IEEE

Abstract: A 21-Gb/s backplane transceiver is presented. The transmitter incorporates a half-rate topology with purely digital blocks to substantially reduce power consumption. The receiver employs analog and decision-feedback equalizers in a full-rate structure to avoid added circuit complexity. The one-tap decision-feedback equalizer merges the summer and the slicer into the flipflop, shortening the feedback path and speeding up the operation considerably. Fabricated in 65-nm CMOS, the transceiver (excluding the clock-generating PLL and CDR circuits) delivers 21-Gb/s data ($2^{n}-1$ PRBS) over a 40-cm FR4 channel while consuming 87 mW from a 1.2-V supply.

Index Terms: Analog equalizer, decision-feedback equalizer (DFE), feedforward equalizer (FFE), transceiver (TRx), TSPC latch.

I. INTRODUCTION

The rapidly growing volume of wireline communications has increased the data rate almost exponentially over the past decades. This trend is summarized in Fig. 1. Since the 1990s, the channel capacity of both electrical and optical media has improved by approximately 1.5 times per year, and this exponential progress continued until recently. To achieve higher data rates in backplane communications, designers tend to improve the I/O circuits rather than modify the board, simply because of cost and compatibility issues. In this case, the I/O usually requires more power, which may soon dominate the overall power consumption if conventional architectures and circuit structures are used. Fortunately, novel designs can relax the required power dissipation significantly. Fig. 2(a) shows the power efficiency of state-of-the-art I/O transceivers as a function of technology node. The power per Gbps has been reduced from 350 mW to about 20 mW in 10 years, a result of optimized circuit designs and advanced technologies. Note that scaling itself can only improve bandwidth and power efficiency to some extent, given that the same output swing over 50-Ω printed circuit board (PCB) traces needs to be maintained. In this prototype, we target a power efficiency of less than 5 mW/Gbps.

Manuscript received August 24, 2009; revised November 13, 2009. Current version published March 24, 2010. This paper was approved by Guest Editor Masayuki Mizuno.

The authors are with the Electrical Engineering Department, National Taiwan University, Taipei 10617, Taiwan (e-mail: [email protected]).

Digital Object Identifier 10.1109/JSSC.2010.2040117

Fig. 1. Evolution of wireline communication standards.

Generally speaking, for a given medium, the longest achievable distance is inversely proportional to the data rate. Indeed, if the channel is modeled as an RC network, the bandwidth (or equivalently, the maximum data rate) is inversely proportional to the loss (i.e., the distance). Fig. 2(b) summarizes the maximum error-free distances of recently published Gb/s backplane transceivers. Among them, [1] and [11] use a full-rate architecture, and [12] adopts a half-rate structure. An analog equalizer [13], a decision-feedback equalizer (DFE) [14], [16], or a transition equalizer [15] is incorporated to overcome the channel loss. Alternative data formats such as PAM-4 [4], [6] trade signal-to-noise ratio (SNR) for bandwidth. Nonetheless, the distance–data rate tradeoff basically follows a hyperbolic curve. The common way to deviate from the curve is to incorporate more equalization along the data path, which in turn dissipates more power. To reduce the power consumption, new circuit topologies are needed. To exploit higher data rates for future applications, we look for 20-Gb/s transceiver solutions that can deliver data over inexpensive backplanes. The goal is to achieve a data rate greater than 20 Gb/s over 40-cm FR4 channels [18]. Note that in advanced CMOS technologies, the supply voltage is so limited that many conventional circuits become obsolete; e.g., a 1.2-V supply can stack no more than two stages in a current-mode-logic (CML) circuit. Some circuit advantages no longer exist in a flat structure.

0018-9200/$26.00 © 2010 IEEE

Authorized licensed use limited to: National Taiwan University. Downloaded on April 13,2010 at 07:05:03 UTC from IEEE Xplore. Restrictions apply.

At 20 Gb/s, the FR4 environment also becomes more severe. Fig. 3(a) illustrates the measured response of a typical 40-cm FR4 channel, presenting 21.2-dB and 30.5-dB loss at 10 GHz and 20 GHz, respectively. With such high loss, the far-end data eye at 20 Gb/s is fully closed, as expected [Fig. 3(b)]. Here, we incorporate three equalization techniques, namely a feedforward equalizer in the transmitter and analog and decision-feedback equalizers in the receiver, achieving a total compensation of 26 dB at the Nyquist frequency (10 GHz). To save power and area, we employ a parallel scheme in the transmitter, minimizing the number of CML stages and eliminating inductors. Meanwhile, a full-rate architecture is preserved in the receiver to reduce circuit complexity and power consumption. A novel decision-feedback equalizer design is also introduced. It obviates the conventional combiner and realizes a faster feedback path to increase the operating speed. Note that this paper focuses on improving bandwidth and power efficiency; other issues such as crosstalk are not addressed. Fabricated in 65-nm CMOS technology, the transceiver achieves an error-free data link over a 40-cm FR4 channel while dissipating a total power of 87 mW from a 1.2-V supply.

Fig. 2. Performance comparison of modern I/Os: (a) power efficiency versus technology; (b) error-free distance versus data rate.

Fig. 3. 40-cm FR4 channel behavior: (a) channel loss; (b) far-end output data eye.

This paper is organized as follows. Sections II and III present the transmitter and receiver architectures, respectively, together with their building blocks, and discuss the design tradeoffs and considerations in detail. Section IV summarizes the experimental results.

II. TRANSMITTER

A. Number of FFE Taps

Like many other mainstream designs, we use a feedforward equalizer (FFE) in the transmitter. However, selecting the number of taps for a very high-speed (20-Gb/s class) FFE involves many concerns. We conduct a general analysis of this issue in this subsection.

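Let us first consider a general FFE with $N$ taps, as shown in Fig. 4(a). In a real circuit implementation such as a CML summer, we usually require a constant tail current in the output driver so as to keep a fixed common-mode level. That means the sum of the absolute values of all coefficients is a constant, which can be normalized to unity. We thus denote the first tap as $1-\sum_{k=1}^{N-1}|\alpha_k|$, and the second to the $N$th taps as $\alpha_1, \alpha_2, \ldots, \alpha_{N-1}$, respectively. Since the output is a linear combination of all the (delayed) data inputs, we arrive at the transfer function from $V_{in}$ to $V_{out}$:

$$H(z) = \frac{V_{out}}{V_{in}}(z) = \left(1-\sum_{k=1}^{N-1}|\alpha_k|\right) + \sum_{k=1}^{N-1}\alpha_k z^{-k} \qquad (1)$$

To gain more insight, we convert the above discrete analysis to the continuous domain. That is,

$$H(j\omega) = \left(1-\sum_{k=1}^{N-1}|\alpha_k|\right) + \sum_{k=1}^{N-1}\alpha_k e^{-jk\omega T_b} \qquad (2)$$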

where $T_b$ denotes the bit period.

Equation (2) implies important properties for the maximum boost an FFE can provide. Now, if we keep the total amount of all coefficients other than the first one as $\alpha$ (i.e., $\sum_{k=1}^{N-1}|\alpha_k| = \alpha$), the maximum boost at the Nyquist frequency becomes

$$\frac{|H(j\pi/T_b)|}{|H(0)|} \le \frac{1}{1-2\alpha} \qquad (3)$$

The equality holds when $\alpha_1, \alpha_3, \ldots \le 0$ and $\alpha_2, \alpha_4, \ldots = 0$. In other words, if the ratio between the first tap and the other taps is a constant, the maximum boosting is also a constant, regardless of the number of taps. Note that $\alpha$ must lie between 0 and 1/2 in order to perform high-frequency boosting. For $\alpha$ outside this range, the response actually presents an attenuation rather than a boost for high frequencies. Depending on the dc loss that a system can tolerate, we can select an optimal $\alpha$. For example, if a minimum amplitude as large as 1/3 of the original full swing is acceptable for the receiver, we have $\alpha = 1/3$ and the FFE creates 9.5 dB boost at $1/(2T_b)$. Using more taps helps reshape the response to better fit the inverse of the channel loss, but gives no additional compensation. As expected, the more taps we use, the better the response fits the desired response. With $\alpha = 1/3$, we plot the response of different FFEs having 2, 3, and 4 taps [Fig. 4(b)]. Here, the dc gain is normalized to unity for a fair comparison. The FFE with 3 or more taps shows sufficient fitting quality to the compensation curve, whereas the one with 2 taps provides only limited accuracy. Note that the desired response here is obtained by transforming the pulse response of a 20-cm FR4 channel into the frequency domain. In this prototype, we set the maximum boosting to about 9 dB.

Fig. 4. (a) FFE model. (b) Frequency response of FFE with different tap number.
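The boost limit discussed above can be checked numerically. The following is a minimal sketch, assuming the FFE model in which the first tap equals one minus the summed magnitudes of the remaining taps; the function names and the particular three-tap split are illustrative assumptions, not values from the paper.

```python
import cmath
import math


def ffe_gain(other_taps, omega_tb):
    """|H(jw)| with H = (1 - sum|a_k|) + sum_k a_k * exp(-j*k*w*Tb)."""
    first_tap = 1.0 - sum(abs(a) for a in other_taps)
    h = first_tap + sum(a * cmath.exp(-1j * k * omega_tb)
                        for k, a in enumerate(other_taps, start=1))
    return abs(h)


def ffe_nyquist_boost_db(other_taps):
    """Gain at w*Tb = pi (Nyquist) relative to the dc gain, in dB."""
    return 20.0 * math.log10(ffe_gain(other_taps, math.pi)
                             / ffe_gain(other_taps, 0.0))


alpha = 1.0 / 3.0
two_tap = [-alpha]                 # all remaining weight on one negative tap
three_tap = [-0.25, -1.0 / 12.0]   # hypothetical split with the same total weight

bound_db = 20.0 * math.log10(1.0 / (1.0 - 2.0 * alpha))
print(round(ffe_nyquist_boost_db(two_tap), 2))    # 9.54
print(ffe_nyquist_boost_db(three_tap) <= bound_db)
```

With a total weight of 1/3, the two-tap case reaches the 9.5-dB figure quoted in the text, while redistributing the same weight over more taps changes the shape of the response without exceeding that limit.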

Another issue is the parasitic capacitance caused by the taps. Indeed, for a CML combiner, adding more taps implies an almost linear increase of parasitic capacitance at the output node. If we denote $C_m$ and $C_p$ as the capacitance caused by the main tap and by each pre/post tap, respectively, we estimate the total parasitic capacitance of an $N$-tap combiner as

$$C_{tot} = C_m + (N-1)\,C_p \qquad (4)$$

Since bandwidth is inversely proportional to $C_{tot}$, the maximum data rate rolls off as the number of taps increases. On the other hand, for large-signal operation, the output eye opening is more important. Even with identical boosting at the Nyquist frequency, an FFE with fewer taps suffers from larger ISI, since it has fewer "tools" to fix the distorted waveform. Fig. 5 illustrates the bandwidth and ISI effects for a typical FFE designed in 65-nm CMOS technology. Here, we set the total tail current of all taps to 12 mA (which corresponds to a maximum swing of 300 mV). Transistor-level simulation shows that, to keep sufficient bandwidth, the tap number $N$ must be less than or equal to 4. Meanwhile, the ISI test requires at least 3 taps to maintain an eye opening larger than 75%. Thus, in this design, we choose $N = 3$ as a compromise between bandwidth and accuracy.

Fig. 5. Bandwidth and eye opening of FFE with different tap number.

B. Half-Rate Topology

At speeds beyond 20 Gb/s, the full-rate structure inevitably dissipates significant power, because every single block in it has to be made in CML. A half-rate architecture, however, relaxes the stringent speed requirement and saves considerable power. This is primarily because, in 65-nm CMOS, the half-rate data (10 Gb/s) and clock (10 GHz) can be handled purely in the digital domain, which, even with design margin, still consumes less power than its CML counterpart. To verify this, we observe the behavior of a two-stage inverter chain designed in 65-nm CMOS, where a minimum-length inverter with proper sizing drives a 4× larger gate, as shown in Fig. 6. To stay in the high-gain region, the inverter is self-biased with a 100-kΩ resistor, and ac coupling is used between stages. As the clock frequency goes up, the total power dissipated in the inverter increases linearly until 8–10 GHz [Fig. 6(a)]. The dynamic power begins to dominate beyond this point. The small-signal analysis also shows that an inverter with fan-out-of-4 loading has a 0-dB gain at 9 GHz, even though the 3-dB bandwidth is about 3 GHz [Fig. 6(b)]. The large-signal simulation shown in Fig. 6(c) reveals a similar result. Applying a full-swing (rail-to-rail) sinusoidal input, we observe that the inverter maintains a rail-to-rail output swing up to 8 GHz and rolls off afterwards. Overall, we design the transmitter in a way that pushes the digital circuits to the limit to demonstrate the feasibility of high-speed digital buffers.

By the same token, 10-Gb/s data can also be handled digitally to save power. For example, a true single-phase clock (TSPC) latch consumes much less power than its CML counterpart [19]. A comparison of power consumption between TSPC and CML latches is shown in Fig. 7. Here, both latches are designed in 65-nm CMOS. Operated at 10 Gb/s, the TSPC latch dissipates only 1/10 of the power of the CML latch.

Fig. 6. Behavior of inverter chain in 65-nm CMOS: (a) power efficiency; (b) small-signal response; (c) large-signal response.

Fig. 7. Power efficiency of TSPC and CML latches.

To compare the power of full-rate and half-rate structures, we implement both transmitters in 65-nm CMOS and partition the power. As shown in Fig. 8, the half-rate design saves 12 mW (50%) of power. Note that here we presume two 10-Gb/s data streams are ready as the inputs of the half-rate structure. In some applications, a few low-speed data streams may need to be serialized before final transmission, potentially making half-rate structures even more advantageous. Nonetheless, the actual amount of power saving may vary under different conditions.

The half-rate architecture inevitably suffers from pulse-width variation if the clock presents duty-cycle distortion. Fig. 9 shows the 20-Gb/s output of the transmitter under such an effect. With 5% clock distortion, the output data is expected to have two traces, increasing the peak-to-peak jitter by 6 ps. Properly adjusting the bias/load on the clock buffer and/or applying a duty-cycle correction circuit can suppress this effect.

C. Transmitter Architecture

The complete transmitter is shown in Fig. 10, where half-rate data processing with a 3-tap FFE is employed. A demultiplexer (DMUX) deserializes the input into two 10-Gb/s data streams, which are subsequently fed into two parallel latch chains. The two demultiplexed and delayed data streams are then presented to the MUXes, which pick them up alternately, producing the appropriate bit sequences to be multiplied by the corresponding tap coefficients. The output driver then combines the three weighted sequences and delivers the pre-emphasized output. The 20-GHz clock needs to be divided by a factor of 2 before driving the DMUX and FFE. Following the design in [20], a single-to-differential (S/D) converter is employed in front to accommodate the limited testing equipment. The 10-GHz clock is buffered by two delay cells (both made of CMOS inverters) to provide proper phase shifts for the DMUX, the latches, and the MUXes. Note that the DMUX may not be needed if the half-rate data sequences are ready. As will be demonstrated in Section IV, the two fixed delays are sufficient for stable operation over temperature and supply variations. A table summarizing the power dissipation of each block is also shown. We describe the building-block design in the following subsection.
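To illustrate why two half-rate streams are sufficient to build a full-rate 3-tap FFE output, the following behavioral sketch compares a direct full-rate FIR with a version that only reads from even and odd half-rate streams. It is an idealized bit-level model; the function names and tap values are hypothetical, and timing, latch chains, and clock phases are not modeled.

```python
def full_rate_ffe(bits, taps):
    """Reference model: y[n] = sum_k taps[k] * d[n-k], with d in {-1, +1}."""
    d = [1 if b else -1 for b in bits]
    return [sum(c * d[n - k] for k, c in enumerate(taps) if n - k >= 0)
            for n in range(len(d))]


def half_rate_ffe(bits, taps):
    """Same output, but every tap reads from one of two half-rate streams."""
    d = [1 if b else -1 for b in bits]
    even, odd = d[0::2], d[1::2]       # DMUX: two streams at half the bit rate
    out = []
    for n in range(len(d)):
        acc = 0
        for k, c in enumerate(taps):
            m = n - k                  # index of the bit that tap k needs
            if m < 0:
                continue
            # MUX: the select alternates between the two streams every bit.
            stream = even if m % 2 == 0 else odd
            acc += c * stream[m // 2]
        out.append(acc)
    return out


bits = [1, 0, 0, 1, 1, 1, 0, 1, 0, 0]
taps = [0.7, -0.2, -0.1]               # hypothetical coefficients, sum of |c| = 1
same = all(abs(a - b) < 1e-12
           for a, b in zip(full_rate_ffe(bits, taps), half_rate_ffe(bits, taps)))
print(same)
```

In hardware the alternation is performed by the MUXes under the half-rate clock, so only the final multiplexing and combining stage has to run at the full bit rate.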

D. Building Blocks

1) TSPC Latch: The TSPC latch and its layout are depicted in Fig. 11. With the device sizes listed below, this latch achieves a data rate of 13 Gb/s while driving a fan-out-of-2 load in 65-nm CMOS. The TSPC latch occupies 60 μm², which is only 8% of the area of a CML latch with the same bandwidth.

2) CML/CMOS Converter: A proper interface between the CML and CMOS logic now becomes mandatory. As illustrated in Fig. 12(a), the converter contains a current-steering adapter that translates the CML input into full-swing levels. Note that minimum channel length is used for all devices, which yields a conversion gain greater than 10 dB at 10 GHz with only 0.5 mA.


Fig. 8. Comparison of full-rate and half-rate transmitter.

Fig. 9. Effect of clock distortion for half-rate transmitter.

Fig. 10. Transmitter architecture.

3) MUX: The MUX design is shown in Fig. 12(b). With the help of rail-to-rail data and clock, it is possible to realize a pseudo-NMOS MUX at 20 Gb/s. Here, the sign-bit selection of the two data streams is accomplished by two-way switches made of transmission gates. Note that the MUX in Fig. 12(b) naturally restores the output signal to CML levels.

4) Output Combiner: The output combiner (driver) follows conventional designs [14], [15] (Fig. 13). CML pairs with tunable tail currents are combined by means of the 55-Ω load resistors. The three tail current sources have a constant total current of 8 mA, leading to a maximum swing (with no boosting) of 200 mV. Note that the devices in different taps are slightly scaled with current to further reduce the output capacitance.
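The quoted swing can be roughly cross-checked with a back-of-the-envelope calculation. The sketch below assumes that the 8-mA tail current sees the 55-Ω on-chip load in parallel with a 50-Ω line or termination (the 50-Ω value is an assumption here), and that at full boost the low-frequency swing shrinks by the factor $1-2\alpha$ with $\alpha = 1/3$.

```python
import math


def parallel(r1, r2):
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)


I_TAIL_MA = 8.0   # total tail current stated in the text
R_LOAD = 55.0     # on-chip load resistor stated in the text
R_LINE = 50.0     # assumed line/termination impedance

max_swing_mv = I_TAIL_MA * parallel(R_LOAD, R_LINE)   # mA * ohm = mV
alpha = 1.0 / 3.0                                      # assumed full-boost setting
min_swing_mv = max_swing_mv * (1.0 - 2.0 * alpha)
boost_db = 20.0 * math.log10(max_swing_mv / min_swing_mv)

print(f"{max_swing_mv:.1f} mV, {min_swing_mv:.1f} mV, {boost_db:.2f} dB")
```

Under these assumptions the maximum swing comes out near 210 mV and the minimum near 70 mV, in the same range as the 200-mV and 65-mV swings reported in Section IV.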


Fig. 11. Design and layout of TSPC latch.

Fig. 12. (a) CML-to-CMOS converter. (b) MUX.

Fig. 13. Output combiner.

III. RECEIVER

The receiver design involves tradeoffs as well. The main challenges are 1) the depth of the DFE, 2) the adjustment of the analog equalizer and the DFE, and 3) the integration of the DFE and the CDR. We address these issues in this section.

A. DFE Analysis

One of the most critical blocks in the receiver is the DFE. A simple model is shown in Fig. 14(a), where delayed outputs are fed back to the input with corresponding coefficients $\beta_1, \beta_2, \ldots, \beta_N$. The input is applied to the summer directly, so the linear model gives $H(z) = 1/(1-\sum_{k=1}^{N}\beta_k z^{-k})$. Similar to the case of the FFE, the maximum achievable boosting is determined by the total amount of the coefficients rather than the number of taps. If we fix the total amount of all coefficients as $\beta$ (i.e., $\sum_{k=1}^{N}|\beta_k| = \beta$), we obtain the maximum Nyquist boosting as

$$\frac{|H(j\pi/T_b)|}{|H(0)|} \le \frac{1+\beta}{1-\beta} \qquad (5)$$

The equality holds when $\beta_1, \beta_3, \ldots \le 0$ and $\beta_2, \beta_4, \ldots = 0$. In other words, if we fix the total amount of the feedback coefficients, the maximum boost at the Nyquist frequency is also fixed regardless of the tap number $N$. Again, using more taps only improves the equalization quality, not the amount of boosting. In Fig. 14(b), we plot the DFE responses with different numbers of taps and compare them with the desired response. The desired response is also obtained from a 20-cm FR4 channel, and the effect of pre-cursors is removed since a DFE can only handle post-cursors. A DFE with three or more taps provides better fitting. For high-speed operation, however, we may use fewer taps because of the excessive parasitic capacitance and circuit complexity.
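The same kind of numerical check can be applied to the linear DFE model. This is a sketch under the assumption that the feedback terms are added at the summer, so that $H = 1/(1-\sum_k \beta_k e^{-jk\omega T_b})$; the function names and the two-tap example values are illustrative only.

```python
import cmath
import math


def dfe_gain(fb_taps, omega_tb):
    """|H| for the assumed linear model H = 1 / (1 - sum_k b_k * exp(-j*k*w*Tb))."""
    loop = sum(b * cmath.exp(-1j * k * omega_tb)
               for k, b in enumerate(fb_taps, start=1))
    return abs(1.0 / (1.0 - loop))


def dfe_nyquist_boost_db(fb_taps):
    """Gain at w*Tb = pi (Nyquist) relative to the dc gain, in dB."""
    return 20.0 * math.log10(dfe_gain(fb_taps, math.pi) / dfe_gain(fb_taps, 0.0))


one_tap = [-1.0 / 3.0]            # total feedback weight of 1/3 on a single tap
two_tap = [-0.25, 1.0 / 12.0]     # hypothetical split with the same total weight

print(round(dfe_nyquist_boost_db(one_tap), 2))   # 6.02
print(dfe_nyquist_boost_db(two_tap) < dfe_nyquist_boost_db(one_tap))
```

In this model a single tap with a weight of 1/3 gives about 6 dB of Nyquist boost, which is the amount of DFE boosting the prototype targets; the slicer nonlinearity discussed next is not captured here.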


Fig. 14. (a) DFE model. (b) DFE response with different tap number. (c) Response of one-tap DFE with and without slicer. (d) Waveforms.

Fig. 15. Comparison of full-rate and half-rate DFE.

The above analysis only reflects part of the behavior of a DFE. In reality, the summer output needs to pass through a slicer in order to regenerate a complete bit before it propagates into the delay chain. Taking the saturation effect into consideration, we find that a DFE with a slicer actually generates larger compensation at high frequencies. Fig. 14(c) shows the simulated response of a 20-Gb/s, 1-tap DFE with and without a slicer in 65-nm CMOS technology. Using a slicer improves the dc gain and Nyquist boosting by 3 and 5 dB, respectively, which translates to 15% less ISI. Fig. 14(d) illustrates the time-domain waveforms with the same setup. The slicer increases the eye opening by 200 mV.

To determine the DFE topology, we examine the tradeoffs between full-rate and half-rate structures (Fig. 15). Half-rate DFEs slow down the data-path operation, but also create other issues. The CK-to-Q delay of the flipflops (delay cells) must be well defined such that the sampling still falls around the center of the input data eye. Adding up two data streams with different phases also introduces jitter if the timing is not properly managed. The circuit complexity also requires more power and area. Note that the flipflops here still need to sample the full-rate data even though they are clocked at half rate. In other words, the flipflops require a sampling bandwidth of 20 Gb/s. As indicated in Fig. 15, full-rate DFEs, on the other hand, require only 50% of the hardware, which potentially saves power and area if the feedback can be sped up by proper circuit design.¹ Again, we plot the bandwidth and ISI for a conventional full-rate DFE designed in 65-nm CMOS, as shown in Fig. 16. One tap is indeed an optimal choice for 5 to 6 dB of compensation.

Fig. 16. Bandwidth and eye opening of DFE with different tap number.

Fig. 17. Receiver architecture.

Fig. 18. 20-Gb/s decision-feedback equalizer.

In this prototype, we adopt a full-rate DFE with one tap to provide 6-dB boosting. Special circuit techniques are employed to relax the stringent speed requirement in the feedback loop. We leave the circuit details to Section III-C.

B. Receiver Architecture

Fig. 17 shows the receiver architecture. A one-stage transimpedance amplifier (TIA) with shunt-shunt feedback converts the signal current into voltage while providing 50-Ω termination. An analog equalizer follows the TIA in the front-end to provide part of the boosting, and a 1-tap DFE is employed subsequently. The 1-tap DFE response is also plotted here, where the Nyquist boosting follows from (5) with a single tap. At 20 Gb/s, the analog and decision-feedback equalizers can achieve maximum boosting of 12 and 6 dB, respectively. Note that the tuning directions for both equalizers here are correlated. The whole receiver is composed of CML circuits.

C. DFE Design

Conventional full-rate DFEs fail to operate at very high speed, since they suffer from inadequate settling time for the feedback signal. To accelerate the feedback, we merge the adder and the slicer into the flipflop, as shown in Fig. 18. Now, the output directly feeds back to the input with a single coefficient, which is implemented in current mode. The adder pair thus carries the feedback signal. This is equivalent to dynamically adjusting the threshold level of the sampler based on the previous result. That is, if the previous bit is "0", the current bit will be considered "1" if the output crosses the shifted threshold, and vice versa. Note that the total tail current of the adder and the master latch remains constant in order to keep a fixed data swing. The current of the adder pair is also steered synchronously with the master latch, resetting the feedback when the comparison (or "slicing") is accomplished. As a result, the master latch maintains a constant output swing in the locking state, where the regeneration pair carries all the tail current. The design saves 14 mW as compared with a half-rate 1-tap DFE at 20 Gb/s. Note that the shorter feedback path in the DFE not only increases the operating speed but also provides a larger phase margin for sampling.

¹ If a DMUX must be included, the power dissipation for both cases will be comparable.
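The threshold-shifting view of the one-tap DFE can be illustrated with a small behavioral model. This is only a sketch: the channel is an idealized, noiseless two-cursor model, the cursor values `H0` and `H1` are hypothetical, and nothing here represents the transistor-level circuit of Fig. 18.

```python
H0 = 1.0   # main cursor (hypothetical)
H1 = 1.2   # first post-cursor, chosen larger than H0 so the raw eye is closed


def channel(bits, h0=H0, h1=H1):
    """Toy channel: each sample is the current bit plus ISI from the previous bit."""
    d = [1.0 if b else -1.0 for b in bits]
    return [h0 * d[n] + (h1 * d[n - 1] if n > 0 else 0.0)
            for n in range(len(d))]


def plain_slicer(samples):
    """Fixed threshold at zero, no feedback."""
    return [1 if s > 0.0 else 0 for s in samples]


def one_tap_dfe(samples, h1=H1):
    """Slice against a threshold that is moved by the previous decision."""
    decisions, prev = [], None
    for s in samples:
        if prev is None:
            threshold = 0.0          # no history for the first bit
        else:
            threshold = h1 if prev == 1 else -h1
        bit = 1 if s > threshold else 0
        decisions.append(bit)
        prev = bit
    return decisions


bits = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0]
rx = channel(bits)
print(plain_slicer(rx) == bits)   # False: post-cursor ISI closes the eye
print(one_tap_dfe(rx) == bits)    # True: shifted threshold cancels the ISI
```

Moving the threshold by the post-cursor amount is arithmetically the same as subtracting the ISI before a fixed-threshold slicer, which is why the adder and slicer can share one stage.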

D. Analog Equalizer

The analog equalizer follows the design in [21], where two filter stages are interposed with a gain buffer. Fig. 19(a) illustrates the details of the front-end circuits. The equalizing filters use capacitive degeneration and inductive peaking techniques to achieve large boosting and high gain. Fig. 19(b) shows the response for the maximum and minimum boosting. The analog equalizer provides a maximum gain of about 12 dB at the Nyquist frequency and a 12-dB dc gain.


Fig. 19. (a) Analog equalizer. (b) Frequency responses for maximum and minimum boosting.

Fig. 20. Die photos and testing board (with 40-cm channel).

Although not demonstrated in this prototype, the two equalizers can be made adaptive to further increase the flexibility. Since the DFE here has only one coefficient to tune, the adaptation of the analog and decision-feedback equalizers can be merged. Both analog [21] and digital (e.g., sign-sign LMS [22]) approaches are possible. In most cases, a CDR circuit must eventually be incorporated to retime the data.² Transistor-level simulation suggests that the analog equalizer is capable of creating coarsely compensated data with rms jitter less than 0.2 UI. This allows the subsequent CDR to lock its phase to the equalized data signal.

It is interesting to compare our design with other recently published high-speed DFE circuits. The circuit in [23] utilizes a speculative architecture to realize a 19-Gb/s 1-tap DFE in 90-nm CMOS, which requires much larger power and area (38 mW and 0.019 mm², versus 20 mW and 0.008 mm² for our DFE). The design in [24] tunes the body voltage in the feedback path, which is possible only if a triple n-well is available. Even with a triple n-well available, the circuit still puts the devices in danger of turning on the pn-junction. The D-flipflop with inductive loads is unsuitable for broadband data. It also requires high power (45 mW) and an additional bias voltage for the feedback loop.

² This prototype does not include a CDR design, for simplicity.

IV. EXPERIMENTAL RESULTS

Fig. 21. Output matching of transmitter.

The transceiver has been designed and fabricated in 65-nm CMOS technology. Fig. 20 illustrates the die photos as well as a testing board (FR4, 40-cm differential trace). Chips are mounted directly on the board as a chip-on-board assembly, with all pads wire-bonded to the traces. Data output pads have two parallel bonding wires less than 1 mm long. The parasitic inductance is estimated to be 500 pH. The transmitter and receiver occupy 0.16 × 0.18 mm² and 0.35 × 0.10 mm², respectively. Transceiver performance for different channel lengths (from 5 cm to 40 cm) is thoroughly examined with a 2^n − 1 PRBS input data pattern. The transmitter and the receiver consume 45 and 42 mW, respectively, from a 1.2-V supply.
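The power figures above can be combined into the power efficiency targeted in the Introduction (less than 5 mW/Gbps). The short Python calculation below is a sketch that uses only the reported 45-mW and 42-mW consumption and the 21-Gb/s data rate; the variable names are illustrative.

```python
# Reported measurement values (clock-generating PLL and CDR excluded).
tx_mw = 45.0           # transmitter power, mW
rx_mw = 42.0           # receiver power, mW
data_rate_gbps = 21.0  # maximum demonstrated data rate, Gb/s

total_mw = tx_mw + rx_mw
# 1 mW per Gb/s equals 1 pJ per bit, so this is also the energy per bit in pJ.
efficiency_mw_per_gbps = total_mw / data_rate_gbps

print(f"total power: {total_mw:.0f} mW")
print(f"efficiency:  {efficiency_mw_per_gbps:.2f} mW/Gbps")
```

The total matches the 87 mW quoted in the title, and the resulting efficiency of about 4.1 mW/Gbps is within the 5-mW/Gbps target.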

Fig. 21 shows the output matching of the transmitter. Between dc and 20 GHz, the transmitter’s output reflection coefficient is less than −17 dB, suggesting little reflection. Fig. 22 depicts the transmitter’s output at 20 Gb/s with minimum (0 dB) and maximum (9.5 dB) boost. It presents a maximum swing of 200 mV (0-dB boost) and a minimum swing of 65 mV (9.5-dB boost). Fig. 23 shows the recovered data at the receiver’s output at 20 Gb/s for different channel lengths. The maximum data jitter measures 1.44 ps,rms with an input clock jitter of 0.22 ps,rms. To fully evaluate the signal integrity, we measure the bit error rate (BER) for different channel lengths (Fig. 24). With full boost at both ends (transmitter and receiver), the transceiver can deliver 21-Gb/s data over a 40-cm FR4 channel at the BER shown in Fig. 24.

Fig. 22. Transmitter’s output with (a) minimum (0 dB) and (b) maximum (9.5 dB) boost (data rate: 20 Gb/s, vertical scale: 50 mV/div, horizontal scale: 20 ps/div).

Fig. 23. Receiver’s output for (a) 10-cm and (b) 40-cm channels (data rate: 20 Gb/s, vertical scale: 50 mV/div, horizontal scale: 20 ps/div).

TABLE I. PERFORMANCE SUMMARY

Fig. 25 illustrates the output data jitter of the transmitter under different supply and temperature variations. The input clock jitter is 0.22 ps,rms. Due to the marginal operation of the digital buffers and flipflops, the jitter with a 1.2-V supply is somewhat higher. Measurement shows that the jitter can be suppressed by raising the supply slightly. Future designs in more advanced technologies can provide larger bandwidth for higher operation speed. A bathtub BER test for different clock phase errors at the receiver is shown in Fig. 26, suggesting a tolerable range of 240°. Such a wide range relaxes possible difficulties, such as CDR incorporation, in future work. Table I and Fig. 27 compare the performance of this work with other recently published high-speed transceivers. Our work outperforms the others in many aspects, such as power consumption and chip area.
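To put some of the measured numbers in perspective, the sketch below converts them using standard definitions. It assumes that the 17-dB matching figure is read as a return loss, that the 240 phase range is in degrees of a full-rate clock period (one UI), and that the jitter is referred to the 50-ps unit interval at 20 Gb/s; the helper names are hypothetical.

```python
import math


def db_to_power_ratio(db):
    """Convert a power quantity in dB to a linear power ratio."""
    return 10 ** (db / 10)


def amplitude_ratio_db(v_high, v_low):
    """Express the ratio of two voltage swings in dB."""
    return 20 * math.log10(v_high / v_low)


def ui_ps(rate_gbps):
    """Unit interval in picoseconds for a data rate in Gb/s."""
    return 1000.0 / rate_gbps


# Fraction of incident power reflected at the transmitter output.
reflected_fraction = db_to_power_ratio(-17.0)

# Ratio between the 0-dB-boost swing (200 mV) and the full-boost swing (65 mV).
swing_ratio_db = amplitude_ratio_db(200.0, 65.0)

# Receiver output jitter (1.44 ps rms) as a fraction of the UI at 20 Gb/s.
jitter_ui = 1.44 / ui_ps(20.0)

# Tolerable clock phase window as a fraction of the UI (assumed full-rate clock).
phase_window_ui = 240.0 / 360.0

print(f"reflected power fraction: {reflected_fraction:.3f}")
print(f"swing ratio:              {swing_ratio_db:.2f} dB")
print(f"rms jitter:               {jitter_ui:.4f} UI")
print(f"phase window:             {phase_window_ui:.3f} UI")
```

Under these assumptions, about 2% of the incident power is reflected, the swing ratio of roughly 9.8 dB is close to the nominal 9.5-dB boost, the rms jitter is about 0.029 UI, and the phase window is two thirds of a UI.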


Fig. 24. BER measurement for different channel lengths.

Fig. 25. Transmitter output jitter as a function of supply and temperature.

Fig. 26. Bathtub plot for receiver clock phase.

Fig. 27. Performance comparison between this work and other high-speed transceivers.

V. CONCLUSION

A low-power, high-speed transceiver prototype for backplane applications has been presented. The half-rate Tx utilizes purely digital data processing to save power, while the full-rate Rx reduces circuit complexity by using a novel DFE design. High-speed circuit techniques and design considerations are presented as well. This work shows promising potential for next-generation very high-speed I/O design.

REFERENCES

[1] D.-L. Chen and M. Baker, “A 1.25 Gb/s, 460 mW CMOS transceiver for serial data communication,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 1997, pp. 242–243.

[2] C.-K. Yang et al., “A ��� μm CMOS 4 Gb/s transceiver with data recovery using oversampling,” in Symp. VLSI Circuits Dig. Tech. Papers, Jun. 1997, pp. 71–72.

[3] R. Gu et al., “A 0.5-3.5 Gb/s low-power low-jitter serial data CMOS transceiver,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 1999, pp. 352–353.

[4] R. Farjad-Rad et al., “A 0.3-μm CMOS 8-Gb/s 4-PAM serial link transceiver,” in Symp. VLSI Circuits Dig. Tech. Papers, Jun. 1999, pp. 41–44.

[5] G. Besten, “Embedded low-cost 1.2 Gb/s inter-IC serial data link in 0.35 μm CMOS,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2000, pp. 250–251.

[6] J. Sonntag et al., “An adaptive PAM-4 5 Gb/s backplane transceiver in 0.25-μm CMOS,” in Proc. IEEE Custom Integrated Circuits Conf., May 2002, pp. 363–366.

[7] N. Krishnapura et al., “A 5 Gb/s NRZ transceiver with adaptive equalization for backplane transmission,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2005, pp. 60–61.

[8] J. Jaussi et al., “A 20 Gb/s embedded clock transceiver in 90 nm CMOS,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2006, pp. 1334–1335.

[9] E. Yeung et al., “Power/performance/channel length tradeoffs in 1.6 to 9.6 Gbps I/O links in 90 nm CMOS for server, desktop, and mobile applications,” in Symp. VLSI Circuits Dig. Tech. Papers, Jun. 2006, pp. 79–80.

[10] M. Harwood et al., “A 12.5 Gb/s SerDes in 65 nm CMOS using a baud-rate ADC with digital receiver equalization and clock recovery,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2007, pp. 436–437.

[11] T. Masuda et al., “A 250 mW full-rate 10 Gb/s transceiver core in 90 nm CMOS using a tri-state binary PD with 100 ps gated digital output,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2007, pp. 438–439.

[12] K. Krishna et al., “A 0.6 to 9.6 Gb/s binary backplane transceiver core in 0.13 μm CMOS,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2005, pp. 64–65.

[13] S. Gondi et al., “A 10 Gb/s CMOS adaptive equalizer for backplane applications,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2005, pp. 328–329.

[14] V. Balan et al., “A 4.8-6.4-Gb/s serial link for backplane applications using decision feedback equalization,” IEEE J. Solid-State Circuits, vol. 40, pp. 1957–1967, Sep. 2005.

[15] K.-L. Wong and C.-K. Yang, “A serial-link transceiver with transition equalization,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2006, pp. 757–758.

[16] A. Kiaei et al., “A 10 Gb/s equalizer with decision feedback for high speed serial links,” in Proc. IEEE Custom Integrated Circuits Conf., May 2007, pp. 285–288.

[17] G. Balamurugan et al., “A scalable 5–15 Gbps, 14–75 mW low power I/O transceiver in 65 nm CMOS,” in Symp. VLSI Circuits Dig. Tech. Papers, Jun. 2007, pp. 270–271.

[18] H. Wang et al., “A 21-Gb/s 87-mW transceiver with FFE/DFE/Linear equalizer in 65-nm CMOS technology,” in Symp. VLSI Circuits Dig. Tech. Papers, Jun. 2009, pp. 50–51.

[19] J. Lee and H. Wang, “Study of subharmonically injection-locked PLLs,” IEEE J. Solid-State Circuits, vol. 44, pp. 1539–1553, May 2009.

[20] J. Lee et al., “Design and comparison of three 20-Gb/s backplane transceivers for duobinary, PAM4, and NRZ data,” IEEE J. Solid-State Circuits, vol. 43, pp. 2120–2133, Sep. 2008.

[21] J. Lee, “A 20-Gb/s adaptive equalizer in 0.13-μm CMOS technology,” IEEE J. Solid-State Circuits, vol. 41, pp. 2058–2066, Sep. 2006.

[22] C. Wong et al., “A 50 MHz eight-tap adaptive equalizer for partial-response channels,” IEEE J. Solid-State Circuits, vol. 30, pp. 228–234, Mar. 1995.

[23] D. Z. Turker et al., “A 19 Gb/s 38 mW 1-Tap speculative DFE receiver in 90 nm CMOS,” in Symp. VLSI Circuits Dig. Tech. Papers, Jun. 2009, pp. 216–217.

[24] C.-L. Hsieh and S.-I. Liu, “A 40 Gb/s decision feedback equalizer using back-gate feedback technique,” in Symp. VLSI Circuits Dig. Tech. Papers, Jun. 2009, pp. 218–219.

[25] J. F. Bulzacchelli et al., “A 10-Gb/s 5-Tap DFE/4-Tap FFE transceiver in 90 nm CMOS,” IEEE J. Solid-State Circuits, vol. 41, pp. 2885–2900, Dec. 2006.


Huaide Wang was born in Taipei, Taiwan, in 1984. He received the B.S. degree in electrical engineering from National Taiwan University, Taipei, Taiwan, in 2006. He is currently pursuing the Ph.D. degree in the Graduate Institute of Electrical Engineering, National Taiwan University.

His research interests include phase-locked loops, high-speed SerDes, and backplane transceivers.

Jri Lee (S’03–M’04) received the B.Sc. degree in electrical engineering from National Taiwan University (NTU), Taipei, Taiwan, in 1995, and the M.S. and Ph.D. degrees in electrical engineering from the University of California, Los Angeles (UCLA), both in 2003.

After two years of military service (1995–1997), he was with Academia Sinica, Taipei, Taiwan, from 1997 to 1998, and subsequently with Intel Corporation from 2000 to 2002. He joined National Taiwan University (NTU) in 2004, where he is currently an Associate Professor of electrical engineering. His current research interests include high-speed wireless and wireline transceivers, phase-locked loops, and data converters.

Dr. Lee is now serving on the Technical Program Committees of the International Solid-State Circuits Conference (ISSCC), Symposium on VLSI Circuits, and Asian Solid-State Circuits Conference (A-SSCC). He received the Beatrice Winner Award for Editorial Excellence at the 2007 ISSCC, the Takuo Sugano Award for Outstanding Far-East Paper at the 2008 ISSCC, the Best Technical Paper Award from the Y. Z. Hsu Memorial Foundation in 2008, the T. Y. Wu Memorial Award from the National Science Council (NSC), Taiwan, in 2008, the Young Scientist Research Award from Academia Sinica in 2009, and the Outstanding Young Electrical Engineer Award in 2009. He also received the NTU Outstanding Teaching Award in 2007, 2008, and 2009. He served as a Guest Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS in 2008 and as a tutorial lecturer at the 2009 ISSCC.
