50 • 2012 IEEE International Solid-State Circuits Conference

ISSCC 2012 / SESSION 2 / HIGH BANDWIDTH DRAM & PRAM / 2.7

2.7 An 8Gb/s/pin 4pJ/b/pin Single-T-Line Dual (Base+RF) Band Simultaneous Bidirectional Mobile Memory I/O Interface with Inter-Channel Interference Suppression

Yanghyo Kim¹, Gyung-Su Byun², Adrian Tang¹, Chewn-Pu Jou³, Hsieh-Hung Hsieh³, Glenn Reinman¹, Jason Cong¹, Mau-Chung Frank Chang¹

¹University of California, Los Angeles, Los Angeles, CA
²West Virginia University, Morgantown, WV
³TSMC, Hsinchu, Taiwan

The demand for higher power efficiency and bandwidth is increasing as mobile devices keep enhancing their graphics computing and media processing capabilities. Current memory interfaces with single-wire signaling operate at 5Gb/s/pin [1] and 6Gb/s/pin [2] with power efficiencies of 17.4pJ/b/pin and 15.8pJ/b/pin, respectively. Mobile DDR memory I/O with differential signaling has a better power efficiency of 6.4pJ/b/pin [3], as does the prior dual-band interconnect (DBI) [4], which achieves 5pJ/b/pin at 4.2Gb/s/pin for a simultaneous bidirectional (SBD) mobile memory I/O interface. However, DBI's differential signaling is incompatible with existing standards, and it occupies a large die area because it uses differential transmission lines and an LC oscillator to generate the RF carrier. To alleviate these concerns, we propose a Single-Transmission-Line DBI (STL-DBI) with the best figure-of-merit (FoM), defined as data rate per pin divided by the I/O-interface die area and power consumption.

The major issues of the previous DBI [4] are addressed in this STL-DBI design. First, an inter-channel interference (ICI) suppression scheme is introduced to compensate for the lack of common-mode rejection capability on the STL. Second, the LC oscillator is replaced with a ring oscillator to reduce die area by 15%. Third, chip-to-board assembly is done by flip-chip bonding to remove the large wire-bonding inductance for better signal integrity. Consequently, the proposed STL-DBI achieves a data rate of 8Gb/s/pin (×1.9 of [4]) with an energy efficiency of 4pJ/bit/pin (×0.8 of [4]). Furthermore, the STL-DBI enables SBD read/write operations on a single wire, so that the memory controller can provide memory-access requests on the command channel with standard DRAM-compatible timing, as explained in Fig. 2.7.1. As a result, the DRAM access time can be greatly reduced by programming concurrent command requests and data read/write operations in mobile point-to-point memory applications.
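
The two improvement factors quoted against [4] follow directly from the per-pin figures given in the text. The short Python check below reproduces them; the variable names are illustrative only.

```python
# Per-pin figures quoted in the text for STL-DBI and for the prior DBI [4].
stl_rate_gbps, stl_energy_pj = 8.0, 4.0
dbi_rate_gbps, dbi_energy_pj = 4.2, 5.0

rate_ratio = stl_rate_gbps / dbi_rate_gbps      # data rate relative to [4]
energy_ratio = stl_energy_pj / dbi_energy_pj    # energy per bit relative to [4]

print(f"data rate: x{rate_ratio:.1f} of [4]")
print(f"energy per bit: x{energy_ratio:.1f} of [4]")
```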

Figure 2.7.2 shows the STL-DBI transceiver schematic of the memory controller side with an RF-band transmitter (RFTX) and a baseband receiver (BBRX). The RFTX contains a 3-stage ring oscillator, an amplitude-shift keying (ASK) modulator, and an on-chip transformer with on-die termination (ODT). Each ring element consists of a differential-input NMOS pair and a PMOS cross-coupled load, and the frequency is adjusted by an extra PMOS device acting as a variable resistance. The generated 18GHz carrier signal is directly fed into M1 and M2 of the ASK modulator, and the data stream D1(RF) modulates the carrier by switching M3 and M4. The modulated signal is then inductively coupled to an off-chip STL through a transformer. To minimize ICI from RFTX to BBRX, one terminal of the secondary coil is terminated with ODT. As described by the equivalent network, differential D1 signals at P1 and P2 are coupled to ODT and STL. If ODT is matched to the impedance of STL (50Ω), the coupled signals are balanced such that currents of the same magnitude but opposite polarity cancel the interference from RFTX to BBRX at P3 (BBRX input). The same ODT is shared for BBRX termination to overcome the impedance mismatch of STL. Under this condition, the incoming data stream D2(BB) is extracted at the center tap of the transformer, and the BBRX amplifies the signal using a sense amplifier and buffers. As a result, we concurrently transmit and receive the D1(RF) and D2(BB) data streams at the controller side.
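
As a behavioral illustration of the ASK modulation described above, the following Python sketch gates an 18GHz carrier with a 3Gb/s bit stream. It assumes ideal on-off keying and an arbitrary simulation resolution of 20 samples per carrier cycle; the function name `ask_modulate` and the constants are hypothetical and do not model the transistor-level circuit (M1 to M4) or the transformer.

```python
import math

CARRIER_HZ = 18e9        # carrier frequency quoted in the text
BIT_RATE = 3e9           # RF-band data rate quoted in the text
SAMPLES_PER_CYCLE = 20   # assumed simulation resolution (not from the paper)

def ask_modulate(bits):
    """Return an ideal on-off-keyed carrier for a sequence of 0/1 bits."""
    # Number of samples per bit: carrier cycles per bit times samples per cycle.
    samples_per_bit = round(SAMPLES_PER_CYCLE * CARRIER_HZ / BIT_RATE)
    out = []
    for i, bit in enumerate(bits):
        for k in range(samples_per_bit):
            n = i * samples_per_bit + k
            carrier = math.sin(2 * math.pi * n / SAMPLES_PER_CYCLE)
            out.append(bit * carrier)   # the data bit switches the carrier on or off
    return out

wave = ask_modulate([1, 0, 1])
print(len(wave))  # 3 bits x 6 carrier cycles per bit x 20 samples per cycle
```

With these numbers each bit spans six carrier cycles, which gives a feel for how little carrier the receiver has to work with per bit at 3Gb/s.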

Figure 2.7.3 depicts the STL-DBI transceiver of the DRAM side with an RF-band receiver (RFRX) and a baseband transmitter (BBTX). Just as on the controller side, an on-chip transformer with ODT is used to receive RF-band data and transmit BB data. First, the input impedance of RFRX is matched to the impedance of STL around 18GHz using the transformer resonance effect (S11 of -28dB). The incoming single-ended D1(RF) signal is converted to a differential signal via the transformer. The differential RF-band signal is then injected into the receiver's differential mutual mixer and down-converted to the baseband data D1. On the BBTX side, the output driver sends the data stream D2(BB) to the off-chip STL via the center tap (P3) of the transformer. If ODT is matched to the impedance of STL, the signal power of D2 is equally distributed to the primary coil and absorbed by STL and ODT. In this process, D2 signals are inductively coupled to the RFRX through a high-pass filter effect, which causes ICI. However, this residue acts as a common-mode variation for the differential signal at the mutual-mixer input (P1,2), which minimizes ICI. Assuming most of the D2 signal power is confined within 10GHz based on its target data rate, the frequency response shows that this network can reject more than 30dB of ICI. As a result, we achieve simultaneous communication of D1(RF) and D2(BB) at the DRAM side as well.
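
The receive path can be illustrated with a small behavioral model in Python. It is a sketch under strong assumptions: the mutual mixer is modelled as squaring the differential input and averaging over one bit, the coupled baseband residue is modelled as a constant offset that appears equally on P1 and P2, and the decision threshold is arbitrary. The function name `receive` and all constants other than the 18GHz/3Gb/s ratio are hypothetical; the actual mixer circuit may behave differently.

```python
import math

SAMPLES_PER_CYCLE = 20   # assumed simulation resolution (not from the paper)
CYCLES_PER_BIT = 6       # 18 GHz carrier / 3 Gb/s data, both quoted in the text

def receive(bits, common_mode=0.3):
    """Recover bits from an ideal ASK signal seen differentially at P1/P2."""
    n_bit = SAMPLES_PER_CYCLE * CYCLES_PER_BIT
    recovered = []
    for bit in bits:
        energy = 0.0
        for k in range(n_bit):
            rf = bit * math.sin(2 * math.pi * k / SAMPLES_PER_CYCLE)
            # Hypothetical baseband residue, identical on both inputs.
            p1 = +0.5 * rf + common_mode
            p2 = -0.5 * rf + common_mode
            diff = p1 - p2           # the common-mode term cancels here
            energy += diff * diff    # self-mixing: the signal multiplies itself
        # Averaging acts as a crude low-pass filter; 0.25 is an assumed threshold.
        recovered.append(1 if energy / n_bit > 0.25 else 0)
    return recovered

print(receive([1, 0, 1, 1, 0]))
```

In this idealized model the recovered bits do not depend on `common_mode` at all, which is the point of presenting the residue to the mixer as a common variation; a real circuit achieves only finite rejection, consistent with the "more than 30dB" figure above.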

Figure 2.7.4 demonstrates the carrier frequency measurement through an off-chip STL. The differential input is tied to VDD and VSS to send out a continuous carrier signal, while its frequency is swept by the control voltage. The combination of SMA connector, RF cable, and spectrum analyzer input provides a 50Ω load so that the spectrum of the carrier signal can be measured in the middle of the STL. The measured frequency agrees with simulation results from 14GHz to 26GHz over a control voltage range of 0.4V. To study the bandwidth density of the RF band and the operating range of the carrier frequency, a dual-band data link setup (BB fixed at 5Gb/s) is used to measure BER with varying data rate and control voltage, respectively. From the measurement results, the 18GHz carrier can carry random data up to 3Gb/s with BER <10⁻¹², and the D1(RF) channel can maintain BER <10⁻¹² over a control voltage range of 50mV when its data rate is 3Gb/s. This shows that the STL-DBI system can tolerate process variation as long as the free-running ring oscillator covers a sufficient frequency range.
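
To put the 50mV control-voltage window in frequency terms, the rough Python estimate below assumes the tuning curve is linear across the 0.4V range. That linearity is an assumption made here for illustration; the measured curve in Fig. 2.7.4 may deviate from it, so the result is only an order-of-magnitude guide.

```python
F_MIN_GHZ, F_MAX_GHZ = 14.0, 26.0   # measured tuning range quoted in the text
V_RANGE = 0.4                        # control-voltage span quoted in the text (V)
BER_WINDOW_V = 0.05                  # 50 mV window with BER < 1e-12 at 3 Gb/s

# Average tuning gain, assuming a linear frequency-vs-voltage curve.
gain_ghz_per_v = (F_MAX_GHZ - F_MIN_GHZ) / V_RANGE

# Carrier-frequency span corresponding to the 50 mV window under that assumption.
window_ghz = gain_ghz_per_v * BER_WINDOW_V

print(f"average tuning gain: {gain_ghz_per_v:.1f} GHz/V")
print(f"tolerated carrier span: {window_ghz:.1f} GHz")
```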

Figure 2.7.5 illustrates the measured eye diagrams: (a) when the BB or RF-band data channel runs separately, and (b) when the BB and RF-band data channels run simultaneously over the same 5cm STL on an FR4 board. In case (a), BB and RF-band data operate up to 6.5Gb/s and 4Gb/s, respectively. In case (b), the simultaneous BB+RF-band communication exhibits a combined data rate of 8Gb/s (5Gb/s BB + 3Gb/s RF-band). This experiment suggests that better ICI rejection and power supply isolation between the BB and RF bands can further increase the data rate beyond 8Gb/s for STL-DBI-based SBD read/write operations.

In summary, we have designed and characterized the STL-DBI for a mobile DRAM I/O interface in 65nm CMOS to obtain an aggregate data throughput of 8Gb/s/pin with a power consumption of 32mW (BB: 14.4mW, RF: 17.6mW). Although the measured power is higher than that of the previous DBI [4] because of the higher 1.2V supply voltage needed to support the oscillation frequency of the ring oscillator, the energy efficiency is improved to 4pJ/bit/pin. Due to the existence of ICI on the STL, the BER is expected to be higher than in [4], and it is measured as <1×10⁻¹² with a 2¹⁵-1 PRBS pattern. Figure 2.7.6 compares the STL-DBI performance with prior memory I/O interfaces. The FoM of STL-DBI is 2.08 Gb/pin/mm²/mJ, which is superior to that of prior art by a factor of 1.6 to 19. Figure 2.7.7 shows the die photo (0.12mm² of active area) and the test board with flip-chip die assembly.
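
The energy efficiency and FoM quoted above can be reproduced from the reported throughput, power, and active area. The Python check below uses the FoM definition given in the introduction (data rate per pin divided by die area and power); the variable names are illustrative only.

```python
rate_gbps = 8.0            # aggregate throughput per pin (Gb/s)
power_mw = 14.4 + 17.6     # BB + RF power (mW)
area_mm2 = 0.12            # active die area (mm^2)

# mW / (Gb/s) = 1e-3 J/s / 1e9 b/s = pJ per bit.
energy_pj_per_bit = power_mw / rate_gbps

# (Gb/s) / (mm^2 * mW) = Gb / (mm^2 * mJ), matching the unit used in the text.
fom = rate_gbps / (area_mm2 * power_mw)

print(f"energy efficiency: {energy_pj_per_bit:.1f} pJ/b/pin")
print(f"FoM: {fom:.2f} Gb/pin/mm^2/mJ")
```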

Acknowledgment:
The authors would like to thank US/NSF for contract grant (CCF-0926127).

References:
[1] Kwang-Il Oh, et al., “A 5-Gb/s/pin Transceiver for DDR Memory Interface With a Crosstalk Suppression Scheme,” IEEE J. Solid-State Circuits, vol. 44, pp. 2222-2232, Aug. 2009.
[2] Kyung-Soo Ha, et al., “A 6Gb/s/pin Pseudo-Differential Signaling Using Common-Mode Noise Rejection Techniques Without Reference Signal for DRAM Interfaces,” ISSCC Dig. Tech. Papers, pp. 138-139, Feb. 2009.
[3] Brian Leibowitz, et al., “A 4.3 GB/s Mobile Memory Interface With Power-Efficient Bandwidth Scaling,” IEEE J. Solid-State Circuits, vol. 45, no. 4, pp. 889-898, Apr. 2010.
[4] Gyung-su Byun, et al., “An 8.4Gb/s 2.5pJ/b Mobile Memory I/O Interface Using Simultaneous Bidirectional Dual (Base+RF) Band Signaling,” ISSCC Dig. Tech. Papers, pp. 488-490, Feb. 2011.

978-1-4673-0377-4/12/$31.00 ©2012 IEEE

DIGEST OF TECHNICAL PAPERS • 51

ISSCC 2012 / February 20, 2012 / 4:30 PM

Figure 2.7.1: Comparison between conventional bidirectional and STL-DBI-based simultaneous bidirectional read/write operations (top), and STL-DBI-based mobile memory I/O interface with flip-chip interconnect assembly (bottom).

Figure 2.7.2: STL-DBI transceiver schematic of the memory controller side with measured BB output signal, equivalent ICI suppression network, and frequency response of ICI rejection from RFTX to BBRX.

Figure 2.7.3: STL-DBI transceiver schematic of the memory side with measured RF-band output signal, RFRX return loss (S11), equivalent ICI suppression network, and frequency response of ICI rejection from BBTX to RFRX.

Figure 2.7.4: Ring oscillator test diagram, measured vs. simulated oscillation frequency, BER vs. data rate measurement, and BER vs. control voltage measurement.

Figure 2.7.5: Measured (a) eye diagrams for conventional read/write (6.5Gb/s BB, 4Gb/s RF-band, respectively) and (b) eye diagrams for SBD read/write with 8Gb/s aggregate data rate (5Gb/s BB + 3Gb/s RF-band).

Figure 2.7.6: STL-DBI performance summary and comparison with prior arts.

ISSCC 2012 PAPER CONTINUATIONS

Figure 2.7.7: Die photo of STL-DBI transceiver controller side and memory side, and test board with flip-chip interconnect implementation.