
1600 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 11, NOVEMBER 1999

A 1.6-GByte/s DRAM with Flexible Mapping Redundancy Technique and Additional Refresh Scheme

Satoru Takase and Natsuki Kushiyama

Abstract—We implemented a 72-Mb direct Rambus DRAM with a new memory architecture suited to multibank operation. It employs two novel schemes: a flexible mapping redundancy (FMR) technique and an additional refresh scheme.

This paper shows that a multibank organization reduces redundancy area efficiency. With the FMR technique, however, this 16-bank DRAM achieves the same area efficiency as a single-bank DRAM; in other words, FMR reduces chip area by 13%.

This paper also describes how the additional refresh scheme reduces data-retention power to 1/4. Its area efficiency is about four times better than that of the conventional redundancy approach.

Index Terms—Additional refresh, area reduction, DRAM, direct Rambus DRAM, flexible mapping redundancy, redundancy, refresh, retention, yield.

I. INTRODUCTION

DIRECT Rambus DRAM has a high-speed interface with a 1.6-Gbyte/s data rate. It also has a multibank architecture for effective high bandwidth. From the viewpoint of memory design, multibank operation has a great effect on DRAM architecture. For area-effective implementation, we introduce new DRAM design techniques. This paper describes a DRAM architecture suitable for a multibank system.

II. CHIP OVERVIEW

A. Dependent Bank

As demands for high bandwidth increase, recent DRAM's tend to have multiple banks and perform interleaved operation. This is an excellent approach for realizing effective high performance by hiding row access latency. On the other hand, there is also strong demand for low-cost DRAM's. One solution to this demand is a "shared sense amp" architecture, in which bitline (BL) sense amps are shared between adjacent subarrays. This is also a good way to reduce chip area, since it eliminates up to half the sense amps. But shared sense amps and multibank operation do not fit well together. Usually, each bank needs its own sense amps, which independently change state between precharge and active. This means that extra sense amps are needed on the border between adjacent banks. As the bank count increases, more sense amps are needed, worsening area efficiency. To address this problem, direct Rambus DRAM utilizes a dependent bank architecture in which

Manuscript received March 19, 1999; revised June 10, 1999.
The authors are with the Microelectronics Engineering Laboratory, Toshiba Corp., Yokohama 247-8585 Japan (e-mail: [email protected]).
Publisher Item Identifier S 0018-9200(99)08353-5.

sense amps are shared with adjacent banks. In this system, access to the bank next to an active bank is prohibited. For example, when bank 3 is active, access to the adjacent banks 2 and 4 is prohibited. In spite of this constraint, the dependent bank system delivers multibank performance. Thus, it realizes good area efficiency.
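The adjacency rule above is simple enough to capture in a short sketch; the function and data structure are our own illustration, not from the paper.

```python
# Illustrative model of the dependent-bank constraint: a bank shares sense
# amps with its neighbors, so it may be activated only if neither it nor an
# adjacent bank is already active.

def allowed_to_activate(requested_bank: int, active_banks: set) -> bool:
    return all(abs(requested_bank - b) > 1 for b in active_banks)

# Example from the text: with bank 3 active, banks 2 and 4 are prohibited,
# while non-adjacent banks remain accessible.
assert not allowed_to_activate(2, {3})
assert not allowed_to_activate(4, {3})
assert allowed_to_activate(5, {3})
```

A memory controller would apply such a check before issuing an activate command; the paper reports that, despite the constraint, interleaved multibank performance is preserved.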

B. Chip Architecture

A block diagram of the 72-Mb DRAM is shown in Fig. 1. Banks are arranged as 16 horizontal stripes. Sense amps (not shown) are shared with adjacent banks. A row decoder (X-DEC) samples and latches the predecoded row addresses, which run vertically through all X-DEC's. That allows all related circuits and buses to be shared by all banks. Die-area overhead associated with the 16-bank dependent interleave operation is about 3% of the chip. Column selection lines (CSL's) and main DQ lines (MainDQ's) run vertically through all the banks. In every column cycle (10 ns), two column bank lines (C bank) adjacent to an accessed bank are activated. Four bits of data come out from each 128-kb segment of a bank. As a result, 144 bits of parallel data (4 bits from each 128-kb segment) move to the interface logic, where 8 : 1 parallel-serial conversion is executed, and are then transferred through 18 I/O pads.
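The datapath numbers above are self-consistent, as a quick check shows. The split of the 18 pads into 16 "counted" data pins is our assumption, used only to reconcile the headline 1.6-GB/s figure; everything else is from the text.

```python
# Consistency check of the datapath arithmetic described in the text.
COLUMN_CYCLE_NS = 10.0   # one column cycle
BITS_PER_CYCLE = 144     # parallel data bits per column cycle
PADS = 18                # I/O pads
SERIALIZATION = 8        # 8:1 parallel-serial conversion

assert BITS_PER_CYCLE == PADS * SERIALIZATION   # 144 = 18 x 8

bit_time_ns = COLUMN_CYCLE_NS / SERIALIZATION   # 1.25 ns per bit on a pin
per_pin_mbps = 1e3 / bit_time_ns                # 800 Mb/s/pin

data_pins = 16                                  # assumption (see lead-in)
bandwidth_gb_per_s = data_pins * per_pin_mbps / 8 / 1e3

assert bit_time_ns == 1.25
assert per_pin_mbps == 800.0
assert bandwidth_gb_per_s == 1.6
```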

C. Measurement Waveforms

Fig. 2(a)–(c) shows data output waveforms, an eye diagram, and shmoo plots. In the output waveforms, data come out from a pin every 1.25 ns. The eye diagram shows over a 1-ns window width, which is enough for the specification of 0.7 ns. The shmoo plot shows a certain margin at full-bandwidth operation with memory read/write transactions. The specification is a 2.5-V supply and a 1.25-ns period. These results show that a 1.6-GB/s (800 Mb/s/pin) data rate with bank interleave operation is achieved.

III. REDUNDANCY ARCHITECTURE

A. Introduction—Redundancy Area Unit

Generally, as DRAM density increases, the defect density also increases. Recently, redundancy has become indispensable for DRAM. Its area already accounts for about 5% of the chip in 0.25-µm technology [1]. This is a significant percentage because it directly impacts the cost of DRAM's. So effective implementation of redundancy is a very important issue in DRAM design. One good approach to improving redundancy area efficiency is to have a large redundancy area unit (RAU). The RAU is a memory area in which one spare

0018–9200/99$10.00 © 1999 IEEE


Fig. 1. A block diagram of a 72-Mb DRAM and its column redundancy.

element can replace a defective row or column. Generally, more spare elements are needed for a smaller RAU to prepare for a biased distribution of defects in a chip. Examples of a small RAU (multibank) and a big RAU (single-bank) are shown in Fig. 3. A big square shows a memory subarray, a cross shows a defect, and a line shows a spare element.

A spare element cannot be shared with another bank, since multiple banks may be active at the same time. To clarify the difference, suppose there are four defects. In the single-bank case, these defects can be managed, but in the multibank case, they cannot. So more spares are required in the multibank case, although not all are used. Thus, area efficiency with a small RAU is poor.

On the other hand, recent DRAM's tend to have a smaller RAU. Interleaved bank operation makes the row-RAU small, since a spare element cannot be shared with other banks. Where high-speed multibit column access is needed, the column-RAU also tends to be small, since replacing a defect with a spare element at a long physical distance would affect access time and/or require extra datapaths and muxes.

This means recent DRAM's tend to have a small RAU and need many spare elements to achieve a certain yield, but only some of them will be used. So, for area-effective implementation, we introduced a new technique called "flexible mapping redundancy" (FMR).

B. Flexible Mapping Redundancy

In this technique, to reduce area overhead, the number of fuse-sets that store the addresses of defects is smaller than the number of spare elements, as shown in Fig. 4. These correspondences are determined flexibly by mapping fuses in a fuse-set, as opposed to the one-to-one correspondence in conventional work [1]. The mapping is determined by the observed defect distribution, and the mapping data are also stored in the fuse-set.

With this technique, the fuse-set count can be reduced as much as necessary while sufficient spares are still provided on the chip. Usually, one fuse-set occupies more area than one spare element, so this is an effective approach.
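The basic idea can be modeled in a few lines: there are more spare elements than fuse-sets, and each programmed fuse-set is bound to whichever spare suits the actual defect distribution. The function and data structures below are our own illustration, not the paper's implementation.

```python
# Minimal model of flexible mapping redundancy: bind each defect to a spare
# in its own bank, consuming one fuse-set per repaired defect.

def program_fuse_sets(defect_addresses, spares_by_bank, n_fuse_sets):
    """Return a list of (defect_address, spare_id) bindings, one per fuse-set
    used, or None if the defects cannot all be covered."""
    bindings = []
    for addr in defect_addresses:
        if len(bindings) == n_fuse_sets:
            return None                      # out of fuse-sets
        bank = addr[0]                       # addr = (bank, row_or_column)
        if not spares_by_bank[bank]:
            return None                      # out of spares in that bank
        spare = spares_by_bank[bank].pop()
        bindings.append((addr, spare))
    return bindings

# Example: 16 banks, 4 spares each (64 spares), but only 27 fuse-sets.
spares = {b: [(b, i) for i in range(4)] for b in range(16)}
result = program_fuse_sets([(3, 31), (3, 42), (9, 7)], spares, n_fuse_sets=27)
assert result is not None and len(result) == 3
```

Because the binding is computed from the defect map rather than fixed in hardware, the same 27 fuse-sets can serve any 27 of the 64 spares, which is the source of the area savings.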

C. Row Redundancy System

Fig. 5 explains the row redundancy system within an 18-M mat. There are four spares in each bank. With 16 banks, there are 64 spares in an 18-M mat, and there are 27 fuse-sets. One fuse-set stores two kinds of information. One is the


Fig. 2. (a) Data output waveform, (b) an eye diagram, and (c) shmoo plot.

Fig. 3. Comparison of big RAU and small RAU.

Fig. 4. Basic idea of flexible mapping redundancy.

defect address, which uses 12 fuses, the same as in the conventional case. The other is mapping information, which is stored in extra fuses. One spare is selected out of 64, but just two extra fuses are enough in this case. Because there


Fig. 5. Row redundancy system.

Fig. 6. Fuse-set count versus row yield of 18-M mat.

is already accessed-bank information in the fuses for the defect address, the mapping fuses only have to select one spare out of the four within a bank. Two control lines from the mapping fuses (map0 and map1) are activated during the power-on sequence, depending on the mapping information. So the access-time penalty related to mapping is negligible.
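The match-and-decode step can be sketched as follows. The class, the 12-bit address width, and the assumption that the top four address bits identify the bank are our own illustration of the scheme, not the chip's exact bit assignment.

```python
# Illustrative fuse-set model: a 12-fuse defect address (bank included) plus
# two mapping fuses (map0/map1) that pick one of the four spares in a bank.

class RowFuseSet:
    def __init__(self, defect_address: int, map_fuses: int):
        assert 0 <= defect_address < (1 << 12)   # 12 address fuses
        assert 0 <= map_fuses < 4                # 2 mapping fuses
        self.defect_address = defect_address
        self.map_fuses = map_fuses

    def match(self, row_address: int):
        """On a hit, return (bank, spare index) to assert; the normal
        wordline would be deasserted. On a miss, return None."""
        if row_address != self.defect_address:
            return None
        bank = row_address >> 8                  # assumption: top 4 bits = bank
        return (bank, self.map_fuses)

fs = RowFuseSet(defect_address=0xA05, map_fuses=2)
assert fs.match(0xA05) == (0xA, 2)   # hit: spare 2 of bank 0xA
assert fs.match(0xA06) is None       # miss: normal wordline fires
```

This shows why two mapping fuses suffice for 64 spares: the bank is already encoded in the defect address, so the mapping fuses only disambiguate among the four spares of that bank.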

All fuse-sets are connected to global lines such as /Smatch, /SRAD0, and /SRAD1, which run horizontally through an 18-M mat. In every row cycle, at most one fuse-set is activated if it matches the access address. Depending on the mapping information, a matched fuse-set asserts one spare and deasserts the normal wordline (WL) through these global lines, decoders, and latches. Thus, flexible mapping is achieved between fuse-sets and spare elements. One advantage of this technique is that any fuse-set count can be selected, because new fuse-sets can be added simply by connecting them to these horizontal global lines.

Fig. 6 shows the yield of an 18-M mat as determined by the fuse-set count at the target defect density. A Poisson distribution is assumed for the defect distribution. We chose a 27-fuse-set count to minimize the area while keeping the yield.
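The trade-off in Fig. 6 can be reproduced with a simple model: a mat survives if its defect count does not exceed the number of fuse-sets, so its row yield is the Poisson CDF at that count. The mean defect count used below is an arbitrary illustrative value, not the paper's target defect density.

```python
import math

def mat_yield(n_fuse_sets: int, mean_defects: float) -> float:
    """P(defect count <= n_fuse_sets) for Poisson-distributed defects."""
    return sum(
        math.exp(-mean_defects) * mean_defects ** k / math.factorial(k)
        for k in range(n_fuse_sets + 1)
    )

# Yield rises steeply with the fuse-set count and then saturates; the paper
# picks the smallest count (27) that still meets its yield target.
assert mat_yield(10, 15.0) < mat_yield(27, 15.0) < mat_yield(40, 15.0)
assert mat_yield(150, 15.0) > 0.999999
```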

D. Column Redundancy System

Fig. 1 explains the column redundancy scheme of the DRAM. An overlapped region between one spare CSL (SCSL) and one bank constitutes a spare element (four bitline pairs). One spare element can replace a defective CSL within a 128-kb segment (64 CSL's). There are two spare elements in a 128-kb segment (288 elements in an 18-M mat). On the other hand, there are 27 fuse-sets in an 18-M mat. A fuse-set, which has 16 fuses, stores the address of a defective column to be replaced and also indicates the address of the spare element to be used. In every column access, when a column bank and

Fig. 7. Area comparison with conventional case.

Fig. 8. Retention time distribution.

column address match the information stored in a fuse-set, the fuse-set activates one spare element in the accessed bank according to its mapping information.

As in the row case, the connections between fuse-sets and spare elements are set during the power-on sequence. The numbers of fuse-sets and spare elements are determined by yield simulation.

E. Comparison with Conventional Approach

Fig. 7 shows a redundancy area-efficiency comparison. There are three cases: 1) a single bank; 2) 16 banks with conventional redundancy; and 3) 16 banks with flexible mapping redundancy. If we implemented a 16-bank DRAM with the conventional approach, its redundancy area would have to grow to keep the yield. But with the FMR technique, reduction of the fuse-set count saves an area equal to 13% of the chip. The actual area required for this technique is 5.8 mm², that is, about 5.2% of the chip, including all spare elements for both row and column, fuses, related circuitry, and buses.

Although some previous studies introduce a flexible relation between fuse-sets and spare elements [2], those correspondences are subject to some restrictions when utilizing multiple spare elements within an SCSL. We estimate that our technique improves area efficiency by about 35% compared with the previous flexible scheme.

IV. SELF-REFRESH ARCHITECTURE

A. Introduction—Retention Time Distribution

Data-retention power is one of the important characteristics of DRAM. The requirement for low power is especially strong in mobile applications. Generally, a longer self-refresh period (tSREF) makes the self-refresh current smaller but causes retention failures. Fig. 8 shows the self-refresh period versus the fail-bit rate. One characteristic of the retention-


Fig. 9. (a) Schematic diagram of additional refresh and (b) timing diagram of additional refresh.

time distribution is that it has a tail: a very small number of bits show a very short retention time. If we can manage these bits well, we can extend the refresh period. So we adopt the "additional refresh" scheme to realize a low-retention-power DRAM.

B. Additional Refresh Scheme

In this scheme, the self-refresh period is set eight times longer than that of normal refresh operation to reduce power. Additional refresh operations are applied selectively to the rows that contain a weak retention bit, to avoid retention failures. These additional refresh operations need extra current, but it is usually very small. Thus, low power is realized without retention failure.

The addresses of weak retention bits are stored in fuses. To reduce the number of fuses, four row addresses correspond to one fuse. This DRAM is an 8-k refresh chip, so 2-k fuses are required for this scheme.

Schematic and timing diagrams of this scheme are shown in Fig. 9(a) and (b), respectively. Once the chip enters self-refresh mode, a self-refresh period is divided into 32-k CLK pulses. During the first 8-k CLK's, that is, 1/4 of tSREF, refresh operations are performed for all 8-k rows, whereas during the remaining 24-k CLK's (3/4 of tSREF), a refresh operation occurs only when the refresh address matches the information stored in the fuses. Every refresh operation for a row is triggered by an int.RASb pulse. To reduce retention current, tSREF is set eight times longer than the normal refresh period. Poor retention bits whose retention times are less than 1/4 of tSREF should be replaced by ordinary redundant rows or columns. The number of such poor bits is usually very small, so ordinary redundancy can manage them.
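The schedule above can be sketched behaviorally: 32-k CLK pulses per self-refresh period, the first 8-k refreshing every row, and the remaining 24-k refreshing a row only on a fuse match. The 4-rows-per-fuse granularity follows the text; the function and counting are our own illustration.

```python
# Behavioral sketch of the additional-refresh schedule.
N_ROWS = 8 * 1024            # 8-k refresh chip
N_FUSES = N_ROWS // 4        # four row addresses per fuse -> 2-k fuses
CLKS_PER_TSREF = 32 * 1024   # one self-refresh period

assert N_FUSES == 2 * 1024

def refresh_pulses(weak_rows):
    """Count int.RASb pulses in one tSREF, given the set of weak rows."""
    # Each fuse covers four consecutive rows, so one weak bit drags in its group.
    fused_groups = {r // 4 for r in weak_rows}
    pulses = N_ROWS                       # first 8-k CLK's: refresh all rows
    for clk in range(N_ROWS, CLKS_PER_TSREF):
        row = clk % N_ROWS                # remaining 24-k CLK's: fuse match only
        if row // 4 in fused_groups:
            pulses += 1
    return pulses

# With no weak bits, only the initial all-row pass fires; a single weak bit
# adds its 4-row group three more times (once per 8-k sweep of the 24-k CLK's).
assert refresh_pulses(set()) == N_ROWS
assert refresh_pulses({5}) == N_ROWS + 12
```

Because the tail of the retention distribution is short, the extra pulses stay a tiny fraction of a full refresh pass, which is why the scheme can lengthen tSREF without sacrificing data.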

Fig. 10. Measurement results of additional refresh.

Fig. 11. Area comparison.