
Improving the Reliability of MLC NAND Flash Memories Through Adaptive Data Refresh and Error Control Coding

Chengen Yang & Hsing-Min Chen & Trevor N. Mudge & Chaitali Chakrabarti

Received: 11 October 2013 / Revised: 17 January 2014 / Accepted: 24 February 2014
© Springer Science+Business Media New York (outside the USA) 2014

Abstract NAND Flash memory has become the most widely used non-volatile memory technology. We focus on multi-level cell (MLC) NAND Flash memories because they have high storage density. Unfortunately, MLC NAND Flash memory also has reliability problems due to the narrower threshold voltage gap between logical states. Errors in these memories can be classified into data retention (DR) errors and program interference (PI) errors. DR errors are dominant if the data storage time is longer than 1 day, and these errors can be reduced by refreshing the data. PI errors are dominant if the data storage time is less than 1 day, and these errors can be handled by error control coding (ECC). In this paper we propose a combination of data refresh policies and low cost ECC schemes that are cognizant of application characteristics to address the errors in MLC NAND Flash memories. First, we use Gray code based encoding to reduce the error rates in the four subpages (MSB-even, LSB-even, MSB-odd, LSB-odd) of a 2-bit MLC NAND Flash memory. Next, we apply data refresh techniques where the refresh interval is a function of the program/erase (P/E) frequency of the application. We show that an appropriate choice of refresh interval and BCH based ECC scheme can minimize memory energy while satisfying the reliability constraint.

Keywords MLC NAND Flash · Data retention error · Program interference error · Data refresh · ECC

1 Introduction

Flash memory has become the dominant technology for non-volatile memories. It is used in memory cards, USB Flash drives, and solid-state drives in a wide range of applications [1]. We focus on NAND Flash memories since they have high storage density, low static power consumption, low cost per bit area and good scalability. Specifically, we focus on multi-level cell (MLC) NAND Flash memories, which store 2 or more bits per cell by supporting 4 or more voltage states. These memories have greater storage density compared to single-level cell (SLC) NAND Flash memories and are becoming increasingly popular. Unfortunately, MLC NAND Flash memories are quite error prone. One of the main reasons is the reduced gap between adjacent threshold levels, which causes even small shifts in threshold levels to result in errors.

To enhance the reliability of MLC NAND Flash memories, techniques such as wear leveling, bad block management and garbage collection have been proposed [2–4]. In addition, to handle random soft errors, error detection/correction codes (ECC) such as Hamming codes [5], and long linear block codes such as the Bose-Chaudhuri-Hocquenghem (BCH) codes [6, 7], have been used. Schemes based on concatenation of BCH codes and Trellis Coded Modulation (TCM), and on Low Density Parity Check (LDPC) codes, have also been proposed in [8, 9]. While most errors in existing Flash memories are random, in scaled technologies the increase in the threshold voltage variation can cause multiple bits to be upset (MBU) at the same time. Byte-level ECC such as Reed Solomon (RS) code [10, 11] has been proposed to deal with MBUs. In [12], we proposed a product ECC scheme using RS codes along rows and Hamming codes along columns to achieve very high error correction capability. Unfortunately, the storage overhead of our scheme was large and the error correction capability was overkill for the error patterns that are typical of these memories.

This work was supported in part by DARPA-PERFECT and by NSF CSR-0910699.

C. Yang (*) · H.-M. Chen · C. Chakrabarti
School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85287, USA
e-mail: [email protected]

H.-M. Chen
e-mail: [email protected]

C. Chakrabarti
e-mail: [email protected]

T. N. Mudge
Department of Electrical and Computer Engineering, University of Michigan, Ann Arbor, MI 48109, USA
e-mail: [email protected]

J Sign Process Syst
DOI 10.1007/s11265-014-0880-5

According to recent work in [13, 14], errors in MLC NAND Flash can be classified into data retention (DR) errors and programming interference (PI) errors. DR errors are caused by leakage of the electrons trapped in the floating gate and cause the threshold voltage to reduce. PI errors result from parasitic capacitance coupling with neighboring cells and cause the threshold voltage to increase. Empirical analysis of error patterns in 3x-nm MLC Flash memory showed that (i) both DR and PI errors increase with the number of program/erase (P/E) cycles; (ii) if the data storage time is longer than 1 day, DR errors are dominant, while if the data storage time is less than 1 day, PI errors are dominant; (iii) DR errors and PI errors have data dependency and location dependency.

This work builds upon the DR and PI error analysis in [13, 14]. The first observation is that the dominant error (DR or PI) is different for different application scenarios. For instance, PI errors dominate if the NAND Flash memory is used as the virtualized memory in lab computers, where the P/E frequency could be very high but the data is not stored beyond a day. On the other hand, if the Flash memory is used in a USB drive for long term storage, DR errors are dominant. In [14], a data refresh technique based on remapping or in-place reprogramming is proposed to correct DR errors. While the refresh interval is adjusted to handle the increasing DR errors over the memory lifetime, a very strong ECC scheme is used for all applications independent of their access frequencies. In our earlier work [15], we developed ECC schemes that were cognizant of the data storage time in Flash memories. We applied Gray coding and separated the MSB and LSB into two subpages so that one type of error transition (0->1 or 1->0) was dominant in each subpage. Next, we proposed a product code based scheme using BCH/Hamming along rows and even parity check along columns to detect the most likely error locations. Our earlier scheme achieved an uncorrectable bit error rate (UBER) of 10^-9 if the data storage time was less than 1 day. However, since we did not use data refresh to correct DR errors, if the data storage time was more than a day, the BER due to DR errors was as high as 10^-3, and the proposed product ECC scheme could only achieve UBER=10^-5.

In this paper, we improve upon our earlier work [15] and propose a technique to achieve UBER=10^-15. We utilize the fact that different applications have different program/erase (P/E) frequencies and propose an adaptive refresh technique that can achieve the UBER constraint with simple BCH codes by adapting the refresh interval to the P/E frequency of the application. The main contributions are as follows:

1. We apply Gray coding and 2 bit interleaving so that the bit error rates (BERs) in MSB and LSB subpages of even and odd pages are lower and the error rates are comparable. Thus, the MSB and LSB subpages can share the same ECC unit, resulting in reduced hardware overhead.

2. We propose a refresh technique where the refresh interval depends on the average P/E frequency of an application. For applications with P/E frequencies higher than once per day, we propose remapping based refresh during regular data updates. This refresh technique has little effect on memory energy and ECC decoding latency. For applications with P/E frequencies lower than once per day, we propose in-place reprogramming based refresh, where the refresh interval can be chosen to minimize memory energy or ECC decoding latency.

3. We derive the lowest cost ECC scheme and refresh interval for a given reliability constraint such that the memory energy is minimized. For instance, if the constraint is UBER=10^-15 at 50 K P/E cycles, then for an application with a P/E frequency of once per week, the lowest memory energy is achieved if the refresh interval is 3 days and BCH (552, 512, t=4) is used for both even and odd pages. While these results are based on the NAND Flash error model provided in [13, 14], the proposed method of combining data refresh policies with low cost ECC schemes to meet a reliability constraint can be used for other NAND Flash memories as well.

The rest of the paper is organized as follows. Section 2 gives the background on NAND Flash memories. Error characteristics are described in Section 3. Section 4 presents candidate ECC schemes along with the proposed application-dependent refresh technique. The effect of different refresh intervals on memory energy and ECC decoding latency is analyzed in Section 5. Section 6 concludes the paper.

2 NAND Flash Background

NAND Flash memories are built with floating gate cells; the threshold voltage of these cells can be programmed by injecting different numbers of electrons. There are several techniques that are used to program or erase these cells [1]. These include source side injection (SSI), Fowler-Nordheim tunneling (FN), and channel hot electron injection (CHE). NAND Flash memories can store multiple bits per cell, resulting in high storage density. For instance, a k-bit multi-level cell (MLC) stores k bits of data. This implies that 2^k levels of threshold voltage must be supported by a single cell. In this paper, we focus on 2 bit MLC; the states for 2 bit MLC are shown in Fig. 1.


In a NAND Flash chip, the cell arrays are divided into blocks where each block contains 32 to 256 pages and the size of each page is between 2 KB and 8 KB. Each page can be further divided into two groups: even and odd. For two bit MLC memory, the two bits in each cell can be classified into most significant bit (MSB) and least significant bit (LSB). Thus the data in a page can be organized into four sub-pages: MSB-even, LSB-even, MSB-odd and LSB-odd.

The three main operations in NAND Flash memory are erase, program (write) and read [1–3, 13]. The erase operation is done at the block level granularity, while the read and write operations are done at the page level granularity. For write, all cells in the same page connected to a word line are programmed simultaneously. Multiple program-and-verify steps are used to set the correct threshold voltage value. If the cell's threshold voltage is higher than the reference value, the program-and-verify iteration stops; otherwise, the cells are programmed again with an increased programming voltage.

3 NAND Flash Error Model

In this section, we first explain the error sources of NAND Flash memory (Section 3.1), followed by derivation of error characteristics of each sub-page (Section 3.2).

3.1 Error Types

There are many sources of errors in MLC Flash memories. Since all the programmed levels must be allocated in a voltage window of predetermined size, the spacing between adjacent programmed levels is reduced, making MLC memories less reliable. In addition, charged particles due to sun activity or other ionization mechanisms [16] cause single event upsets (SEU).

There are two major types of errors in MLC Flash memory: data retention error and program interference error [3, 13]. Data retention (DR) error occurs when data stored in the memory cell changes due to gradual dissipation of the charge programmed in the floating gate. DR error is dependent on the number of program/erase (P/E) cycles. P/E operation physically wears out the tunnel oxide of the floating gate by charging traps into the oxide and interface states [17–20], and as a result the threshold voltage of the memory cell is reduced.

Program interference (PI) error occurs when the threshold voltage of memory cells changes due to cell-to-cell interference from neighboring cells. This effect is due to parasitic capacitance coupling [21] and it happens in every P/E operation. In this paper, we consider the all bit-line structure, where all the cells in the same wordline are read or written at the same time. In contrast, in the even/odd bit-line structure, even cells and odd cells are read or written separately. While the even/odd configuration has higher PI errors than the all bit-line structure, all techniques proposed here are applicable to both structures.

In many systems, when retention time is long, DR errors outweigh PI errors. In this paper, we consider the case when DR errors are reduced by data refresh. In such a case, PI errors can be comparable to DR errors and cannot be ignored.

3.2 Error Characteristics

We begin by summarizing the key characteristics of PI and DR errors described in [13]. First, all types of errors increase as the number of P/E cycles increases. Second, for a fixed number of P/E cycles, error rates of different types of errors vary significantly. DR error rates grow as the data storage time increases, and DR errors dominate when the data storage time is longer than 1 day. However, when the data storage time is less than 1 day, PI errors dominate [13, 14].

Test results in [13] also show that the DR errors and PI errors are value dependent; their flipping probabilities are different for different logical states. Moreover, the probabilities are fairly constant over a large range of P/E cycles. Table 1 lists the four highest error probabilities for DR and PI errors [13]. We see that for DR errors, 00->01 and 01->10 account for 90 % of the error events. Similarly for PI errors, 11->10 and 10->01 account for 94 % of the errors. Notice that while the transitions 00->01 and 11->10 affect the LSB subpages, the 01->10 transition affects both MSB and LSB subpages. So, we propose re-mapping based on Gray code to reduce the bit error rates in the different subpages. Remapping causes the 01->10 transition to map to the 01->11 transition, thus affecting the error rates of only the MSB subpages.

Figure 1 Threshold voltage distribution of 2-bit MLC Flash. States in order of increasing charge and increasing threshold voltage: 11 (L0), 01 (L1), 10 (L2), 00 (L3).

Due to different error transition probabilities, the error rates of the four sub-pages are different. The results in [2, 13, 22] show that odd and even cells have different failure rates for DR and PI errors. We see from [13] that the DR error rate of odd pages is always higher than that of the corresponding even pages and that the error rate of MSB subpage is higher than that of the corresponding LSB subpage. We use the results presented in [13] to assume that the error rate of LSB-odd subpage is, on average, 1.45 times larger than the error rate of MSB-even subpage. We use this ratio to derive the cell failure rate for even and odd pages.

Let the cell failure rate of even page due to DR be p1. Then using the results in [13], the cell failure rate of odd page is 2.5 ⋅ p1. The bit error rate (BER) for each subpage can be derived from the error probabilities listed in Table 1. For example, the BER of LSB-even is Pr(error in LSB) ⋅ Pr(even cell failure rate) = (46 % + 44 % + 2 %) ⋅ p1 = 0.92 ⋅ p1. Similarly, the BER of LSB-odd is Pr(error in LSB) ⋅ Pr(odd cell failure rate) = 0.92 ⋅ 2.5 ⋅ p1. Since Gray code changes the mapping of states, it changes the sub-page error rates as well. The error rates for each sub-page due to DR errors with and without Gray code are given in Table 2.
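The sub-page coefficients in the worked example can be re-derived by counting, for each DR transition in Table 1, which bit positions change. The following Python sketch is illustrative only. It assumes that the first character of each state label is the MSB (consistent with the statement that 00->01 affects the LSB subpage) and, like the worked example, it ignores the "Other" category. The names DR_TRANSITIONS, subpage_fractions and ODD_TO_EVEN are hypothetical.

```python
# DR transitions from Table 1: (state before, state after, share of DR errors).
# The "Other 3 %" entry is left out, as in the worked example in the text.
DR_TRANSITIONS = [
    ("00", "01", 0.46),
    ("01", "10", 0.44),
    ("01", "11", 0.05),
    ("10", "11", 0.02),
]

# Odd-page cell failure rate relative to the even-page rate p1 (from the text).
ODD_TO_EVEN = 2.5


def subpage_fractions(transitions):
    """Return (msb_fraction, lsb_fraction): share of cell failures flipping each bit.

    Assumption: character 0 of a state label is the MSB, character 1 is the LSB.
    """
    msb = sum(p for before, after, p in transitions if before[0] != after[0])
    lsb = sum(p for before, after, p in transitions if before[1] != after[1])
    return msb, lsb


msb_frac, lsb_frac = subpage_fractions(DR_TRANSITIONS)

# Coefficients multiplying p1 for the four sub-pages, before Gray coding.
coefficients = {
    "MSB-even": msb_frac,
    "LSB-even": lsb_frac,
    "MSB-odd": msb_frac * ODD_TO_EVEN,
    "LSB-odd": lsb_frac * ODD_TO_EVEN,
}

for name, value in coefficients.items():
    print(f"{name}: {value:.3f} * p1")
```

Under these assumptions the four coefficients agree with the "DR errors" row of Table 2 (0.49, 0.92, 1.225 and 2.3).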

The cell failure rates of even cell and odd cell are quite different for PI errors. Previous research work does not explicitly address the differences between even cell and odd cell failure rates for PI errors. This is probably because PI errors were considered less important than DR errors, which is true if the data storage time is long. However, PI errors cannot be ignored when the retention time is short due to high access frequency or use of data refresh.

In [23], the simulated raw BER for even and odd cells shows that the ratio between even cell and odd cell BER varies from 4 to 50. We assume that the error rate of the even cell is 'a' times that of the odd cell, and that the failure rate of the odd cell is p2. Then the error rates of the four sub-pages are: 0.277 ⋅ a ⋅ p2 for MSB-even, 0.741 ⋅ a ⋅ p2 for LSB-even, 0.277 ⋅ p2 for MSB-odd and 0.741 ⋅ p2 for LSB-odd. The sub-page error rates before and after Gray coding are given in Table 2.

From Table 2 we see that Gray coding helps reduce the error rates for both DR and PI errors in the LSB-even and LSB-odd sub-pages. This leads to more comparable error rates for MSB-even and LSB-even subpages as well as MSB-odd and LSB-odd subpages. This has two implications. First, the ECC can be of lower strength than before. Second, the ECC unit for MSB and LSB subpages can be the same.

Next we compute the average error rates for DR and PI errors at the end of the memory lifetime. The lifetime of typical NAND Flash storage systems is at least 10^4 P/E cycles [1, 13, 14], so we consider the lifetime to be 5 ⋅ 10^4 P/E cycles. For this scenario, from [13, 14], the average BER of PI error is 2 ⋅ 10^-6. The average BER of 1 day DR error is 2 ⋅ 10^-6, 3 day DR error is 1.8 ⋅ 10^-5, 3 week DR error is 2 ⋅ 10^-4, 3 month DR error is 2 ⋅ 10^-3 and 3 year DR error is 1.5 ⋅ 10^-2. The average error rates for DR and PI errors are used to compute p1 and p2. For instance, the 1 day DR error rate of 2 ⋅ 10^-6 is equal to the sum of the error rates of the four sub-pages (see Table 2). Thus, 2 ⋅ 10^-6 = 0.49 ⋅ p1 + 0.92 ⋅ p1 + 1.225 ⋅ p1 + 2.3 ⋅ p1, resulting in p1 = 4.05 ⋅ 10^-7. We list the error rates for the four sub-pages for different DR times and PI cases after Gray coding for 5 ⋅ 10^4 P/E cycles in Table 3.
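The normalization step in this paragraph is simple arithmetic and can be checked with a short Python sketch. The helper names below are hypothetical. The second call, which applies the same normalization to the after-Gray-coding coefficients of Table 2, is an assumption made here: it appears to reproduce the "DR 1 day" row of Table 3, but the text does not state that the table was generated this way.

```python
# Sub-page coefficients from Table 2 (multipliers of the even-page cell failure rate p1).
DR_BEFORE_GRAY = {"MSB-even": 0.49, "LSB-even": 0.92, "MSB-odd": 1.225, "LSB-odd": 2.3}
DR_AFTER_GRAY = {"MSB-even": 0.49, "LSB-even": 0.51, "MSB-odd": 1.225, "LSB-odd": 1.275}

RAW_BER_DR_1DAY = 2e-6  # average 1 day DR error rate at 5e4 P/E cycles (from the text)


def solve_cell_failure_rate(raw_ber, coefficients):
    """Solve raw_ber = sum(c_i * p) for p, as in the worked example."""
    return raw_ber / sum(coefficients.values())


def subpage_bers(raw_ber, coefficients):
    """Split raw_ber across the four sub-pages in proportion to the coefficients."""
    p = solve_cell_failure_rate(raw_ber, coefficients)
    return {name: c * p for name, c in coefficients.items()}


# Worked example from the text: p1 is approximately 4.05e-7.
p1 = solve_cell_failure_rate(RAW_BER_DR_1DAY, DR_BEFORE_GRAY)
print(f"p1 = {p1:.3e}")

# Assumption: the same normalization with the after-Gray coefficients.
after_gray = subpage_bers(RAW_BER_DR_1DAY, DR_AFTER_GRAY)
for name, ber in after_gray.items():
    print(f"{name}: {ber:.2e}")
```

The same two functions can be reused for the PI case by replacing the coefficient dictionary with the PI rows of Table 2.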

Table 1 Error probabilities of DR and PI errors [13].

DR errors   00->01, 46 %   01->10, 44 %   01->11, 5 %     10->11, 2 %     Other, 3 %
PI errors   11->10, 70 %   10->01, 24 %   10->00, 2.2 %   11->01, 1.5 %   Other, 1.9 %

Table 2 Sub-page error rate before and after Gray coding.

                                MSB-even    LSB-even    MSB-odd     LSB-odd
DR errors                       0.49⋅p1     0.92⋅p1     1.225⋅p1    2.3⋅p1
DR errors (after Gray coding)   0.49⋅p1     0.51⋅p1     1.225⋅p1    1.275⋅p1
PI errors (a is 50)             13.85⋅p2    47.2⋅p2     0.277⋅p2    0.944⋅p2
PI errors (after Gray coding)   13.85⋅p2    37.05⋅p2    0.277⋅p2    0.741⋅p2

3.3 Choosing Appropriate ECC Code

Our goal is to find an ECC code that achieves an UBER=10^-15 for every sub-page. Such an UBER is a reasonable target value for many storage systems [3, 24]. We propose to use a bit-level ECC code to reach this goal since NAND Flash errors, especially after bit-level interleaving, are random SEUs. When DR time is small, the DR errors are small and a low cost ECC is enough. As the DR time increases, the error rates become larger and stronger BCH codes have to be used. Figure 2 plots UBER vs. raw BER obtained after Gray coding for several BCH codes with 512 information bits. This figure helps us determine the BCH code that is required for the different sub-pages. For instance, if DR is 3 days, the MSB-even subpage has a BER of 2.52 ⋅ 10^-6 (from Table 3) and a t=3 BCH code is sufficient. If the DR time increases to 3 weeks, then the MSB-even subpage BER is as high as 2 ⋅ 10^-5 and a t=5 BCH code is required to achieve UBER of 10^-15.
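The code selection described in Section 3.3 can be approximated numerically. The Python sketch below is not the authors' procedure: it assumes independent bit errors, takes UBER as the probability of more than t errors in an n-bit codeword divided by the k information bits, and assumes that each unit of t costs 10 parity bits, which matches the code parameters quoted in this paper such as BCH (542, 512, t=3). The names bch_uber and min_t are hypothetical.

```python
from math import comb

K_BITS = 512          # information bits per codeword (from the text)
PARITY_PER_T = 10     # assumption: 10 parity bits per corrected error, e.g. (542, 512, t=3)


def bch_uber(raw_ber, t, k=K_BITS, extra_terms=20):
    """Approximate UBER of a t-error-correcting code under independent bit errors.

    Assumed model: UBER = P(more than t errors in n bits) / k.
    The binomial tail is truncated after `extra_terms` terms, which is ample
    for the small raw BER values considered here.
    """
    n = k + PARITY_PER_T * t
    last = min(n, t + extra_terms)
    tail = sum(
        comb(n, i) * raw_ber**i * (1.0 - raw_ber) ** (n - i)
        for i in range(t + 1, last + 1)
    )
    return tail / k


def min_t(raw_ber, target=1e-15, t_max=12):
    """Smallest t whose approximate UBER meets the target, or None if none does."""
    for t in range(1, t_max + 1):
        if bch_uber(raw_ber, t) <= target:
            return t
    return None


for label, ber in [("DR 3 days, MSB-even", 2.52e-6), ("DR 3 weeks, MSB-even", 2e-5)]:
    print(label, "-> t =", min_t(ber))
```

Under this assumed model the two raw BER values quoted in the text give t=3 and t=5 respectively, in line with the codes named there. A different UBER definition (for example, dividing by n instead of k) would shift the curves slightly.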

4 Adaptive Data Refresh Technique

To eliminate DR errors in NAND Flash memory, remapping and in-place reprogramming based refresh techniques have been proposed in [14]. In remapping based refresh, the data of a whole block is read out, error corrected (if necessary) page by page and written into another empty block. The original block is erased after remapping and marked as empty. In in-place reprogramming, on the other hand, the decoded data is compared with data read out from memory, and in case of errors, additional programming operations are applied in place to correct the errors. In-place reprogramming refresh is preferred in [14] since remapping based refresh increases the number of erase operations and thus reduces memory lifetime. However, in-place reprogramming has its own problems. It cannot correct PI errors and instead introduces more PI errors due to additional programming operations.

Table 3 Sub-page error rate for different DR times and different PI ratios (a) at 5 ⋅ 10^4 P/E cycles.

                                   BER after Gray coding
5 ⋅ 10^4 P/E cycles   Raw BER      MSB-even   LSB-even   MSB-odd    LSB-odd
PI (a is 4)           2.00E−6      4.35E−7    1.17E−6    1.04E−7    2.91E−7
PI (a is 8)           2.00E−6      4.76E−7    1.27E−6    6.80E−8    1.82E−7
PI (a is 50)          2.00E−6      5.34E−7    1.42E−6    1.00E−8    2.80E−8
DR 1 day              2.00E−6      2.80E−7    2.91E−7    7.00E−7    7.29E−7
DR 2 days             7.91E−6      1.11E−6    1.53E−6    2.76E−6    2.88E−6
DR 3 days             1.80E−5      2.52E−6    2.62E−6    6.30E−6    6.55E−6
DR 7 days             5.06E−5      7.08E−6    7.37E−6    1.77E−5    1.84E−5
DR 3 weeks            2.00E−4      2.00E−5    2.91E−5    7.00E−5    7.28E−5

Figure 2 BCH codes with different error correction capabilities for 512 information bits. (Plot of uncorrectable bit error rate versus raw bit error rate at 5 ⋅ 10^4 P/E cycles, with one curve each for t=2, t=3, t=4, t=5 and t=6.)


In this paper, we use both these techniques; the choice of whether we use in-place reprogramming or remapping is based on the access frequency of the application. Some applications have high access frequencies. For instance, the file benchmarks Iozone [25] and postmark [26] have 20 and 5.5 P/E cycles per block per day, respectively. Others have low access frequencies, such as the web search trace, which has 0.0005 P/E cycles per block per day.

For applications with high P/E frequency (more than once per day), PI errors are dominant; the net BER is determined by the PI errors and cannot be reduced even if the refresh frequency is higher than once per day. For such cases, we propose to use remapping based refresh with regular data update. In regular data update, data is copied from the current block to another block, followed by erase of the current block. Remapping based refresh, when done along with data update, just adds another layer of ECC decoding and encoding, which has minimal effect on Flash memory performance and energy. This technique does not increase the number of erase operations compared to regular data update and thus does not introduce more PI errors due to refresh.

For applications with low P/E frequency (lower than once per day), the refresh frequency is higher than the P/E frequency. Here we propose to use in-place reprogramming. To guarantee that all blocks have been refreshed at a predetermined frequency 1/α, we can keep the access record in system files and only refresh blocks that have had no P/E operations in α days. The proposed adaptive refresh technique is shown in Fig. 3, and the effect of different refresh frequencies for different applications is given in Section 5.
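The decision rule described in this section can be summarized in a few lines of Python. This is a simplified sketch and not a controller implementation: the function and variable names are hypothetical, the access record is modeled as a plain dictionary mapping a block identifier to the day of its last P/E operation, and a P/E frequency of exactly once per day (a case the text does not specify) is treated as the low-frequency case.

```python
def choose_refresh_policy(pe_cycles_per_day, adaptive_interval_days):
    """Pick the refresh method and interval (alpha, in days) for an application.

    More than one P/E cycle per day: remapping during regular data update, alpha = 1 day.
    Otherwise: in-place reprogramming with an application-dependent alpha.
    """
    if pe_cycles_per_day > 1:
        return ("remapping", 1)
    return ("in-place reprogramming", adaptive_interval_days)


def blocks_due_for_refresh(last_pe_day, today, alpha_days):
    """Return the blocks that have had no P/E operation in the last alpha days.

    last_pe_day: dict mapping block id -> day index of its most recent P/E operation
    (a stand-in for the access record kept in system files).
    """
    return [block for block, day in last_pe_day.items() if today - day >= alpha_days]


# Example: a benchmark with 20 P/E cycles per block per day, and an application
# with one P/E cycle per week that uses a 3 day refresh interval.
print(choose_refresh_policy(20, 3))
print(choose_refresh_policy(1 / 7, 3))
print(blocks_due_for_refresh({"blk0": 10, "blk1": 6, "blk2": 2}, today=10, alpha_days=3))
```

A fuller model would also record the day of the last refresh per block, so that a block is not refreshed again before α days have passed.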

Table 4 lists the BCH codes that can be used for the different subpages for different refresh intervals. We use the BER of DR and PI errors listed in Table 3 and the decoding performance curves in Fig. 2 to determine these codes. For instance, if the refresh interval is 2 days, we can use the BCH (542, 512, t=3) code for both even and odd pages. However, if the refresh interval increases to 1 week, we need stronger BCH codes such as BCH (552, 512, t=4) for even pages and BCH (562, 512, t=5) for odd pages. Thus the BCH code and the refresh interval are chosen jointly to achieve the reliability constraint of UBER=10^-15.

5 Evaluation

5.1 Hardware Implementation of ECC Schemes

The ECC units listed in Table 4 have been synthesized in 45 nm technology using the Nangate cell library [27] and Synopsys Design Compiler [28]. The BCH decoders are pipelined versions of the simplified inverse-free Berlekamp-Massey (SiBM) algorithm. The 2t-folded SiBM architecture [29] is used to minimize the circuit overhead of the key-equation solver at the expense of an increase in latency. A parallelism factor of 8 is used for syndrome calculation and Chien search. The decoding latency, energy and redundancy rates of the different ECC schemes presented in Table 4 are given in Table 5. For a page size of 4 KB, each sub-page is 1 KB and so there are two ECC units per subpage working on 512 information bits in parallel.

Table 6 lists the energy and latency numbers of NAND Flash memory using measured results of several commercial chips [30–32]. While the values of latency and energy vary among different manufacturers and technologies, we picked the average values for a 4 KB page NAND Flash memory in 45 nm technology. Note that the energy values of the ECC unit shown in Table 5 are significantly less than the Flash energy values shown in Table 6. Thus the memory energy is only affected by the additional storage that is required by the ECC code.

Figure 3 Flowchart of adaptive refresh technique. (For each application, the refresh interval α is controlled by a test on P/E cycles/day > 1: if yes, remapping is used with α = 1 day; if no, in-place reprogramming is used with an adaptive interval α.)

Table 4 Combination of ECC schemes and refresh intervals to achieve UBER=10^-15 for different pages.

Refresh interval   Even page (MSB, LSB)    Odd page (MSB, LSB)
1 day              BCH (542, 512, t=3)     BCH (542, 512, t=3)
2 days             BCH (542, 512, t=3)     BCH (542, 512, t=3)
3 days             BCH (552, 512, t=4)     BCH (552, 512, t=4)
1 week             BCH (552, 512, t=4)     BCH (562, 512, t=5)
3 weeks            BCH (562, 512, t=5)     BCH (572, 512, t=6)

5.2 System-Level Evaluation

5.2.1 Applications with P/E Frequency Higher than Once per Day

For applications with P/E frequency higher than once per day, we use the remapping based refresh strategy and set the refresh interval to be once per day. In this case, the BER is caused by both PI error and DR error due to data retention of 1 day. From Table 4 we see that the ECC scheme to handle these errors is BCH (542, 512, t=3) for both even and odd pages. Since this refresh strategy causes little increase in latency and energy due to extra read/write operations, the main overhead is due to ECC schemes. However, from Table 5 we see that the energy and latency overhead of the BCH (542, 512, t=3) ECC unit is quite low and is significantly less than those of NAND Flash memory. Thus, the only overhead is the additional energy due to parity storage, which is 5.8 % for both even and odd pages in this case.

5.2.2 Applications with P/E Frequency Less than Once per Day

For applications with P/E frequency less than once per day, we use the in-place refresh strategy where the refresh interval is a function of the P/E frequency. We evaluate the proposed refresh strategy by considering two types of applications that are borrowed from [14]. Application A has a P/E frequency of once per 7 days and a programming ratio (defined as the number of writes divided by the total number of reads and writes) of 17 %. Application B has a P/E frequency of once per 200 days and a programming ratio of 20 %.

As the refresh interval increases, the overhead due to refresh (read and re-programming energy) decreases. However, since the BER of DR errors increases, to achieve the same UBER=10^-15 at 5 ⋅ 10^4 P/E cycles, the required error correction capability of the ECC code increases, which translates to higher decoding latency and parity storage. The effect of increasing the refresh interval for Application A is shown in Fig. 4. While the ECC decoding latency increases as the refresh interval increases, it is actually very small compared to the NAND Flash read latency. So the increase in the ECC decoding latency does not affect the system-level timing performance, and thus latency is not considered in the rest of the paper.

Next we evaluate the change in the additional energy due to the increase in refresh interval time. Define the normalized additional energy as the ratio E_additional / E_baseline, where E_baseline is the energy without refresh and ECC:

E_baseline = E_read ⋅ N_read + E_programming ⋅ N_programming

where N_read and N_programming are the number of read and write operations. The additional energy is

E_additional = E_refresh + E_parity

where E_refresh is the additional energy resulting from refresh and E_parity is the energy due to accesses to a larger memory, given by

E_parity = E_baseline ⋅ (redundancy rate)

Ignoring the energy of the ECC unit, E_refresh can be represented as

E_refresh ≈ E_baseline ⋅ (f_refresh / f_P/E)

where f_refresh / f_P/E is the ratio of refresh frequency over P/E frequency.
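Dividing the expressions above by the baseline energy gives a normalized additional energy of approximately f_refresh / f_P/E plus the redundancy rate. The Python sketch below evaluates this simplified expression for Application A, using the redundancy rates of Table 5. It is only an illustration of the approximation stated in the text; the function name is hypothetical, and the printed values are what the simplified formula yields, not numbers read from Fig. 4.

```python
def normalized_additional_energy(pe_interval_days, refresh_interval_days, redundancy_rate):
    """E_additional / E_baseline under the approximations stated in the text.

    E_refresh / E_baseline ~ f_refresh / f_PE = pe_interval / refresh_interval
    E_parity  / E_baseline = redundancy_rate
    The energy of the ECC unit itself is ignored.
    """
    refresh_term = pe_interval_days / refresh_interval_days
    return refresh_term + redundancy_rate


# Application A: one P/E cycle per 7 days. Redundancy rates for even pages
# at each refresh interval are taken from Table 5.
PE_INTERVAL_A = 7
REDUNDANCY_BY_INTERVAL = {1: 0.058, 2: 0.058, 3: 0.078}

for interval, redundancy in REDUNDANCY_BY_INTERVAL.items():
    energy = normalized_additional_energy(PE_INTERVAL_A, interval, redundancy)
    print(f"refresh every {interval} day(s): {energy:.3f}")
```

In this approximation the refresh term dominates, so lengthening the refresh interval lowers the total even though the parity term grows with the stronger code.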

As shown in Fig. 4, as the refresh interval increases from 1 day to 3 days for Application A, the normalized additional energy of both even and odd pages decreases. Figure 4 also shows that, for both even and odd pages, a refresh interval of 3 days is preferred if reducing energy is more important, and a refresh interval of 1 day is preferred if low ECC decoding latency is more important.

Table 5 Decoding latency, energy and redundancy rate of ECC schemes corresponding to Table 4.

                   Even page                                        Odd page
Refresh interval   Latency (ns)   Energy (pJ)   Redundancy rate    Latency (ns)   Energy (pJ)   Redundancy rate
1 day              89.1           169.6         5.8 %              89.1           169.6         5.8 %
2 days             89.1           169.6         5.8 %              89.1           169.6         5.8 %
3 days             94.1           225.2         7.8 %              94.1           225.2         7.8 %
1 week             94.1           225.2         7.8 %              100.0          292.4         9.8 %
3 weeks            100.0          292.4         9.8 %              107.2          380.0         11.7 %

Results are given as decoding latency (ns)/energy (pJ)/redundancy rate. Critical path is 0.65 ns for BCH (542, 512, t=3), BCH (552, 512, t=4) and BCH (562, 512, t=5).

Table 6 Latency and energy of 4 KB page NAND Flash in 45 nm technology.

               Programming   Read   Erase
Latency (μs)   520           35     2050
Energy (μJ)    65            2.1    30

A more detailed analysis of additional read energy, write energy and parity storage energy due to refresh for Application A is given in Table 7. We find that as the refresh interval increases, read energy due to refresh is nearly constant at around 1.3 %. However, write energy due to refresh decreases from 96.5 to 92.8 % while the parity storage energy increases from 2.2 to 6.0 %. This is because a long refresh interval results in the use of a stronger ECC that increases both the decoding latency and the parity storage. Overall the energy decreases, and so for Application A the longest feasible refresh interval is the best choice. Since Application A has a P/E frequency of once per 7 days, the longest refresh interval is 3 days, which corresponds to an ECC code of t=4 for even and odd pages.

Similar analysis has been done for Application B. Here too the normalized additional energy of both even and odd pages decreases as the refresh interval increases. Since Application B has a P/E frequency of once per 200 days, the longest refresh interval can be as long as 100 days. However, such a long refresh interval would mandate a strong ECC code with a high redundancy rate. From Table 4 we see that to guarantee UBER=10^-15, if the refresh interval is 3 weeks, we have to use a BCH t=6 code with parity storage of 11.7 %. A refresh interval longer than 3 weeks would increase the parity storage beyond the 12.5 % constraint mandated in memory systems and is thus not acceptable. Thus, for low energy applications, when the P/E frequency is less than once per day, the refresh interval is chosen to be half the P/E interval provided that it does not violate the additional memory constraint due to parity bit storage.
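The selection rule stated at the end of this paragraph can be sketched as a small lookup over the refresh intervals evaluated in Tables 4 and 5. This Python sketch is an illustration under stated assumptions: the candidate list contains only the five tabulated intervals, each paired with the larger of the even-page and odd-page redundancy rates, and the name choose_refresh_interval is hypothetical.

```python
# (refresh interval in days, worst-case redundancy rate over even and odd pages),
# taken from Tables 4 and 5. Only these tabulated intervals are considered.
CANDIDATES = [(1, 0.058), (2, 0.058), (3, 0.078), (7, 0.098), (21, 0.117)]

MAX_REDUNDANCY = 0.125  # parity storage constraint quoted in the text


def choose_refresh_interval(pe_interval_days, max_redundancy=MAX_REDUNDANCY):
    """Longest tabulated refresh interval that is at most half the P/E interval
    and whose code stays within the parity storage constraint.

    Returns the interval in days, or None if no tabulated interval qualifies.
    """
    limit = pe_interval_days / 2
    feasible = [
        days
        for days, redundancy in CANDIDATES
        if days <= limit and redundancy <= max_redundancy
    ]
    return max(feasible) if feasible else None


print("Application A (P/E every 7 days):", choose_refresh_interval(7), "days")
print("Application B (P/E every 200 days):", choose_refresh_interval(200), "days")
```

With these inputs the sketch selects 3 days for Application A and 3 weeks for Application B, matching the choices discussed in the text.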

Through this analysis we have established that for a given reliability constraint (UBER=10^-15 in this paper), for each application there is an optimal combination of ECC correction capability (t) and refresh interval (τ) that achieves the lowest memory energy. Most Flash memory chips come with a fixed ECC unit. If this ECC unit has higher error correction capability than the optimal t, then the refresh interval remains at τ and the UBER constraint can be guaranteed with the lowest memory energy. However, if this ECC unit provides lower error correction capability than the optimal t, the refresh interval has to be reduced to satisfy the UBER constraint and the memory energy is no longer minimum.

6 Conclusion

In this paper we utilize the error characteristics of DRand PI errors provided in [13, 14] to develop low cost

1day 2days 3days2

3

4

5

6

7

8

Refresh Interval

Additional energy of even page Additional energy of odd page Decoding latency of even page Decoding latency of odd page

Nor

mal

ized

Add

ition

al E

nerg

y

10

20

30

40

50

60

70

80

90

100

EC

C D

ecod

ing

Late

ncy(

ns)

Figure 4 Effect of differentrefresh intervals for ApplicationA. Additional energy isnormalized to the baseline energythat does not include energy dueto refresh and ECC energy.

Table 7 Additional energy distribution of refresh technique for different refresh intervals (Application A).

| Additional energy due to | 1 day, even page | 1 day, odd page | 2 days, even page | 2 days, odd page | 3 days, even page | 3 days, odd page |
|---|---|---|---|---|---|---|
| Read | 1.3 % | 1.3 % | 1.3 % | 1.3 % | 1.2 % | 1.2 % |
| Write | 96.5 % | 96.5 % | 96.3 % | 96.3 % | 92.8 % | 92.8 % |
| Parity storage | 2.2 % | 2.2 % | 2.4 % | 2.4 % | 6.0 % | 6.0 % |

J Sign Process Syst

error correction techniques that use a combination of data refresh policies and BCH based ECC schemes to achieve low UBER. This method is quite general and can be used for other NAND Flash memories as well. First, we use Gray coding and bit-level interleaving to reduce the error rates. We find that this results in comparable error rates for MSB and LSB subpages of odd and even pages and enables the subpages to share the same ECC unit, resulting in low hardware overhead. Next, we use different data refresh policies to reduce the DR errors. For applications with P/E frequency higher than once per day, we propose to use remapping based refresh during regular data updates, set the refresh interval to one day and use BCH t=3 codes for all pages. For applications with P/E frequency lower than once per day, we use in-place reprogramming based refresh where the refresh interval is chosen based on the system requirements. For instance, to achieve UBER=10⁻¹⁵ at 50 K P/E cycles, if the P/E frequency is once per week, we use BCH (552, 512, t=4) for both even and odd pages with a refresh interval of 3 days to achieve the lowest memory energy.

References

1. Micheloni, R., et al. (2009). Non-volatile memories for removable media. Proceedings of the IEEE, 97(1), 148–160.

2. Grupp, L. M., et al. (2009). Characterizing Flash Memory: Anomalies, Observations, and Applications. International Symposium on Microarchitecture, pp. 24–33.

3. Mielke, N., et al. (2008). Bit Error Rate in NAND Flash Memories. 46th Annual International Reliability Physics Symposium, IEEE CFP08RPS-CDR, Phoenix.

4. Desnoyers, P. (2010). Empirical evaluation of NAND Flash memory performance. Symposium on Operating Systems Principles. SIGOPS Operating Systems Review, 44(1), 50–54.

5. Rossi, D., & Metra, C. (2003). Error correcting strategy for high speed and high density reliable flash memories. Journal of Electronic Testing: Theory and Applications, 19(5), 511–521.

6. Choi, H., Liu, W., & Sung, W. (2010). VLSI implementation of BCH error correction for multilevel cell NAND Flash memory. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 18(5), 843–847.

7. Chen, T., Hsiao, Y., Hsing, Y., & Wu, C. (2009). An Adaptive-Rate Error Correction Scheme for NAND Flash Memory. 27th IEEE VLSI Test Symposium, pp. 53–58.

8. Li, S., & Zhang, T. (2010). Improving multi-level NAND Flash memory storage reliability using concatenated BCH-TCM coding. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 18(10), 1412–1420.

9. Maeda, Y., & Kaneko, H. (2009). Error Control Coding for Multilevel Cell Flash Memories Using Nonbinary Low-Density Parity-Check Codes. IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, pp. 367–375.

10. STMicroelectronics, ST72681, USB 2.0 high-speed Flash drive controller, http://www.st.com/stonline/books/pdf/docs/11352.pdf

11. XceedIOPS SATA SSD, SMART’s Storage Solutions. www.smartm.com/files/salesLiterature/storage/xceediops_SATA.pdf

12. Yang, C., Emre, Y., & Chakrabarti, C. (2012). Product code schemes for error correction in MLC NAND Flash memories. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 20(12), 2302–2314.

13. Cai, Y., Haratsch, E. F., Mutlu, O., & Mai, K. (2012). Error patterns in MLC NAND Flash memory: measurement, characterization, and analysis. Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 521–526.

14. Cai, Y., et al. (2012). Flash Correct-and-Refresh: Retention-Aware Error Management for Increased Flash Memory Lifetime. Computer Design (ICCD), 2012 IEEE 30th International Conference.

15. Yang, C., Muckatira, D., Kulkarni, A., & Chakrabarti, C. (2013). Data storage time sensitive ECC schemes for MLC NAND Flash memories. International Conference on Acoustics, Speech and Signal Processing ICASSP, pp. 2513–2517.

16. Wrobel, F., et al. (2001). Simulation of nucleon-induced nuclear reactions in a simplified SRAM structure: scaling effects on SEU and MBU cross sections. IEEE Transactions on Nuclear Science, 48(6), 1946–1952.

17. Olivo, P., Ricco, B., & Sangiorgi, E. (1986). High-field-induced voltage-dependent oxide charge. Applied Physics Letters, 48, 1135.

18. Cappelletti, P., Bez, R., Cantarelli, D., & Fratin, L. (1994). Failure mechanisms of flash cell in program/erase cycling. In: Proc. Int. Electron Devices Meet., pp. 291–294.

19. Kurata, H., et al. (2007). Random telegraph signal in Flash memory: its impact on scaling of multilevel flash memory beyond the 90-nm node. IEEE Journal of Solid-State Circuits, 42(6), 1362–1369.

20. Mielke, N., et al. (2004). Flash EEPROM threshold instabilities due to charge trapping during program/erase cycling. IEEE Transactions on Device and Materials Reliability, 4(3), 335–344.

21. Lee, J. D., Hur, S. H., & Choi, J. D. (2002). Effects of floating-gate interference on NAND flash memory cell operation. IEEE Electron Device Letters, 23(5), 264–266.

22. Tanakamaru, S., et al. (2011). 95%-Lower-BER 43%-Lower-Power Intelligent Solid-State Drive (SSD) with Asymmetric Coding and Stripe Pattern Elimination Algorithm. International Solid State Circuit Conference, ISSC, pp. 204–206.

23. Dong, G., Li, S., & Zhang, T. (2012). Using data postcompensation and predistortion to tolerate cell-to-cell interference in MLC NAND Flash memory. IEEE Transactions on Circuits and Systems, 75(10), 2718–2728.

24. Gray, J., & van Ingen, C. (2005). Empirical Measurements of Disk Failure Rates and Error Rates. Microsoft Research Technical Report MSR-TR-2005-166.

25. IOzone.org, “IOzone Filesystem Benchmark,” http://iozone.org

26. Katcher, J. (1997). Postmark: a New File System Benchmark. Technical Report.

27. Nangate, Sunnyvale, California, 2008, “45 nm open cell library”, http://www.nangate.com/

28. Synopsys Design Compiler: http://www.synopsys.com

29. Liu, W., Rho, J., & Sung, W. (2006). Low-Power High-Throughput BCH Error Correction VLSI Design for Multi-Level Cell NAND Flash Memories. IEEE Workshop on Signal Processing Systems Design and Implementation, pp. 303–308.

30. Nobunaga, D., et al. (2008). A 50 nm 8 Gb NAND Flash Memory with 100 MB/s Program Throughput and 200 MB/s DDR Interface. International Solid State Circuit Conference, ISSC, Session 23, pp. 623–625.


31. Zhang, R., et al. (2009). A 172 mm² 32 GB MLC NAND Flash Memory in 34 nm CMOS. International Solid State Circuit Conference, ISSC, Session 13, pp. 429–431.

32. Brewer, J., & Gill, M. (2008). Nonvolatile Memory Technologies with Emphasis on Flash. IEEE Press Series on Microelectronic Systems.

Chengen Yang received the B.S. degree in microelectronics from Peking University, Beijing, China, in 2006 and the M.S. degree in electrical engineering from the Institute of Microelectronics, Chinese Academy of Sciences, Beijing, China, in 2009. He is currently a Ph.D. candidate at Arizona State University, Tempe. His research interests include error control algorithms and their implementation for non-volatile memories, and system level memory architecture design for low power and high performance storage.

Hsing Min Chen received his bachelor's degree in computer science in 2007 and his master's degree in network engineering in 2009, both from National Chiao Tung University (NCTU). Currently, he is a Ph.D. student in Electrical Engineering at Arizona State University. He has worked on low power, low latency and high performance memory design using error correction codes.

Trevor Mudge received the Ph.D. degree in Computer Science from the University of Illinois, Urbana in 1977. Since then, he has been on the faculty of the University of Michigan, Ann Arbor. In 2003 he was named the first Bredt Family Professor of Electrical Engineering and Computer Science after concluding a 10 year term as the Director of the Advanced Computer Architecture Laboratory, a group of 8 faculty and about 60 graduate students. He is the author of numerous papers on computer architecture, programming languages, VLSI design, and computer vision. He has also chaired about 50 theses in these areas. His research interests include computer architecture, computer-aided design, and compilers. In addition to his position as a faculty member, he runs Idiot Savants, a chip design consultancy. Trevor Mudge is a Fellow of the IEEE, a member of the ACM, the IET, and the British Computer Society.

Chaitali Chakrabarti received the B.Tech. degree in electronics and electrical communication engineering from the Indian Institute of Technology, Kharagpur, India, in 1984, and the M.S. and Ph.D. degrees in electrical engineering from the University of Maryland, College Park, in 1986 and 1990, respectively. She is a Professor with the Department of Electrical, Computer and Energy Engineering, Arizona State University (ASU), Tempe and a Fellow of the IEEE. Her research interests include the areas of low power embedded systems design including memory optimization, high level synthesis and compilation, and VLSI architectures and algorithms for signal processing, image processing, and communications. She is currently an Associate Editor of the Journal of VLSI Signal Processing Systems and the IEEE Transactions on VLSI Systems.
