

Does RAID Improve Lifetime of SSD Arrays?

SANGWHAN MOON and A. L. NARASIMHA REDDY, Texas A&M University

Parity protection at the system level is typically employed to compose reliable storage systems. However, careful consideration is required when SSD-based systems employ parity protection. First, additional writes are required for parity updates. Second, parity consumes space on the device, which results in write amplification from less efficient garbage collection at higher space utilization.

This article analyzes the effectiveness of SSD-based RAID and discusses the potential benefits and drawbacks in terms of reliability. A Markov model is presented to estimate the lifetime of SSD-based RAID systems in different environments. In a small array, our results show that parity protection provides benefit only with considerably low space utilizations and low data access rates. However, in a large system, RAID improves data lifetime even when we take write amplification into account.

Categories and Subject Descriptors: D.4.2 [Operating Systems]: Storage Management; B.8.1 [Performance and Reliability]: Reliability, Testing, and Fault-Tolerance

General Terms: Reliability

Additional Key Words and Phrases: Flash memory, SSD, write amplification, RAID, lifetime, MTTDL

ACM Reference Format:
Sangwhan Moon and A. L. Narasimha Reddy. 2016. Does RAID improve lifetime of SSD arrays? ACM Trans. Storage 12, 3, Article 11 (April 2016), 29 pages.
DOI: http://dx.doi.org/10.1145/2764915

1. INTRODUCTION

Solid-state drives (SSDs)1 are attractive as a storage component due to their high performance and low power consumption. Advanced integration techniques, such as multilevel cell (MLC), have considerably dropped the cost per bit of SSDs such that wide deployment of SSDs is feasible [Grochowski 2012; Grupp et al. 2012]. While SSD deployment is steadily increasing, its write endurance still remains one of the main concerns.

Many protection schemes have been proposed to improve the reliability of SSDs. For example, error correcting codes (ECCs), log-like writing of the flash translation layer (FTL), garbage collection, and wear leveling improve the reliability of SSDs at the device level. Composing an array of SSDs and employing system-level parity protection is one of the popular protection schemes at the system level. In this article, we study striping (RAID0), mirroring (RAID1), and RAID5.

RAID5 has been employed to improve the lifetime of hard disk drive (HDD)–based storage systems for decades [Patterson et al. 1988]. However, careful decisions should be made with SSDs when system-level parity protection is employed.

1 We consider MLC SSDs in this article, unless otherwise mentioned.

Authors' addresses: S. Moon and A. L. N. Reddy, Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843-3259; emails: {sangwhan, reddy}@tamu.edu.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or [email protected].
© 2016 ACM 1553-3077/2016/04-ART11 $15.00
DOI: http://dx.doi.org/10.1145/2764915


First, SSDs have limited write endurance. Parity protection results in redundant writes whenever a write is issued to the device array. Unlike HDDs, redundant writes for parity updates can severely degrade the lifetime of SSDs. Parity data consumes device capacity and increases space utilization. While this has not been a serious problem in HDDs, increased space utilization leads to less efficient garbage collection, which, in turn, increases the write workload.

Many studies have investigated SSD-based RAID systems. A notable study [Jeremic et al. 2011] points out the pitfalls of SSD-based RAID5 in terms of performance. They discuss the behavior of random writes and parity updates, and conclude that striping provides much higher throughput than RAID5. We consider the impact of write workload on reliability. Previous studies [Greenan et al. 2009; Soundararajan et al. 2010] have considered different architectures to reduce the parity update performance penalty.

This article focuses its attention on the reliability of an array of SSDs. We explore the relationship between parity protection and the lifetime of an SSD array, making the following significant contributions:

—We analyze the lifetime of SSDs, taking benefits and drawbacks of parity protection into account.

—The results from our analytical model show that RAID5 can be less reliable than striping with a small number of devices because of write amplification.

—Our results show that RAID5 is effective at improving lifetime in large arrays and in small arrays with lower space utilizations and less intensive workloads.

—We explore device parameters that can improve lifetime and find effective configurations, improving data lifetimes in SSD arrays.

Section 2 provides a background of system-level parity protection and write amplification of SSDs. Section 3 explores SSD-based RAID. Section 4 builds a reliability model of SSD-based RAID. Section 5 describes our simulator to estimate the lifetime of an SSD. Section 6 presents results from our analytic model. Section 7 introduces related work. We present our conclusions in Section 8.

2. BACKGROUND

We categorize protection schemes for SSDs into two levels: device-level protection and system-level protection. Device-level protection includes ECC, wear leveling, and garbage collection [Gal and Toledo 2005]. System-level protection includes RAID5 and mirroring. In this article, we will mostly focus on system-level protection.

2.1. System-Level Protection

In many cases, device-level protections are not enough to protect data. For example, when the number of bit errors exceeds the number of correctable bit errors using ECC, data in a page may be lost without additional protection mechanisms. A device can fail due to other reasons, such as the failure of device attachment hardware. In this article, we call the former a page error and the latter a device failure. In order to protect against device failures, system-level parity protection is employed.

RAID5 is popular, as it spreads workload well across all the devices in the device array with relatively small space overhead for parity protection. The group of data blocks and the corresponding parity block is called a page group. RAID5 is resilient to one device failure or one page error in a page group.
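To make the page-group mechanics concrete, the following is a minimal sketch (not taken from the paper) of how a RAID5-style parity page can be computed and how a single lost page in a page group is rebuilt from the survivors; the function names and page size are illustrative assumptions.

```python
# Minimal sketch (not from the paper): parity over a RAID5 page group.
# The parity page is the bytewise XOR of the N-1 data pages in the group.
from functools import reduce

def parity_page(pages: list[bytes]) -> bytes:
    """Bytewise XOR of equally sized pages."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), pages)

def recover_lost_page(surviving_pages: list[bytes]) -> bytes:
    """Any single lost page (data or parity) is the XOR of all survivors."""
    return parity_page(surviving_pages)

# Example: three 4KB data pages plus one parity page tolerate one page error.
data = [bytes([i]) * 4096 for i in (1, 2, 3)]
parity = parity_page(data)
assert recover_lost_page([data[0], data[2], parity]) == data[1]
```

Striping (RAID0) stores only the data pages and cannot perform such a rebuild, which is the trade-off examined in the rest of the article.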

Mirroring is another popular technique to provide data protection at the system level. Two or more copies of the data are stored such that a device-level failure does not lead to data loss unless the original and all the replicas are corrupted before the recovery


from a failure is completed. When the original data is updated, the replicas have to be updated as well. Read operations can be issued to either the original or the replicas at the system level. When a device is corrupted, the replicas are used to recover the failed device.

2.2. Write Amplification

Protection schemes for SSDs often require additional writes; those writes, in turn, reduce the reliability of SSDs. Since higher write amplification can severely reduce the lifetime of an SSD, protection schemes should be configured carefully to maximize the lifetime improvement while minimizing write amplification. Write amplification severely degrades reliability, since reliability is highly dependent on the number of writes done at the SSDs. The main sources of write amplification are discussed in this section.

Recovery Process. In most recovery processes, at least one write is required to write back a corrected page. ECCs can correct a number of errors in a page simultaneously with one write. For instance, fixing a single bit error in a page takes one redundant write, while fixing ten bit errors in a page also needs only one additional write. Note that ECC-based write amplification is a function of the read workload, unlike other sources of write amplification.

Garbage Collection. NAND flash memory is typically written in units of a page, for example, 4KB, and erased in units of a block, for example, 512KB. It does not support in-place updates and accumulates writes in a log-like manner. In such a log-structured system, internal fragmentation and the garbage-collection process to tidy the fragmented data are inevitable. The garbage-collection process moves valid pages from one place to another, which increases the number of writes issued to the device. Write amplification due to garbage collection is dependent on the space utilization of the SSD. When the SSD is nearly full, garbage collection is triggered sooner and is less efficient, since a larger fraction of the pages in each victim block are still live. A recent study [Desnoyers 2012] reveals that the efficiency of garbage collection is also dependent on the hotness of data. Since hot data tends to be invalidated more frequently, garbage collection can be more efficient when the hot workload concentrates on a small portion of the data.

3. SSD-BASED RAID

When data is distributed across many devices, the failure of any device can lead to data loss. As the number of devices increases, the failure rate increases and lifetime decreases, without additional measures for protecting data. To overcome this limitation, RAID is widely used in storage systems. Due to the physical characteristics of SSDs, the RAID architecture needs to be modified for SSD-based storage systems. Our analysis is based on the architecture shown in Figure 1, in which a RAID controller operates on top of a number of SSDs. As a result, the set of pages within a page group have a constant logical address (before FTL translation) within the device. As the pages are written, the actual physical addresses of pages within the device change because of FTL translation. However, this does not impact the membership of the page group, which is based on the logical addresses. When the number of page errors in a page group exceeds the number of page errors correctable by RAID, the page group fails and the storage system loses data.

In RAID5, for a small write, the data block and parity block need to be updated, potentially resulting in a write amplification factor of 2. However, when a large write that spans a number of devices is issued, the parity block can be updated once for updating


Fig. 1. RAID architecture of an SSD array.

Table I. Raw Bit Error Rate λ(x) = A·e^(Bx)

Error Type                          A         B
Read disturb (per 1–1,000 reads)    7.73e-7   2.17e-4
Data retention (per month)          3.30e-6   1.83e-4
Write error (per write)             1.09e-7   3.01e-4

N − 1 data blocks, where N is the number of devices in the device array, resulting in a write amplification factor of N/(N − 1). Depending on the workload mixture of small and large writes, the write amplification will be somewhere in between. In a mirroring system, the write amplification is 2 regardless of the size of the write request.

Parity pages increase space utilization. Suppose that 120GB of data is stored in four 80GB SSDs: RAID5 stores 30GB of data and 10GB of parity in each device, while striping stores only 30GB of data per device. The increased space utilization results in less efficient garbage collection and more write amplification, which decreases the lifetime of SSDs.

It is possible to write to the array one stripe at a time to reduce parity update costs. However, this requires garbage collection to consider one stripe at a time, not a block at a time within a device, requiring coordination across the devices. While this is possible when a RAID controller is built on top of raw flash memory, it is not feasible when building RAID on top of SSDs. In this article, we consider an array built on top of SSDs.

Additional measures can be employed to improve reliability. In order to protect data that may not be frequently accessed in normal workloads, data on the SSDs is actively scanned and errors found are scrubbed. Data scrubbing [Saleh et al. 1990; Schwarz et al. 2004] can be done during the idle periods of the SSD array. The scrubbing rate can be either constant or exponentially distributed. When a large portion of the data on the SSD is cold, scrubbing is essential for data protection, though it consumes energy for scanning the SSD.

4. LIFETIME MODEL

Our lifetime model is based on a Markov model from our previous work [Moon and Reddy 2012, 2013], which analyzes the relationship between the reliability of an SSD and protection schemes. The symbols used in our model are shown in Table II.

Many studies [Cai et al. 2012; Grupp et al. 2009; Sun et al. 2011; Yaakobi et al. 2010, 2012] have investigated the error behavior of MLC flash memory. There are different sources of bit errors in flash memory: read disturb, data retention failure, and write error. These studies model the bit error rate as an exponential function of the number of program/erase cycles (P/E cycles) that the cell has gone through. We employ the data from the measurement study of [Sun et al. 2011] to model the change of bit error rate. Table I shows the parameters of the employed model.


Table II. List of Symbols Used in Analysis Models

Symbols       Description
k             the expected number of read operations done to neighboring pages
R             the portion of reads in workload
W             the portion of writes in workload
S             the number of bits in a page
K             the number of correctable bit errors by ECC (per page)
λ(x)          raw bit error rate at x P/E cycles
g(i, x)       page access rate by garbage collection for a page with i bit errors at x P/E cycles
μ             page access rate by workload
μr            page access rate by read workload
μw            page access rate by write workload
FE            (state) page error: ECC cannot recover the corrupted page
N             the number of devices in an SSD array
ν(x)          uncorrectable page error rate at x P/E cycles
ν(i)(x)       ν(x) of the ith replaced device
d             device failure rate
ξ             page group recovery rate
η             device recovery rate
vi            (state) a page is corrupted (i devices have been replaced for recovery)
di            (state) a device is corrupted (i devices have been replaced for recovery)
FR            (state) data loss: RAID cannot recover the system
σ(x)          data loss rate at system level
p(x)          the probability of data loss at x P/E cycles, induced by σ(x)
w             the number of writes issued per page per second
tw            the average time to write the whole storage system once
Pi            the probability of staying at state i in a Markov model
P(x)          the probability of seeing the first data loss at x P/E cycles
us            device space utilization
T             the number of pages in a device
a             overprovisioning factor, a = 1/us
B             the number of pages per block
Mv            the number of blocks full of valid pages found in garbage collection
mv            the probability of a block full of valid pages found in garbage collection
AF            the write amplification from garbage collection with FIFO policy
AF′           the write amplification from garbage collection with modified FIFO policy
AG            the write amplification from garbage collection with Greedy policy
AF(a, r, f)   AF considering overprovisioning (a), hot workload (r), and hot space (f)
AG(a, r, f)   AG considering overprovisioning (a), hot workload (r), and hot space (f)
q             average write length in pages
W             random variable of write length, exponentially distributed with average q

Read disturb error rate is a function of both P/E cycles and the number of reads on neighbors [Sun et al. 2011]. The number of read operations done to neighbors, k, can be estimated as a function of the read ratio in the workload (R) and the number of neighbor pages (n):

k = \sum_{i=0}^{\infty} p_i \cdot E(i, R, n) = \frac{nR}{1 - R^2},    (1)

where p_i is the probability of being read i times repeatedly without updating the target page, and E(i, R, n) is the expected number of read operations done to neighbor pages when the target page is read i times. By the time a page is accessed for the first time, Rn neighboring pages are expected to be read. A neighboring page that is written will


Fig. 2. A Markov model of page error rate.

be stored elsewhere, and that physical neighbor becomes invalid. If the second access to a page is also a read (with probability R^2), another R^2 · Rn neighboring page reads are expected by that time. Likewise, we can estimate the number of additional reads when the page is read i times repeatedly. In the equation, the probability of a page being read i times is p_i = R^i, and the expected number of additional read operations done to neighbor pages when the page is read i times is E(i, R, n) = R · E(i − 1, R, n).

While the read disturb error rate is expected to grow with the number of reads, the data in [Sun et al. 2011] show that this error rate is nearly constant for the first 1,000 reads. Our evaluations are done within that range. For example, k is 218 when n is 127 (the number of pages per block is 128) and R is 0.75 (Read:Write = 3:1 in the workload).
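As a quick check of Equation (1), the closed form can be evaluated directly; this small sketch (not part of the paper) reproduces the example above.

```python
# Hedged sketch of Equation (1): expected reads to neighboring pages,
# k = nR / (1 - R^2), where n is the number of neighbor pages and R the read ratio.
def expected_neighbor_reads(n: int, R: float) -> float:
    return n * R / (1.0 - R ** 2)

# Example from the text: n = 127, R = 0.75 gives k ≈ 218.
print(round(expected_neighbor_reads(127, 0.75)))  # 218
```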

Uncorrectable Page Error Rate. We employ the Markov model in Figure 2 to build a model of the reliability of a page. In Figure 2, the labels of states are the number of bit errors in a page, and K is the number of correctable bit errors by ECC. Bit errors accumulate within a page at a rate of (S − i)λ(x), where S is the number of bits per page, i is the number of errors accumulated in a page, and λ(x) is the raw bit error rate at x P/E cycles. Once the chain reaches state FE, bit errors cannot be recovered by ECC, which results in a page error. ECC detects and corrects bit errors at page recovery rate μ. The rate μ is a function of the page access time (or error detection time) t1 and the page recovery time t2:

\mu = \frac{1}{t_1 + t_2}.    (2)

The bit error rate varies with the number of P/E cycles and can be considered by modeling a series of Markov models with different bit error rates. We assume that the time-varying nature of λ(x) is relatively small because our raw bit error rate model increases the error rate by only about 0.02% per P/E cycle. Also, we assume that the average time per P/E cycle is sufficiently large so that the Markov model converges in each P/E cycle. Since wear leveling scatters write and erase operations over the whole device, the average time per P/E cycle is approximately the amount of time taken to write the whole device once. Under those assumptions, we can treat λ(x) as constant at each P/E cycle x in the series.

We can estimate the uncorrectable page error rate at each P/E cycle using a steady-state analysis of the Markov model. From the analysis, the probability of reaching the state FE can be obtained.
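One way to carry out such a computation is sketched below. This is an illustrative approximation rather than the authors' exact model: it assumes bit errors arrive at rate (S − i)λ(x), that every page access (rate μ) corrects all accumulated errors, and it estimates ν(x) as the probability flow from state K into FE.

```python
# Illustrative sketch (an approximation, not the paper's exact model): steady-state
# analysis of the page-level Markov chain in Figure 2. State i is the number of bit
# errors in a page; errors arrive at rate (S - i)*lam and a page access at rate mu
# returns the page to state 0 (ECC correction).
import numpy as np

def uncorrectable_page_error_rate(S: int, K: int, lam: float, mu: float) -> float:
    n = K + 1                                      # states 0..K; FE treated as a leak
    Q = np.zeros((n, n))
    for i in range(n):
        up = (S - i) * lam if i + 1 < n else 0.0   # transition to the next error count
        rec = mu if i > 0 else 0.0                 # recovery back to state 0
        if i + 1 < n:
            Q[i, i + 1] = up
        if i > 0:
            Q[i, 0] = rec
        Q[i, i] = -(up + rec)
    # Stationary distribution: pi @ Q = 0 with sum(pi) = 1.
    A = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1); b[-1] = 1.0
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    return pi[K] * (S - K) * lam                   # flow rate from state K into FE

# Example with made-up parameters and a deliberately weak ECC (K = 4) for visibility.
print(uncorrectable_page_error_rate(S=4096 * 8, K=4, lam=1e-9, mu=1e-3))
```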

The write amplifications from various sources are not independent of each other. For example, when the garbage-collection process moves valid pages to empty pages in another block, bit errors in the pages are corrected by ECC during the move. We take such dependencies into account in our model. Each state of the Markov model in Figure 2 has its own probability of recovery by garbage collection. The garbage-collection recovery rate g(i, x) depends strongly on the garbage-collection policy adopted.


Fig. 3. A Markov model of the RAID5 system.

Data Loss Rate. The reliability of an SSD array (RAID5) can be modeled through the Markov model shown in Figure 3.

There are mainly two sources of page error. Bit errors accumulate and result in a page error. The entire device can fail due to various sources, such as attached hardware error, software error, and human error. A device failure leads to the failure of all the pages in the device.

Figure 3 shows the Markov model for a RAID5 system. Two sources of failure are considered in this model. Bit errors accumulate and result in a page error. The page error results in data loss unless it is detected and corrected before another page error in the same page group or a device failure occurs. A device failure can lead to the failure of all the pages in the device. The device failure results in data loss when a page error in another device or another device failure occurs.

This is the list of cases when permanent data loss can occur:

(1) Page error + page error in the same page group.
(2) Any page error + device failure.
(3) Device failure + any page error in another device.
(4) Device failure + another device failure.

In Figure 3, cases (1) and (2) are considered in the upper chain (from state i to vi and then FR) of the Markov model, and cases (3) and (4) are considered in the lower chain (from state i to di, then FR).

Note that device failures are easier to detect, and a recovery operation can be issued immediately after the failure. However, the device recovery time is considerably long. On the other hand, a page error is not detected until that page is read. However, a detected page error can be recovered quickly. Our analysis takes these issues into consideration with different values of ξ and η.

The reliability of an SSD is highly dependent on the number of writes it has serviced. The Markov model in Figure 3 considers replaced devices after device failure recovery. The replaced devices have experienced a relatively lower number of writes.

The data loss rate σ(x) is the rate of reaching state FR. The time series of the data loss rate σ(x) is integrated over time to obtain the expected time to data loss.

The data loss rate is σ(x) = u_s · T · ν(x) + d for a single SSD. For N-device striping, σ(x) = N(u_s T · ν(x) + d). Under the assumption that each device pair is independent in a mirroring system, we can build a Markov model for the device pair and then extend it to cover the entire system. The pair fails with a failure rate of ρ(x), and the mirroring


system fails at the rate of σ (x) = N/2 · ρ(x).

Single SSD:         \sigma(x) = u_s \cdot T \cdot \nu(x) + d
N-device striping:  \sigma(x) = N (u_s T \cdot \nu(x) + d)
Mirroring:          \sigma(x) = (N/2) \cdot \rho(x)    (3)

Mean Time to Data Loss. We employ the mean time to data loss (MTTDL), which is one of the popular metrics in reliability analysis. The MTTDL is defined as the expected time to encounter the first data loss:

MTTDL = \left( \sum_{x=1}^{\infty} x\, P_{DL}(x) \right) \cdot t_w = \left( \sum_{x=1}^{\infty} x\, p(x) \prod_{i=1}^{x-1} \big(1 - p(i)\big) \right) \cdot t_w.    (4)

In Equation (4), p(i) is the probability of data loss at i P/E cycles, such that the probability of seeing the first data loss at x P/E cycles, P_{DL}(x), is:

P_{DL}(x) = p(x) \prod_{i=1}^{x-1} \big(1 - p(i)\big).    (5)

Thus, the MTTDL in Equation (4) is the sum of the expectation of seeing the first data loss at a certain P/E cycle over the lifetime of an SSD array.
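A small numeric sketch of Equations (4) and (5) follows (not from the paper); it assumes p is given as a list where p[x-1] is the data loss probability during P/E cycle x, and tw is the time taken to write the whole storage system once.

```python
# Hedged sketch of Equations (4)-(5): MTTDL from a per-cycle data loss probability.
def mttdl(p: list[float], tw: float) -> float:
    survival = 1.0            # probability of surviving cycles 1..x-1 without loss
    expected_cycles = 0.0
    for x, px in enumerate(p, start=1):
        expected_cycles += x * px * survival   # x * P_DL(x)
        survival *= (1.0 - px)
    return expected_cycles * tw

# Example with a made-up loss-probability curve that rises with wear.
probs = [1e-5 * (1.001 ** x) for x in range(10_000)]
print(mttdl(probs, tw=3600.0))                 # seconds until first data loss
```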

4.1. Write Amplification

Write amplification can be caused by garbage collection, ECC recovery, and parity update, denoted as αgc, αrcv, and αparity, respectively.

4.1.1. Garbage Collection. The write amplification from garbage collection αgc relies on the garbage-collection policy. In this article, we consider two garbage-collection policies: First In First Out (FIFO) and Greedy. The FIFO policy selects the least recently written block as a victim for garbage collection, while the Greedy policy picks the block with the least number of valid pages as a candidate for garbage collection. We exploited the results of [Desnoyers 2012] to estimate the garbage-collection rate g(i, x), which is dependent on the write workload.

FIFO. This policy keeps a list of nonempty blocks in order of written time. We start from the simple case in which the workload writes one page per write and is random and uniformly distributed. The rate at which blocks are consumed is given by qAF/B. The write amplification from garbage collection for the FIFO policy, AF, is the solution of the following equation; the details are described in [Desnoyers 2012].

A_F = \frac{1}{1 - \left(1 - \frac{q \cdot a}{T}\right)^{\frac{T/B}{q A_F / B}}}    (6)

A real workload can write more than a page per operation, which generates consecutive pages that are more likely to be invalidated together. When we write a large file sequentially and delete the file at once, for example, many blocks will be full of invalidated pages, and garbage collection can obtain a number of empty blocks free of write amplification. On the other hand, a large file can result in a series of blocks filled with all valid pages. It is unrealistic for the FIFO policy to select a victim block filled entirely with valid pages and move all of its valid pages elsewhere in the SSD during the garbage-collection process; this would generate no empty blocks at all.


Fig. 4. An example of a large write.

We modify the FIFO model to pass over blocks with 100% valid pages. In the modified FIFO model, we assume that the write length follows an exponential distribution with average write length q, such that the probability of a write length being more than x is

P(W \geq x) = 1 - \frac{1}{q}\, e^{-\frac{x(N-1)}{q}}    (7)

Figure 4 shows an example of a large write when a write starts from page number j in the second block. In the figure, two blocks are fully written because of this one write request.

As described in the figure, the page number in a block starts from B and decreases to 1 as writes continue. When a write with size W starts at location j of a block, n blocks will be written fully with valid pages when (n + 1)B ≥ W − j ≥ nB. The expected number of blocks filled only with valid pages is the sum of the expectation of n blocks being within a write:

\sum_{n=1}^{\infty} n\, P\big((n+1)B + j \geq W \geq nB + j\big) = \sum_{i=1}^{\infty} P(W \geq iB + j)    (8)

We assume that a write can start at any page with equal probability 1/B. The estimated number of blocks full of valid pages is

M = \sum_{j=1}^{B} \frac{1}{B} \sum_{i=1}^{\infty} P(W \geq iB + j).    (9)

In the FIFO queue, which is the queue of blocks sorted by written time, blocks move at rate qAF′/B, where AF′ is the write amplification from garbage collection with the modified FIFO policy. Then, the survival rate S of a block when it gets to the end of the FIFO queue is

S = \left(1 - \frac{q \cdot a}{T}\right)^{\frac{T/B}{q A_{F'} / B}}.    (10)

The number of blocks 100% filled with valid pages when garbage collection picks the blocks is given by

Mv = M · S, (11)

and the probability of a block being filled with all valid pages is mv = Mv/(T/B).


Since Mv blocks are passed over by the garbage-collection process, the write amplification from garbage collection is modified as:

A_{F'} = \frac{1 - m_v}{1 - \left(1 - \frac{q \cdot a}{T}\right)^{\frac{T/B}{q A_{F'} / B}}}.    (12)

In this article, we refer to the modified FIFO as FIFO unless specified.

Greedy. This policy chooses the victim block that has the least number of valid pages. It reduces the number of redundant writes for moving valid pages, while it requires additional data structures and operations to keep track of the number of valid pages in every block.

The write amplification from greedy garbage collection is estimated in Equation (13). In the equation, X0 is the expected number of valid pages in the victim block.

X_0 = \frac{1}{2} - \frac{2B}{a} \cdot W\!\left(-\left(1 + \frac{1}{2B}\right) a\, e^{-\left(1 + \frac{1}{2B}\right) a}\right), \qquad A_G = \frac{B}{B - (X_0 - 1)}    (13)

Most workloads have nonuniform access patterns, where a significant fraction of the workload may concentrate on a small data space. Let fraction r of the workload access data in fraction f of the space. Then, the write amplification is given by:

A_F(a, r, f) = 1 + \frac{r}{e^{\frac{r}{f} \cdot \frac{a}{A}} - 1} + \frac{1 - r}{e^{\frac{1-r}{1-f} \cdot \frac{a}{A}} - 1}, \qquad A_G(a, r, f) = \frac{A_F\!\left(\left(1 + \frac{1}{2B}\right) a,\, r,\, f\right)}{1 + \frac{1}{2B}}.    (14)

The details of these equations are discussed in [Desnoyers 2012].
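Because Equation (14), as reconstructed above, defines A_F(a, r, f) implicitly (A appears on both sides), a numeric solution is convenient; the following is a hedged sketch under that reconstruction, not code from the paper.

```python
# Hedged sketch of Equation (14) (hot/cold FIFO model): solve the implicit relation
# for A = AF(a, r, f) by fixed-point iteration. r = fraction of writes to hot data,
# f = fraction of space holding hot data, a = 1/us (overprovisioning factor).
import math

def hot_cold_write_amplification(a: float, r: float, f: float,
                                 iters: int = 500) -> float:
    A = 2.0
    for _ in range(iters):
        hot = r / (math.exp((r / f) * (a / A)) - 1.0)
        cold = (1.0 - r) / (math.exp(((1.0 - r) / (1.0 - f)) * (a / A)) - 1.0)
        A = 1.0 + hot + cold
    return A

# Example: 80% of the writes concentrated on 20% of the space, a = 2 (us = 0.5).
print(hot_cold_write_amplification(a=2.0, r=0.8, f=0.2))
```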

4.1.2. ECC Recovery. According to [Moon and Reddy 2012], ECC recovery can be one of the sources of write amplification. When ECC detects and corrects bit errors in a page, the resulting write is an unintended write triggered by a read request. In Figure 2, the write amplification from ECC recovery is the ratio of the write traffic that moves a page from the states with 1, 2, . . . , K errors back to the state with 0 bit errors to the actual write traffic issued to the device by the user. Equation (15) shows this write amplification.

\alpha_{rcv} = \left( \mu_r \sum_{i=1}^{K} P_i \right) \Big/ w    (15)

The work in [Moon and Reddy 2012] suggested threshold-based ECC (TECC) to reduce the write amplification from frequent recovery by leaving bit errors in a page until a certain number of bit errors accumulate. TECC can drastically reduce the write amplification from ECC.

The Markov model of page error rate in Figure 2 is modified as shown in Figure 5 when TECC is employed. In this model, the write amplification from ECC recovery is

\alpha_{rcv} = \left( \mu_r \sum_{i=N}^{K} P_i \right) \Big/ w,    (16)

where N is the threshold, and bit errors are left in the storage media until the number of bit errors exceeds the threshold.


Fig. 5. A Markov model of page error rate with TECC.

Fig. 6. A large write over RAID5.

4.1.3. Parity Update. As discussed in Section 3, the write amplification from parity update depends on workload characteristics and is somewhere in between N/(N − 1) and 2. The number of parity updates depends on the average length of write requests. Figure 6 shows an example of writing 6 pages starting from the 4th page of page group 1 in an SSD array with 8 SSDs, which results in 2 parity page updates and a write amplification of 1.33. When the write starts from the 1st page, however, only 1 parity update results and the write amplification decreases to about 1.17.

In general, when a write of q pages starts at the ith page of a page group, the number of parity updates is ⌈(q + i − 1)/(N − 1)⌉. The average number of parity updates qp, averaged over the possible starting positions, is

q_p = \frac{1}{N-1} \cdot \sum_{i=0}^{N-2} \left\lceil \frac{q+i}{N-1} \right\rceil,    (17)

and the write amplification from parity update is

\alpha_{parity} = 1 + \frac{q_p}{q}    (18)
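As a quick numeric illustration of Equations (17) and (18) (a sketch, not from the paper), the average parity write amplification can be computed directly for a given array size and average write length.

```python
# Hedged sketch of Equations (17)-(18): parity write amplification for an N-device
# RAID5 array and an average write length of q pages, averaging the parity-update
# count over the N-1 possible starting offsets within a page group.
import math

def parity_write_amplification(q: int, N: int) -> float:
    qp = sum(math.ceil((q + i) / (N - 1)) for i in range(N - 1)) / (N - 1)
    return 1.0 + qp / q

# Example in the spirit of Figure 6: q = 6 pages, N = 8 SSDs (7 data pages/group).
print(parity_write_amplification(q=6, N=8))   # falls between N/(N-1) and 2
```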

Note that the write amplification from parity protection comes from two sources. The first is the write amplification from parity update, and the second comes from increased space utilization, because the write amplification from garbage collection αgc increases as the space utilization increases.

4.1.4. Write Amplification and Lifetime. The write amplifications from ECC recovery and garbage collection are not independent. When the garbage-collection process picks a block containing a page with many bit errors, ECC can detect and correct the bit errors while garbage collection moves the page to empty space, free of additional writes. We count this as write amplification from garbage collection, not from ECC recovery.


Fig. 7. Simulator overview.

Thus, the write amplification from garbage collection is

\alpha_{gc} = \sum_{i=0}^{K} \big( g(i, x) \cdot P_i \big) \Big/ w,    (19)

where g(i, x) comes from AF, AF′, AG, AF(a, r, f), and AG(a, r, f), depending on the garbage-collection policy and the hot/cold characteristics of workload and space.

Device-level ECC recovery and system-level parity update are independent. ECC recovery does not require a parity update, and vice versa. But garbage collection depends on the amplified writes from parity updates and ECC recovery:

α = (αrcv + αparity − 1) · αgc. (20)

The write amplification is modeled by factoring the additional writes and their impact on the loss rate σ(x) experienced at the devices.

5. SIMULATION

We built a simulator that takes the bit error behavior of SSDs into account. The simulator estimates the lifetime of SSDs under a randomly generated and uniformly distributed workload. Figure 7 provides an overview of our simulator, in which red boxes show processes and blue boxes show objects.

The workload generator produces I/O requests and sends them to the device object. The request type and the address to read or write are decided randomly following the configured probability distribution.

The bit error generator estimates the number of bit errors in the target page that is read or written. The number of bit errors depends on the amount of time passed since the last access to the page, the erase count of the block where the page resides, and bit error rate data based on measurement studies.

The device object manages a number of block objects and page objects, and provides information such as the FTL mapping and the current status of the blocks and pages involved in a read/write request.

Write requests invalidate old pages and write new data into empty (erased) pages. Read requests check for bit errors and invoke the ECC; if the number of bit errors exceeds the number of correctable bit errors, the ECC calls parity protection to recover the page error.


Table III. Simulation Environment

SSD count                             Single    2 SSDs    4 SSDs    8 SSDs     16 SSDs
Workload (System)                     125MB/s   250MB/s   500MB/s   1000MB/s   2000MB/s
Workload (Device)                     125MB/s   125MB/s   125MB/s   125MB/s    125MB/s
Data (Striping, System)               30GB      60GB      120GB     240GB      480GB
Data (Striping, Device)               30GB      30GB      30GB      30GB       30GB
Space Utilization (Striping, Device)  0.375     0.375     0.375     0.375      0.375
Data (RAID5, System)                  30GB      120GB     160GB     274GB      512GB
Data (RAID5, Device)                  30GB      60GB      40GB      34GB       32GB
Space Utilization (RAID5, Device)     0.375     0.75      0.5       0.425      0.4

If the parity protection fails, the simulation stops and issues a data loss report.

When the space utilization of the SSD reaches an upper threshold, the garbage-collection process is invoked and cleans victim blocks until the space utilization goes below a lower threshold. Garbage-collection policies differ in how they pick victim blocks from the written-block queue. The garbage-collection process updates the erase count of victim blocks when it completes.
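The following toy sketch (not the authors' simulator) illustrates the mechanism: it measures the write amplification of greedy garbage collection on a tiny flash model under uniform random single-page writes. It cleans on a free-block watermark rather than the utilization thresholds described above, and all sizes and counts are illustrative assumptions.

```python
# Illustrative toy sketch (not the paper's simulator): write amplification of
# greedy garbage collection under uniform random single-page writes.
import random

PAGES_PER_BLOCK, NUM_BLOCKS, UTIL = 32, 64, 0.5
LOGICAL_PAGES = int(PAGES_PER_BLOCK * NUM_BLOCKS * UTIL)

valid = [set() for _ in range(NUM_BLOCKS)]     # valid logical pages in each block
where = {}                                     # logical page -> physical block
free = list(range(NUM_BLOCKS))
open_blk, fill = free.pop(), 0
user_writes = flash_writes = 0

def program(lp):
    """Program one logical page into the currently open block."""
    global open_blk, fill, flash_writes
    if lp in where:
        valid[where[lp]].discard(lp)           # invalidate the old copy
    valid[open_blk].add(lp)
    where[lp] = open_blk
    flash_writes += 1
    fill += 1
    if fill == PAGES_PER_BLOCK:                # block full: open the next free one
        open_blk, fill = free.pop(), 0

for _ in range(200_000):
    while len(free) < 2:                       # clean when free space runs low
        victim = min((b for b in range(NUM_BLOCKS)
                      if b != open_blk and b not in free),
                     key=lambda b: len(valid[b]))
        for lp in list(valid[victim]):         # relocating valid pages is what
            program(lp)                        # amplifies the write workload
        valid[victim].clear()
        free.append(victim)
    user_writes += 1
    program(random.randrange(LOGICAL_PAGES))

print("measured write amplification ~", flash_writes / user_writes)
```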

6. EVALUATION

The lifetime of an SSD array is evaluated in different environments. Various aspects of SSD arrays are explored.

6.1. Simulation Environment

We exploited the bit error rates mainly from [Sun et al. 2011], specifically, the bit error rate of 3x nm MLC flash memory. We assume that a 61-bit-correctable ECC for a 4KB page is employed. We assume a constant annual device failure rate of 3% over the device's lifetime, based on the survey results in [Thomas 2012]. We consider different device failure rates later in the article. Each SSD has a capacity of 80GB.

We assume 30GB of data and 125MB/s of workload, with a 3:1 read:write ratio, per device as a default. This allocates the same amount of data and the same amount of workload to each device. As a result, the total amount of data and workload in an SSD array is proportional to the number of SSDs. The actual space utilization can be more than 30GB due to parity or replication. For example, in 4-SSD RAID5, each SSD has 30GB of data and 10GB of parity, which results in a space utilization of 0.5, while the space utilization of 4-SSD striping is 0.375. Table III shows the change of important system environment variables according to the number of SSDs in an array. Vendors rate devices by the maximum amount of data that can be written to a device per day due to write endurance concerns. For the devices that we consider in our study here, the workload of 125MB/s translates into 33 device writes per day. Enterprise devices are expected and rated to endure 10 to 50 device writes per day [Fujitsu 2014; Gathman et al. 2013; HGST 2016; HP 2016; SanDisk 2013]. We consider different workload intensities later in the article.
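A quick arithmetic check of the 33 device writes per day quoted above (a sketch, assuming the 3:1 read:write split applies to the 125MB/s per-device workload and an 80GB device):

```python
# Hedged back-of-the-envelope check of the "33 device writes per day" figure.
write_share = 1 / 4                          # write fraction of a 3:1 read:write mix
write_rate_mb_s = 125 * write_share          # MB/s of writes per device
mb_per_day = write_rate_mb_s * 86_400        # MB written per device per day
device_writes_per_day = mb_per_day / (80 * 1024)   # 80GB device, in MB
print(round(device_writes_per_day))          # ~33
```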

As the cleaning policy for garbage collection, the greedy policy is employed unless otherwise specified. The TRIM command issued by the file system indicates to the SSD which data has been invalidated.

We assume that the TRIM command is exploited with zero overhead.

The lifetime varies by vendor and model of SSD. We normalize the lifetime results to the lifetime of a single SSD in a default environment. We call the normalized lifetime the relative MTTDL.


Fig. 8. Lifetime of an SSD versus space utilization.

6.2. Review of Analysis of Single SSD

Space utilization and the write amplification from garbage collection are strongly related to drive lifetime. Figure 8 shows the relationship between space utilization and the lifetime of a single SSD. It shows an interesting point: the expected lifetime does not decrease linearly as space utilization increases. When we increase space utilization from 0.1 to 0.5, the amount of data is five times the original data, with only an 18% loss of lifetime. If we increase space utilization from 0.5 to 0.9, in contrast, lifetime decreases by about 70%. Lifetime decreases faster at higher space utilizations.

6.3. Simulation

We built a simulator as described in Section 5 to verify our analytic models. Since lifetime simulation takes a considerable amount of time to finish, we verify the simplest case, in which the garbage-collection policy is FIFO and the workload is uniformly distributed. We limit the capacity of the SSD to 80MB, with a workload of 128KB/s/device, which scales down the 80GB capacity and 125MB/s/device pair for both analysis and simulation for a fair comparison.2

Figure 9 compares the results from analysis with those from simulation. The analytical model tracks the simulation results fairly closely except when space utilization is very high. The error between the model and the simulation reaches 12% and 30% at utilizations of 80% and 90%, respectively.

We extended the simulator to multiple SSDs to verify our model further. The results are shown in Figure 10. The simulation results are obtained from a scaled-down version of the system. The results show that the analysis is close to the simulation results at the considered system configuration parameters. In the rest of the article, we present results from our analytic model.

6.4. Is RAID5 Superior to Striping?

We compare the lifetime of RAID5 to the lifetime of striping in this section. Since the write amplification from parity update varies depending on small versus large writes, we describe the lifetime of RAID5 as a range from its minimum to maximum achievable.

2 We use a workload of 128KB/s/device instead of 125KB/s/device in the scaled-down version and compare it to the analytic model with a 128KB/s/device workload, since the unit of read and write operations is 4KB. The rest of the analytic models assume a 125MB/s/device workload with 30GB of data in an 80GB device capacity unless specified.


Fig. 9. Simulation versus analysis result of a single SSD (80MB, 128KB/s).

Fig. 10. Simulation result of SSD arrays (80MB, 128KB/s).

We first analyze the impact of write amplification on the lifetime of SSD-based arrays. We vary parameters in this section to find when RAID5 is superior to striping and when RAID5 may not be so beneficial.

The number of devices. Figure 11 compares the lifetime of different systems with varying numbers of SSDs. It brings out many interesting implications. First, mirroring3 at 2 SSDs suffers severely from higher write amplification. Its write amplification from parity update (or replica copy) is always 2. Its space utilization is twice that of striping on 2 SSDs. The lifetime of RAID5 systems grows with the number of devices. The results show that with fewer than 8 SSDs, the overhead from additional writes for parity overwhelms RAID5's reliability benefits.

The amount of data. Since the space utilization of RAID5 is higher than that of striping due to parity, RAID5 is more sensitive to space utilization than striping. This trend is shown in Figure 12. As we increase the amount of data, the lifetime of RAID5 drastically decreases, and eventually its maximal lifetime is less than that of striping. This implies that the total amount of data should be less than a certain amount; otherwise, RAID5 will not be beneficial in terms of reliability. RAID5 is shown to be competitive when the amount of data is less than 240GB or space utilization is less than about 40%. The 400GB

3 RAID5 for 2 devices is the same as mirroring for 2 devices.


Fig. 11. Lifetime with different numbers of SSDs. The error bar on RAID5 shows the possible range of lifetime depending on the range of the write amplification from parity update. The plotted lifetime of RAID5 is the lifetime when the write amplification is at its average.

Fig. 12. Lifetime of 8 SSDs with different amounts of data.

of data is equivalent to 71% space utilization or 40% overprovisioning. Considering that recent SSDs typically provide overprovisioning from 7% to 20% (93% to 83% space utilization), additional measures should be taken to (1) increase overprovisioning considerably and/or (2) reduce write amplification further.

Garbage collection policy. We exploited the analytic model of garbage collection in [Desnoyers 2012] and extended its FIFO policy model to consider nonuniform writes and write length. Figure 13 shows the impact of different cleaning policies and the hotness of the workload on data lifetime. With a hot workload, hot-data blocks can be cleaned more efficiently than cold-data blocks. However, the FIFO policy picks cold blocks with higher probability because there are more cold blocks than hot blocks. As a result, garbage-collection efficiency decreases with a hot workload and the FIFO policy, while it increases with a hot workload and the greedy policy. The benefit here is not substantial because a hot workload intensifies the read disturb error rate.

In contrast, we can see that the lifetime of SSDs increases with the greedy policy and a hot workload when the space utilization of the SSDs is higher, as seen in Figure 14, which shows that the lifetime of SSD arrays is determined by garbage-collection efficiency. When the space utilization of the SSDs is higher, the write amplification from garbage collection is the dominant source of degradation. The gain from efficient garbage


Fig. 13. Lifetime of 8 SSDs with different garbage-collection/ECC algorithms.

Fig. 14. Lifetime of 8 SSDs with different garbage-collection policies, when space utilization is higher than that in Figure 13. The amount of data is 60GB per SSD, instead of 30GB per SSD in Figure 13.

collection is larger than the loss from the higher read disturb error rate, and the lifetime of SSDs increases with the greedy policy and a hot workload.

Advanced techniques. Note that the lifetime gain does not increase linearly as write amplification decreases. This is because the write amplifications from various sources are not independent. When the number of devices increases, for example, the write amplifications from both parity update and garbage collection decrease at the same time. Moreover, when the parity-update overhead decreases, the garbage-collection rate also decreases, which results in less write amplification from garbage collection.

Many techniques have recently been proposed to reduce write amplification. In Figure 15, we evaluated TECC, which reduces the write amplification from ECC recovery. As expected, lifetime increases considerably. With TECC, the write amplification from parity update has more impact on lifetime. As explained in Section 4.1.4, reducing the write amplification from ECC recovery makes the influence of the write amplification from parity update relatively larger.

Another technique that we evaluated is scrubbing, which actively scans cold data during idle time to prevent the accumulation of errors. Figure 16 shows the lifetime of 8 SSDs with and without scrubbing of cold data. We assume that 80% of the workload is concentrated on 20% of the space. The result shows that scrubbing is not beneficial in improving the lifetime of the device. With both ECC and TECC, scrubbing is


Fig. 15. Lifetime of SSD arrays with TECC.

Fig. 16. Lifetime of 8 SSDs with/without scrubbing/TECC.

shown to decrease lifetime slightly. Hot pages enable efficient garbage collection and reduce write amplification. However, hot pages experience increased read disturb errors, canceling out the benefits of more efficient garbage collection. This implies that careful consideration is required to employ advanced techniques in SSD-based storage systems.

Once again, the flash characteristics, here the read disturb errors, are sufficiently different from those of magnetic disk drives that they warrant careful reexamination of traditional data-protection mechanisms.

Advanced techniques to reduce write amplification favor RAID5. RAID5 has higher space utilization due to parity, resulting in higher write amplification from garbage collection. RAID5 also has write amplification from parity updates, and the two effects amplify each other. When we compare the lifetime of striping and RAID5 in Figures 11 and 15, we can see that, with TECC, the lifetime of RAID5 increases more than that of striping.

Read:Write ratio. The read:write ratio of the workload has an impact on many factors in reliability, such as the error detection rate and the write amplification from ECC recovery. We evaluate the lifetime of an SSD array with different read:write ratios in Figure 17. Under the same workload intensity, a higher read:write ratio implies a smaller amount of write workload. Thus, the lifetimes of both RAID5 and striping increase as the read:write ratio increases. However, the lifetime of RAID5 improves faster than that of striping.


Fig. 17. Lifetime of 8 SSDs with different read:write ratios in workload.

Fig. 18. Lifetime improvement of 8 SSDs with different workload intensities.

Workload Intensity. We explore different workload intensities and their impact on the lifetime of RAID5 and striping. Figure 18 shows the lifetime of RAID5 and striping when the workload intensity is lower.

The lifetime of RAID5 goes up rapidly as the workload intensity decreases. A less intensive workload wears out SSDs at a slower rate, resulting in fewer page errors. As a result, the device failure rate, which is constant, is more likely to cause data loss than bit/page errors under the less intensive workload. Since parity protection enables recovery from a device failure, RAID5 is more robust than striping under less intensive workloads.

We compare the lifetime of RAID5 to striping under a less intensive workload, 31.25MB/s/dev, in Figure 19. We can clearly see that RAID5 wins over striping with a smaller number of SSDs than it does under 125MB/s/dev, which is shown in Figure 11.

A (nonvolatile) cache can absorb a significant part of the workload going to the storage system. The result in Figure 18 shows that, with reduced workloads, RAID5 can outperform striping. Moreover, RAID5 obtains a higher benefit from caching than striping. Note that, in a system with a cache, the cache may alter the workload distribution at the drives.


Fig. 19. Lifetime of SSD arrays with 31.25MB/s/dev workload.

Fig. 20. Lifetime of SSD arrays at higher annual device failure rate (5%).

Workload characteristics. We consider other parameters, such as read:write ratios and data access rates, in Figures 17 and 18. Workloads that are less write intensive result in SSDs that wear out at a slower rate, while the constant device failure rate contributes more to data loss. Since RAID5 is robust to device-level failure, its lifetime grows faster than that of striping. The 62.5MB/s workload has more writes than the R:W=9:1 workload, but it has a lower read request rate; read requests in part convert to redundant writes through frequent ECC recovery.

Device failure rate. We evaluated the lifetime of RAID5 with a higher failure rate (5% annually) in Figure 20. In the figure, when the number of devices is more than 8, RAID5 has a better lifetime, on average, than striping. This result shows that RAID5 is useful with a larger device failure rate and a larger number of devices in the array. A larger number of devices contributes to larger failure rates at the system level as well as to a smaller overhead of parity in write workload and capacity.

Figure 21 shows the lifetime of 8 SSDs with different annual device failure rates (AFRs). It clearly shows that RAID5 improves reliability at higher device failure rates. We assume that the AFR is 3% for the rest of the evaluation, which comes from the average return rate of SSDs reported in [Thomas 2012].

Nonuniform Workload. We consider a more realistic workload, which follows an exponential distribution with average write length q.


Fig. 21. Lifetime of 8 SSDs with different annual device failure rates.

Fig. 22. Write amplification from garbage collection (policy: modified FIFO).

Figure 22 shows the shape of Equation (12), the write amplification from garbage collection with the modified FIFO policy. The write amplification from garbage collection rapidly drops as the average write length increases; at higher space utilizations it decreases drastically as the average write length grows, thanks to sequentially written pages. This result is consistent with the well-known observation that performance under large sequential writes is better than that under small, random writes because garbage collection is less frequent.
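Equation (12) itself is derived earlier in the article; as a rough, hedged illustration of the same trend, the following sketch (our own simplification, not the article's model) simulates a FIFO-style garbage collector and measures write amplification empirically. All parameter names and the policy details (relocate the still-valid pages of the oldest filled block) are assumptions for illustration.

```python
import random
from collections import deque

def gc_write_amplification(num_blocks=128, pages_per_block=64, utilization=0.8,
                           avg_write_len=4, host_pages=200_000, seed=1):
    """Estimate write amplification of a FIFO-like garbage collector.

    The host writes runs of logically sequential pages whose length (in pages)
    is exponentially distributed with mean avg_write_len.  When free blocks
    run low, the oldest filled block is reclaimed and its still-valid pages
    are relocated; those relocations are the extra writes being measured.
    """
    rng = random.Random(seed)
    total = num_blocks * pages_per_block
    logical = int(utilization * total)        # number of logical pages

    l2p = [-1] * logical                      # logical -> physical page
    p2l = [-1] * total                        # physical -> logical page

    free = deque(range(1, num_blocks))        # blocks ready for writing
    fifo = deque()                            # filled blocks, oldest first
    active, cursor = 0, 0                     # current write frontier
    gc_writes = 0

    def place(lp):
        """Append logical page lp at the write frontier, remapping it."""
        nonlocal active, cursor
        if cursor == pages_per_block:         # frontier block is full
            fifo.append(active)
            active, cursor = free.popleft(), 0
        phys = active * pages_per_block + cursor
        cursor += 1
        if l2p[lp] != -1:
            p2l[l2p[lp]] = -1                 # invalidate the old copy
        l2p[lp], p2l[phys] = phys, lp

    def collect_if_needed():
        nonlocal gc_writes
        while len(free) < 2 and fifo:         # keep a small free reserve
            victim = fifo.popleft()
            base = victim * pages_per_block
            for i in range(pages_per_block):
                lp = p2l[base + i]
                if lp != -1:                  # page still valid: relocate it
                    place(lp)
                    gc_writes += 1
                p2l[base + i] = -1
            free.append(victim)               # victim is erased and reusable

    written = 0
    while written < host_pages:
        start = rng.randrange(logical)
        length = max(1, round(rng.expovariate(1.0 / avg_write_len)))
        for off in range(length):
            collect_if_needed()
            place((start + off) % logical)
            written += 1
    return 1.0 + gc_writes / written          # physical writes per host write
```

Comparing, for example, `gc_write_amplification(avg_write_len=1)` with `gc_write_amplification(avg_write_len=64)` at the same space utilization shows the qualitative behavior of Figure 22: longer sequential runs leave blocks mostly invalid by the time they are reclaimed, so far fewer pages need to be relocated.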

We evaluated the lifetime of striping and RAID5 with different average write lengths from 4KB to 20MB in Figure 23. At low device space utilizations, write lengths do not have a significant impact on write amplification. However, write lengths can impact write amplification significantly at higher space utilizations. Figure 23(a) shows the results with the default parameters used so far in this article; Figure 23(b) shows the results from a configuration with higher space utilization. In both cases, lifetime improves as the write length increases from 4KB to 20MB. The parity update overhead decreases with larger writes, and the average data lifetimes approach the maximal lifetimes with increased write lengths.


Fig. 23. Lifetime of 8 SSDs with different average write lengths.

Comparing Figures 23(a) and 23(b), we observe that larger write lengths have a higher impact on lifetime at higher space utilizations, consistent with the write amplification results in Figure 22.

Another observation is that parity updates become more efficient as the write length increases. Figure 24 shows the change in lifetime as the average write length increases. Lifetime improves considerably at smaller write lengths, as the write length grows from 4KB to 40KB, and again at higher write lengths, as it grows beyond 800KB. At smaller write lengths, the improvement comes from increased efficiency in parity updates; at higher write lengths, it comes from improved garbage-collection efficiency. It is also observed that write lengths play a larger role at higher capacity utilizations.
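To make the first effect concrete, the following back-of-the-envelope sketch (our own simplification, not the article's model) counts parity stripe units written per data stripe unit for chunk-aligned RAID5 writes: a single-chunk write rewrites one parity chunk per data chunk, whereas a full-stripe write amortizes one parity chunk over N-1 data chunks.

```python
import math

def raid5_parity_overhead(num_devices: int, write_len_chunks: int) -> float:
    """Parity chunks written per data chunk for a chunk-aligned RAID5 write.

    Simplifying assumptions (ours): writes start on a stripe boundary, and
    every stripe touched rewrites its single parity chunk exactly once.
    """
    data_chunks_per_stripe = num_devices - 1
    stripes_touched = math.ceil(write_len_chunks / data_chunks_per_stripe)
    return stripes_touched / write_len_chunks

# With 8 devices: a 1-chunk write pays 100% parity overhead,
# while a full-stripe (7-chunk) write pays only ~14%.
print(raid5_parity_overhead(8, 1))   # 1.0
print(raid5_parity_overhead(8, 7))   # 0.1428...
```

This ignores alignment effects and the read cost of read-modify-write parity computation, but it captures why parity updates stop dominating the write workload once writes span whole stripes.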

Erase Block Size. The evaluation results in Figure 23 show that large write lengths make garbage collection more efficient by increasing the number of blocks with mostly invalid pages. Instead of increasing the write lengths in the workload, we can decrease the erase block size to increase the number of blocks with many invalid pages. Figure 25 shows the evaluation results when we employ smaller erase blocks. The lifetime of SSD arrays increases as the erase block size decreases; similar to the results for different average write lengths in Figure 23, RAID5 lifetime improves faster than that of striping. This implies that fine-grained garbage collection would be useful to reduce write amplification.


Fig. 24. Accumulative increment in the lifetime of 8 SSDs as average write length increases.

Fig. 25. Lifetime of 8 SSDs with different erase block sizes for garbage collection; average write length q = 512KB.


Fig. 26. Lifetime improvement of SSD arrays (using one more SSD for RAID5).

Fig. 27. A series of 4-SSD RAID5s versus striping.

On the other hand, careful consideration is required regarding the impact of a reduced erase block size on the organization of the internals of the device and, as a result, on the device capacity.

Spare SSD. Since the space overhead of parity is one of the main sources of degradation in the lifetime of RAID5, we tested how much lifetime we can gain if we invest in one more SSD to keep the space utilization of RAID5 the same as that of striping. Figure 26 shows the lifetime comparison of N-SSD striping and (N+1)-SSD RAID5. The results show that the lifetime of RAID5 can be lower than that of striping for a small number of devices, even when we opt to employ an extra device. However, the lifetimes become closer and roughly comparable above 4 devices. These results show that simply adding an extra device to reduce space utilization may not be sufficient for RAID5 to have a higher lifetime than striping for a smaller number of devices.
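A quick calculation (ours, using the simplified small-write parity model sketched above) illustrates why the extra device is not sufficient: with small writes, RAID5 still pushes data plus parity onto the array, so each of the N+1 devices absorbs more write traffic than each of the N striping devices, even though the data space per device is now the same.

```python
def per_device_write_rate(host_write_mb_s: float, num_devices: int,
                          parity_overhead: float = 0.0) -> float:
    """Average write rate (MB/s) each device sees, spreading host writes
    plus any parity writes evenly across the array."""
    return host_write_mb_s * (1.0 + parity_overhead) / num_devices

host = 100.0  # MB/s of host writes; an arbitrary illustrative figure
print(per_device_write_rate(host, 4))                       # 4-SSD striping: 25.0
print(per_device_write_rate(host, 5, parity_overhead=1.0))  # 5-SSD RAID5, small writes: 40.0
```

The extra device equalizes capacity, but under small writes it does not remove the roughly doubled write volume that parity updates impose, which is consistent with the lifetimes observed in Figure 26.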

Scalability. Our evaluation results so far have shown that RAID5 may not always be beneficial in improving lifetime compared to striping when we consider a single array of 4 to 16 SSDs. However, when we scale out those results to many devices, we see a totally different story. In Figure 27, we compare the lifetime of a system consisting of many 4-SSD RAID5 arrays to the lifetime of striping. The results show that striping scales very poorly, since a single failure among tens of devices results in data loss. We do not argue for composing large-scale storage systems without parity protection at the system level; rather, our results show that RAID5 is not universally beneficial, especially with a small number of devices.
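To see why striping scales so poorly under device failures alone, a classic MTTDL estimate in the style of Patterson et al. [1988] can be sketched as follows. This is a rough, device-failure-only model with exponential failure and repair times; it ignores the page errors and write amplification that our Markov model accounts for, so the absolute numbers are illustrative only, and the parameter values below are assumptions.

```python
def mttdl_striping_years(num_devices: int, afr: float) -> float:
    """Striping: any single device failure loses data."""
    return 1.0 / (num_devices * afr)

def mttdl_raid5_groups_years(num_devices: int, group_size: int, afr: float,
                             repair_hours: float = 24.0) -> float:
    """Independent RAID5 groups: data is lost only if a second device in a
    group fails before the first failed device has been rebuilt."""
    repairs_per_year = 365.0 * 24.0 / repair_hours
    group_mttdl = repairs_per_year / (group_size * (group_size - 1) * afr ** 2)
    return group_mttdl / (num_devices // group_size)

# 64 devices at 3% AFR: striping averages about half a year to data loss,
# while 4-SSD RAID5 groups reach thousands of years (device failures only).
print(mttdl_striping_years(64, 0.03))         # ~0.52
print(mttdl_raid5_groups_years(64, 4, 0.03))  # ~2100
```

The gap grows linearly worse for striping as devices are added, while the grouped RAID5 system degrades far more slowly, which matches the trend in Figure 27 even before wear-induced errors are considered.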

Summary. Most of our evaluation results imply that write amplification is key to understanding the reliability of SSD arrays:

(1) Write amplification can be rather harmful to the reliability of SSD arrays. The sources of write amplification should be carefully examined before employing RAID-like protection schemes.

(2) Many factors can change write amplification, such as space utilization, hotness, and randomness of the workload. More efficient system-level protection techniques need to be designed taking write amplification into account.

7. RELATED WORK

Several studies [Boboila and Desnoyers 2010; Desnoyers 2012; Hu et al. 2009; Jagmohan et al. 2010] analyzed the impact of write amplification on the performance of flash memory. Our recent work [Moon and Reddy 2012] built a statistical model to reveal the relationship between lifetime and write amplification. However, that work considered only a single device and assumed that the write amplifications from different sources are independent. We take the dependencies among the sources of write amplification into account. As an extension of our previous work [Moon and Reddy 2013], we also introduce another Markov model to consider multidevice storage systems (RAID).

Many studies have investigated SSD-based RAID arrays. A notable study [Jeremic et al. 2011] considered SSD-based RAID in terms of performance. Unlike that study, we focus on reliability. The work of Greenan et al. [2009] considered a cache for parity. Another work [Mao et al. 2010] employed HDDs as parity storage to relieve the overhead of parity. These works require additional hardware; we consider RAID systems with SSDs only. As discussed in this article, a cache favors RAID5 more than striping. A RAID cache can be a candidate solution to the problem of parity overhead in SSD-based RAID5.

Harmonia [Kim et al. 2011] coordinates the garbage-collection processes of SSDs during idle time to prevent garbage collection from slowing down overall performance. The focus of that study is on performance, while we focus on reliability.

The problem of SSDs wearing out simultaneously in SSD-based RAID was considered in the study of differential RAID [Balakrishnan et al. 2010]. Diff-RAID tries to wear out SSDs unevenly to minimize the chance of simultaneous failures, based on the rated write cycles and without considering the potential variability in those ratings.

A recent presentation [de la Iglesia 2012] introduced a modified RAID system for SSDs in which a global controller manages the number of parity blocks to make devices wear evenly. While these studies suggested specialized RAID systems to cover SSDs' drawbacks, we focus on the relation between page errors/device failures and the expected lifetime when RAID5 is employed.

Other work [Oh et al. 2012; Saxena et al. 2012, 2013; Zhang et al. 2012] studied a different architecture, which uses SSDs as a cache for HDDs.

Reliability of storage systems has received much attention. A number of studies have looked at lifetime analysis of storage systems with single- or double-failure protection. These studies range from analysis [Patterson et al. 1988; Xin et al. 2003], to the design of powerful ECCs [Chen et al. 2008; Gregori et al. 2003; Mielke et al. 2008; Plank 2005; Stead 2012; Sun et al. 2006], to system-level mechanisms [Ousterhout and Douglis 1989; Rosenblum and Ousterhout 1992]. Much work has been done to optimize the data organization and the FTL layer within SSDs to reduce garbage-collection overheads and to improve wear leveling and other operations [Baek et al. 2007; Birrell et al. 2007; Chang 2007; Chen et al. 2011; Gupta et al. 2011; Kim and Ahn 2008; Kim and Lee 2002; Lee et al. 2007].

Commercial vendors are offering eMLC devices [SMART Storage Systems 2013; Spec Inc. 2011], improving MLC lifetime characteristics. They employ a number of protection techniques, such as wear leveling and parity protection, which are mostly employed or assumed in our model.

8. CONCLUSION

This article analyzed the relation between parity protection and the lifetime of SSD arrays. Parity update increases write workload and space utilization, which can severely degrade the reliability of SSD arrays.

According to our analytical model and evaluation, RAID5 is only conditionally better in lifetime than striping, due to the overhead of parity. We explored different factors, such as the number of devices and the amount of data; the results imply that RAID5 is not universally beneficial in improving the reliability of SSD-based systems.

In conclusion, our results show that the lifetime of RAID5 can be worse than that of striping in some cases.

REFERENCES

Seungjae Baek, Seongjun Ahn, Jongmoo Choi, Donghee Lee, and Sam H. Noh. 2007. Uniformity improving page allocation for flash memory file systems. In Proceedings of the 7th ACM and IEEE International Conference on Embedded Software (EMSOFT'07). ACM, New York, NY, 154–163. DOI:http://dx.doi.org/10.1145/1289927.1289954

Mahesh Balakrishnan, Asim Kadav, Vijayan Prabhakaran, and Dahlia Malkhi. 2010. Differential RAID: Rethinking RAID for SSD reliability. ACM Transactions on Storage 6, 2, Article 4, 22 pages. DOI:http://dx.doi.org/10.1145/1807060.1807061

Andrew Birrell, Michael Isard, Chuck Thacker, and Ted Wobber. 2007. A design for high-performance flash disks. SIGOPS Operating Systems Review 41, 2, 88–93. DOI:http://dx.doi.org/10.1145/1243418.1243429

Simona Boboila and Peter Desnoyers. 2010. Write endurance in flash drives: Measurements and analysis. In Proceedings of the 8th USENIX Conference on File and Storage Technologies (FAST'10). USENIX Association, Berkeley, CA, 9–9. http://dl.acm.org/citation.cfm?id=1855511.1855520

Yu Cai, Erich F. Haratsch, Onur Mutlu, and Ken Mai. 2012. Error patterns in MLC NAND flash memory: Measurement, characterization, and analysis. In Proceedings of the Conference on Design, Automation and Test in Europe (DATE'12). EDA Consortium, San Jose, CA, 521–526. http://dl.acm.org/citation.cfm?id=2492708.2492838

Li-Pin Chang. 2007. On efficient wear leveling for large-scale flash-memory storage systems. In Proceedings of the 2007 ACM Symposium on Applied Computing (SAC'07). ACM, New York, NY, 1126–1130. DOI:http://dx.doi.org/10.1145/1244002.1244248

Bainan Chen, Xinmiao Zhang, and Zhongfeng Wang. 2008. Error correction for multi-level NAND flash memory using Reed-Solomon codes. In IEEE Workshop on Signal Processing Systems, 2008 (SiPS'08). 94–99. DOI:http://dx.doi.org/10.1109/SIPS.2008.4671744

Feng Chen, Tian Luo, and Xiaodong Zhang. 2011. CAFTL: A content-aware flash translation layer enhancing the lifespan of flash memory based solid state drives. In Proceedings of the 9th USENIX Conference on File and Storage Technologies (FAST'11). USENIX Association, Berkeley, CA, 6–6. http://dl.acm.org/citation.cfm?id=1960475.1960481

E. de la Iglesia. 2012. Abstracting the flash translation layer for enterprise-ready MLC. Presentation at Flash Memory Summit, Santa Clara, CA.

Peter Desnoyers. 2012. Analytic modeling of SSD write performance. In Proceedings of the 5th Annual International Systems and Storage Conference (SYSTOR'12). ACM, New York, NY, Article 12, 10 pages. DOI:http://dx.doi.org/10.1145/2367589.2367603

Fujitsu. 2014. Fujitsu PRIMERGY servers solution-specific evaluation of SSD write endurance [white paper]. Retrieved April 1, 2016 from http://globalsp.ts.fujitsu.com/dmsp/Publications/public/wp-solution-specific-evaluation-of-ssd-write-endurance-ww-en.pdf.


Eran Gal and Sivan Toledo. 2005. Algorithms and data structures for flash memories. ACM Computing Surveys 37, 2, 138–163. DOI:http://dx.doi.org/10.1145/1089733.1089735

J. Gathman, A. McPadden, and G. Tressler. 2013. The need for standardization in the enterprise SSD product segment. Presentation at Flash Memory Summit, Santa Clara, CA.

Kevin Greenan, Darrell D. E. Long, Ethan L. Miller, Thomas Schwarz, and Avani Wildani. 2009. Building flexible, fault-tolerant flash-based storage systems. In Proceedings of the 5th Workshop on Hot Topics in System Dependability (HotDep'09).

S. Gregori, A. Cabrini, O. Khouri, and G. Torelli. 2003. On-chip error correcting techniques for new-generation flash memories. Proceedings of the IEEE 91, 4, 602–616. DOI:http://dx.doi.org/10.1109/JPROC.2003.811709

E. Grochowski. 2012. Future technology challenges for NAND flash and HDD products. Presentation at Flash Memory Summit, Santa Clara, CA.

L. M. Grupp, A. M. Caulfield, J. Coburn, S. Swanson, E. Yaakobi, P. H. Siegel, and J. K. Wolf. 2009. Characterizing flash memory: Anomalies, observations, and applications. In 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'09). 24–33.

Laura M. Grupp, John D. Davis, and Steven Swanson. 2012. The bleak future of NAND flash memory. In Proceedings of the 10th USENIX Conference on File and Storage Technologies (FAST'12). USENIX Association, Berkeley, CA, 2–2. http://dl.acm.org/citation.cfm?id=2208461.2208463

Aayush Gupta, Raghav Pisolkar, Bhuvan Urgaonkar, and Anand Sivasubramaniam. 2011. Leveraging value locality in optimizing NAND flash-based SSDs. In Proceedings of the 9th USENIX Conference on File and Storage Technologies (FAST'11). USENIX Association, Berkeley, CA, 7–7. http://dl.acm.org/citation.cfm?id=1960475.1960482

HGST. 2016. Ultrastar™ SSD800MH [Datasheet]. https://www.hgst.com/sites/default/files/resources/US_SSD800MH_ds.pdf.

HP. 2016. HP enterprise solid state drive [Datasheet]. Retrieved April 1, 2016 from http://h20195.www2.hp.com/v2/GetPDF.aspx%2F4AA4-7186ENW.pdf.

Xiao-Yu Hu, Evangelos Eleftheriou, Robert Haas, Ilias Iliadis, and Roman Pletka. 2009. Write amplification analysis in flash-based solid state drives. In Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference (SYSTOR'09). ACM, New York, NY, Article 10, 9 pages. DOI:http://dx.doi.org/10.1145/1534530.1534544

A. Jagmohan, M. Franceschini, and L. Lastras. 2010. Write amplification reduction in NAND flash through multi-write coding. In IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST'10). 1–6. DOI:http://dx.doi.org/10.1109/MSST.2010.5496985

Nikolaus Jeremic, Gero Muhl, Anselm Busse, and Jan Richling. 2011. The pitfalls of deploying solid-state drive RAIDs. In Proceedings of the 4th Annual International Conference on Systems and Storage (SYSTOR'11). ACM, New York, NY, Article 14, 13 pages. DOI:http://dx.doi.org/10.1145/1987816.1987835

Han-Joon Kim and Sang-Goo Lee. 2002. An effective flash memory manager for reliable flash memory space management. IEICE Transactions on Information and Systems 85, 6, 950–964.

Hyojun Kim and Seongjun Ahn. 2008. BPLRU: A buffer management scheme for improving random writes in flash storage. In Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST'08). USENIX Association, Berkeley, CA, Article 16, 14 pages. http://dl.acm.org/citation.cfm?id=1364813.1364829

Youngjae Kim, S. Oral, G. M. Shipman, Junghee Lee, D. A. Dillow, and Feiyi Wang. 2011. Harmonia: A globally coordinated garbage collector for arrays of solid-state drives. In IEEE 27th Symposium on Mass Storage Systems and Technologies (MSST'11). 1–12. DOI:http://dx.doi.org/10.1109/MSST.2011.5937224

Jongmin Lee, Sunghoon Kim, Hunki Kwon, Choulseung Hyun, Seongjun Ahn, Jongmoo Choi, Donghee Lee, and Sam H. Noh. 2007. Block recycling schemes and their cost-based optimization in NAND flash memory based storage system. In Proceedings of the 7th ACM/IEEE International Conference on Embedded Software (EMSOFT'07). ACM, New York, NY, 174–182. DOI:http://dx.doi.org/10.1145/1289927.1289956

Bo Mao, Hong Jiang, Dan Feng, Suzhen Wu, Jianxi Chen, Lingfang Zeng, and Lei Tian. 2010. HPDA: A hybrid parity-based disk array for enhanced performance and reliability. In IEEE International Symposium on Parallel Distributed Processing (IPDPS'10). 1–12. DOI:http://dx.doi.org/10.1109/IPDPS.2010.5470361

N. Mielke, T. Marquart, Ning Wu, J. Kessenich, H. Belgal, Eric Schares, F. Trivedi, E. Goodness, and L. R. Nevill. 2008. Bit error rate in NAND flash memories. In IEEE International Reliability Physics Symposium, 2008 (IRPS'08). 9–19. DOI:http://dx.doi.org/10.1109/RELPHY.2008.4558857

Sangwhan Moon and A. L. N. Reddy. 2012. Write amplification due to ECC on flash memory or leave those bit errors alone. In IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST'12). 1–6. DOI:http://dx.doi.org/10.1109/MSST.2012.6232375


Sangwhan Moon and A. L. Narasimha Reddy. 2013. Don't let RAID raid the lifetime of your SSD array. In Proceedings of the 5th USENIX Conference on Hot Topics in Storage and File Systems (HotStorage'13). USENIX Association, Berkeley, CA, 7–7. http://dl.acm.org/citation.cfm?id=2534861.2534868

Yongseok Oh, Jongmoo Choi, Donghee Lee, and Sam H. Noh. 2012. Caching less for better performance: Balancing cache size and update cost of flash memory cache in hybrid storage systems. In Proceedings of the 10th USENIX Conference on File and Storage Technologies (FAST'12). USENIX Association, Berkeley, CA, 25–25. http://dl.acm.org/citation.cfm?id=2208461.2208486

John Ousterhout and Fred Douglis. 1989. Beating the I/O bottleneck: A case for log-structured file systems. SIGOPS Operating Systems Review 23, 1, 11–28. DOI:http://dx.doi.org/10.1145/65762.65765

David A. Patterson, Garth Gibson, and Randy H. Katz. 1988. A case for redundant arrays of inexpensive disks (RAID). In Proceedings of the 1988 ACM SIGMOD International Conference on Management of Data (SIGMOD'88). ACM, New York, NY, 109–116. DOI:http://dx.doi.org/10.1145/50202.50214

J. S. Plank. 2005. Erasure Codes for Storage Applications. Tutorial slides, presented at FAST-2005: 4th USENIX Conference on File and Storage Technologies. Retrieved April 1, 2016 from http://www.cs.utk.edu/~plank/plank/papers/FAST-2005.html.

Mendel Rosenblum and John K. Ousterhout. 1992. The design and implementation of a log-structured file system. ACM Transactions on Computer Systems 10, 1, 26–52. DOI:http://dx.doi.org/10.1145/146941.146943

A. M. Saleh, J. J. Serrano, and J. H. Patel. 1990. Reliability of scrubbing recovery-techniques for memory systems. IEEE Transactions on Reliability 39, 1, 114–122. DOI:http://dx.doi.org/10.1109/24.52622

Mohit Saxena, Michael M. Swift, and Yiying Zhang. 2012. FlashTier: A lightweight, consistent and durable storage cache. In Proceedings of the 7th ACM European Conference on Computer Systems (EuroSys'12). ACM, New York, NY, 267–280. DOI:http://dx.doi.org/10.1145/2168836.2168863

Mohit Saxena, Yiying Zhang, Michael M. Swift, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2013. Getting real: Lessons in transitioning research simulations into hardware systems. Presented as part of the 11th USENIX Conference on File and Storage Technologies (FAST'13). USENIX, San Jose, CA, 215–228. https://www.usenix.org/conference/fast13/technical-sessions/presentation/saxena.

Thomas J. E. Schwarz, Qin Xin, Ethan L. Miller, Darrell D. E. Long, Andy Hospodor, and Spencer Ng. 2004. Disk scrubbing in large archival storage systems. In Proceedings of the IEEE Computer Society's 12th Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems (MASCOTS'04). IEEE Computer Society, Washington, DC, 409–418. http://dl.acm.org/citation.cfm?id=1032659.1034226

SanDisk. 2013. WP004-The why and how of SSD over provisioning [White paper]. https://www.sandisk.com/content/dam/sandisk-main/en_us/assets/resources/enterprise/white-papers/WP004_OverProvisioning_WhyHow_FINAL.pdf.

SMART Storage Systems. 2013. Guardian™ technology platform [white paper]. Number WP003. Retrieved April 1, 2016 from http://www.smartstoragesys.com/pdfs/WP003_Guardian_Technology.pdf.

Gokul Soundararajan, Vijayan Prabhakaran, Mahesh Balakrishnan, and Ted Wobber. 2010. Extending SSD lifetimes with disk-based write caches. In Proceedings of the 8th USENIX Conference on File and Storage Technologies (FAST'10). USENIX Association, Berkeley, CA, 8–8. http://dl.acm.org/citation.cfm?id=1855511.1855519

Spec Inc. 2011. Engineering MLC flash-based SSDs to reduce total cost of ownership in enterprise SSD deployments. Presentation at Flash Memory Summit, Santa Clara, CA.

S. Stead. 2012. It takes guts to be great. Presentation at Flash Memory Summit, Santa Clara, CA.

Fei Sun, Ken Rose, and Tong Zhang. 2006. On the use of strong BCH codes for improving multilevel NAND flash memory storage capacity. In IEEE Workshop on Signal Processing Systems (SiPS): Design and Implementation.

Hairong Sun, Pete Grayson, and Bob Wood. 2011. Quantifying reliability of solid-state storage from multiple aspects. In Proceedings of the 7th IEEE International Workshop on Storage and Network Architecture and Parallel I/O (SNAPI'11).

K. Thomas. 2012. Solid state drives no better than others, survey says. Retrieved April 1, 2016 from http://www.pcworld.com/article/213442.

Qin Xin, E. L. Miller, T. Schwarz, D. D. E. Long, S. A. Brandt, and W. Litwin. 2003. Reliability mechanisms for very large storage systems. In Proceedings of the 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies 2003 (MSST'03). 146–156. DOI:http://dx.doi.org/10.1109/MASS.2003.1194851

E. Yaakobi, L. Grupp, P. H. Siegel, S. Swanson, and J. K. Wolf. 2012. Characterization and error-correcting codes for TLC flash memories. In International Conference on Computing, Networking and Communications (ICNC'12). 486–491. DOI:http://dx.doi.org/10.1109/ICCNC.2012.6167470


E. Yaakobi, Jing Ma, L. Grupp, P. H. Siegel, S. Swanson, and J. K. Wolf. 2010. Error characterization and coding schemes for flash memories. In IEEE GLOBECOM Workshops (GC Wkshps'10). 1856–1860. DOI:http://dx.doi.org/10.1109/GLOCOMW.2010.5700263

Yiying Zhang, Leo Prasath Arulraj, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2012. De-indirection for flash-based SSDs with nameless writes. In Proceedings of the 10th USENIX Conference on File and Storage Technologies (FAST'12). USENIX Association, Berkeley, CA, 1–1. http://dl.acm.org/citation.cfm?id=2208461.2208462

Received June 2014; revised January 2015; accepted April 2015
