UC Santa Cruz
Reliability of MEMS-Based Storage Enclosures
Bo Hong, Thomas J. E. Schwarz, S. J.*
Scott A. Brandt, Darrell D. E. Long
Storage Systems Research Center
University of California, Santa Cruz
*Also Santa Clara University, Santa Clara, CA
2
MEMS Storage Technology
Micro-Electro-Mechanical Systems (MEMS) storage
• A promising alternative secondary storage technology
• Hardware research: IBM, HP, CMU, Nanochip
• Magnetic storage, but with very different mechanics
[Figure: MEMS storage device; the spring is labeled]
3
MEMS Storage Technology
MEMS-based storage vs. magnetic disk
• Also provides non-volatile storage
• Delivers 10× faster access time (< 1 ms)
• Delivers higher bandwidth (100 MB/s – 1 GB/s)
• Small (about the size of a penny)
• Consumes 100× less power
• Costs ~10 USD per device
• Expected to be more reliable
• Stores a limited amount of data per device (3–10 GB)
A serious alternative to disk drives, in particular for mobile computing applications
4
Reliability Implications of MEMS-Based Storage
Storage systems built from MEMS-based storage …
• Require more MEMS devices: at least 10 times the number of disks to meet capacity requirements
• Require more connection components
Reliability implication
• More components, hence (?) lower reliability
5
MEMS Storage Enclosure
Our proposal: MEMS enclosures
• A device with dozens of MEMS
• Single interface to the rest of the system
• Might be serviceable, but service calls during the economic lifetime should be very rare
[Figure: MEMS enclosure with a single interface]
6
MEMS Storage Enclosures
Reliability is an issue:
• MTTF of 1–2 years without redundant data storage
Uses RAID Level 5 technology with distributed sparing
• Additional k spares
Calls for service when necessary
• i.e., when we run out of spares
Organization and number of spares can
• Decrease the data recovery time and thus improve reliability
• Reduce human interference: no servicing errors, lower maintenance costs
7
MEMS Enclosure Reliability
Measure MTBF for enclosures
• Without replacing spares
• With replacing spares (service calls)
Determine the number of failures that trigger a service call
• Mandatory replacement: no redundancy left
• Preventive replacement: no spare left
8
MEMS Enclosure Reliability without Replacement
Assumptions
• MTTF of a disk = 11.5 or 23 years
• MTTF of a MEMS device = 23 years
• 19 data + 1 parity + k dedicated spares
• 15-minute data recovery
[Figure: curves for disks and for MEMS enclosures with 0 to 5 spares, each labeled with its MTTF]
Configuration   MTTF
Disk            23 yrs
Disk            11.5 yrs
No spare        2.3 yrs
1 spare         3.5 yrs
2 spares        4.6 yrs
3 spares        5.8 yrs
4 spares        6.9 yrs
5 spares        8.1 yrs
MTTF is not enough to measure the reliability of enclosures without repairs
Instead: focus on data reliability during the economic lifetime (3–5 years) of enclosures
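The MTTF figures on this slide can be roughly reproduced with a small calculation. The sketch below is not the authors' model; it assumes exponentially distributed device lifetimes, 20 active devices (19 data + 1 parity), idle spares that do not fail, and data loss when a second device fails before recovery completes or after the spares are used up. The function name and parameters are hypothetical.

```python
def enclosure_mttf(spares, devices=20, device_mttf_years=23.0,
                   recovery_minutes=15.0):
    """Mean time to data loss (years) for a single-parity enclosure with
    dedicated spares and no replacement, under the assumptions stated above."""
    lam = 1.0 / device_mttf_years                  # per-device failure rate (1/year)
    mu = 365.0 * 24 * 60 / recovery_minutes        # recovery rate (1/year)
    first_failure = 1.0 / (devices * lam)          # wait until any active device fails
    second_failure = 1.0 / ((devices - 1) * lam)   # wait for another failure while degraded

    # No spare left: one failure degrades the array, the next one loses data.
    mttf = first_failure + second_failure
    for _ in range(spares):
        # With a spare: wait for a failure, then recovery (mu) races
        # against a second failure ((devices - 1) * lam).
        exit_rate = mu + (devices - 1) * lam
        mttf = first_failure + 1.0 / exit_rate + (mu / exit_rate) * mttf
    return mttf


for k in range(6):
    print(f"{k} spare(s): {enclosure_mttf(k):.2f} years")
```

Under these assumptions each result lands within about 0.1 year of the value listed on the slide, which suggests that the 15-minute recovery window contributes very little to the MTTF and that each extra spare adds roughly 23/20 years.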
9
MEMS Enclosures with Replacement
Markov model for a MEMS enclosure with N data, one parity, and one dedicated spare device
• N – Normal; D – Degraded; DL – Data Loss
• 1/λ – MTTF of a MEMS device (in tens of years)
• 1/µ – mean data recovery time (in minutes)
• 1/ν – mean replacement time (in days or weeks)
[Figure: Markov state diagram; labels: preventive and mandatory replacement, preventive replacement, mandatory replacement]
10
MEMS Enclosure Reliability with Replacement
Preventive replacement increases reliability and reduces replacement urgency
[Figure: enclosure reliability for no spare, mandatory replacement, and preventive + mandatory replacement; curves labeled 1, 2, 3 for the number of spares]
11
MEMS Enclosure Reliability
Dedicated sparing
• Rebuild all data from a failed MEMS on a single spare MEMS
Distributed sparing
• Every MEMS device contains client data, parity data, and spare space
12
Distributed Sparing [Menon and Mattson 1992]
• Shorter data recovery time
• More devices can fail
[Figure: data layout before failure and after MEMS 4 fails]
13
Reliability Comparison: Dedicated Sparing vs. Distributed Sparing
[Figure: dedicated sparing; reliability for no spare, mandatory replacement, and preventive + mandatory replacement; curves labeled 1, 2 for the number of spares]
Compare with the following slide
14
Reliability Comparison: Dedicated Sparing vs. Distributed Sparing
Distributed sparing is only better at short replacement times when using preventive replacement
[Figure: dedicated and distributed sparing; reliability for no spare, mandatory replacement, and preventive + mandatory replacement; curves labeled 1, 2 for the number of spares]
15
Durability of MEMS Storage Enclosures
All about economy
• How long can MEMS enclosures work without repairs?
• How often do they need repairing in the first 3–5 years?
• How do replacement policies affect maintenance frequency?
Number of failures an enclosure with k spares can tolerate before the (m+1)th repair is scheduled (m >= 0):
• (m + 1) × k, under the preventive replacement policy
• (m + 1) × (k + 1), under the mandatory replacement policy
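The two formulas on this slide are easy to tabulate. The helper below is only an illustration; the function name and the policy strings are hypothetical and not taken from the talk.

```python
def failures_tolerated(k, m, policy):
    """Failures an enclosure with k spares tolerates before the (m+1)th
    repair is scheduled, using the two formulas given on the slide."""
    if k < 0 or m < 0:
        raise ValueError("k and m must be non-negative")
    if policy == "preventive":      # service is called when no spare is left
        return (m + 1) * k
    if policy == "mandatory":       # service is called when no redundancy is left
        return (m + 1) * (k + 1)
    raise ValueError(f"unknown policy: {policy!r}")


# Four spares, second repair (m = 1)
print(failures_tolerated(4, 1, "preventive"))  # 8
print(failures_tolerated(4, 1, "mandatory"))   # 10
```

For four spares the second repair is scheduled after 8 failures under preventive replacement and after 10 under mandatory replacement, so the mandatory policy needs fewer service calls for the same number of device failures.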
16
Durability of MEMS Storage Enclosures
Probabilities that a MEMS storage enclosure has up to k failures during (0, t]
[Figure: curves for no failure and for up to 1, 2, 4, 6, 8, and 10 failures, plus a disk with an MTTF of 23 years]
• First-year survivability: 95.7% for disks vs. 98.8% for MEMS enclosures with two spares
• Chance that a MEMS enclosure with four spares requires more than one service in five years: 3.5% (preventive) vs. 0.6% (mandatory)
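These percentages can be approximated with a simple Poisson count of device failures. The sketch below is an assumption-laden approximation, not the authors' calculation: it treats failures as a Poisson process with a constant rate of 20 active devices, each with an MTTF of 23 years, and ignores idle-spare failures and the short recovery windows. The function name is hypothetical.

```python
import math


def prob_at_most(failures, years, devices=20, device_mttf_years=23.0):
    """P(at most `failures` device failures during (0, years]),
    assuming a Poisson process with rate devices / device_mttf_years."""
    mean = devices * years / device_mttf_years
    return sum(math.exp(-mean) * mean ** i / math.factorial(i)
               for i in range(failures + 1))


# First-year survivability
disk_first_year = math.exp(-1.0 / 23.0)       # single disk, MTTF 23 years
two_spares_first_year = prob_at_most(3, 1)    # 2 spares + 1 parity absorb 3 failures

# Four spares, five years: more than 2*4 failures (preventive)
# or more than 2*(4+1) failures (mandatory) means more than one service
preventive = 1.0 - prob_at_most(8, 5)
mandatory = 1.0 - prob_at_most(10, 5)

print(f"disk, first year:       {disk_first_year:.3f}")
print(f"two spares, first year: {two_spares_first_year:.3f}")
print(f"more than one service:  {preventive:.3f} (preventive), "
      f"{mandatory:.3f} (mandatory)")
```

The first-year values match the slide (about 95.7% and 98.8%). The five-year values come out at about 3.4% and 0.5%, slightly below the 3.5% and 0.6% on the slide, so the authors' model presumably includes effects that this constant-rate approximation leaves out.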
17
Related Work
MEMS-based storage technology development
• IBM, HP, CMU CHI2PS, Nanochip
Digital Micromirror Devices by TI
• Reported Mean Time Between Failures: 650,000 hours [Douglass]
RAID reliability
• Dedicated sparing [Dunphy et al.]
• Distributed sparing [Menon and Mattson]
• Parity sparing [Reddy and Banerjee]
Disk failure prediction
• S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology)
18
Summary
Reliability of MEMS storage enclosures
• Can be more reliable than disks even without replacing failed devices
• Highly reliable when using preventive replacement
• Dedicated sparing and distributed sparing provide comparable or almost identical reliability
Economy of MEMS storage enclosures
• Preventive replacement trades more maintenance services for higher reliability
19
Thank You!
Acknowledgements
• Dave Nagle, Greg Ganger, CMU PDL
• The rest of the UCSC SSRC
More information:
• http://ssrc.cse.ucsc.edu
• http://ssrc.cse.ucsc.edu/mems.shtml
Questions?
20
Backup Slides
21
MEMS Storage Technology
Micro-Electro-Mechanical Systems (MEMS) storage
• A promising alternative secondary storage technology
• Hardware research: IBM, HP, CMU, Nanochip
Radical differences between MEMS storage and magnetic disk technologies
                      Disk           MEMS
Recording media       Magnetic       Magnetic or physical (non-volatile)
Recording technique   Longitudinal   Orthogonal (higher density)
R/W head              Single         Thousands, in a tip array (higher bandwidth and parallelism)
Media movement        Rotation       Media sled moves in X and Y independently (no rotation delay)
22
MEMS Storage Device Characteristics
Physical size: 1–2 cm²
Recording density: 250–750 Gb/in²
[Figure: Predicted Performance in 2005; throughput (1–7 GB/s) vs. access latency (1 ns to 10 ms) for DRAM, MEMS, and disk]
Device   Capacity      Cost
DRAM     0.5–2 GB      $100–$200/GB
MEMS     3–10 GB       $5–$50/GB
Disk     100–500 GB    $1–$2/GB
23
MEMS Storage Device
[Figure: MEMS storage device; labels: spring, X, Y]