raid storage eel4768, fall 2009 dr. jun wang slides prepared based on d&p computer architecture...
TRANSCRIPT
![Page 1: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/1.jpg)
RAID Storage
EEL4768, Fall 2009Dr. Jun Wang
Slides Prepared based on D&P Computer Architecture textbook, etc.
![Page 2: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/2.jpg)
04/19/23 2
Case for Storage
• Shift in focus from computation to communication and storage of information
– E.g., Cray Research/Thinking Machines vs. Google/Yahoo– “The Computing Revolution” (1960s to 1980s)
“The Information Age” (1990 to today)
• Storage emphasizes reliability and scalability as well as cost-performance
• What is “Software king” that determines which HW actually features used?
– Operating System for storage– Compiler for processor
• Also has own performance theory—queuing theory—balances throughput vs. response time
![Page 3: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/3.jpg)
04/19/23 3
Outline
• Magnetic Disks
• RAID
![Page 4: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/4.jpg)
4
What’s Inside A Disk Drive?
![Page 5: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/5.jpg)
5
Disk drive – electronics
![Page 6: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/6.jpg)
6
Components of a disk drive
![Page 7: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/7.jpg)
7
Disk Components
RecordingPositioning
![Page 8: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/8.jpg)
8
Top view of a single disk platter
![Page 9: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/9.jpg)
9
Surface organized into tracks
![Page 10: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/10.jpg)
10
Tracks broken up into sectors
![Page 11: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/11.jpg)
11
![Page 12: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/12.jpg)
12
![Page 13: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/13.jpg)
13
![Page 14: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/14.jpg)
14
![Page 15: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/15.jpg)
15
![Page 16: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/16.jpg)
16
![Page 17: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/17.jpg)
17
![Page 18: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/18.jpg)
18
![Page 19: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/19.jpg)
19
![Page 20: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/20.jpg)
20
Response time for disks
• Access time: (service time for a disk access)– Command + Seek + Rotation + Transfer
• Response time:– Queue time + Access time
![Page 21: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/21.jpg)
04/19/23 21
Disk Figure of Merit: Areal Density• Bits recorded along a track
– Metric is Bits Per Inch (BPI)
• Number of tracks per surface– Metric is Tracks Per Inch (TPI)
• Disk Designs Brag about bit density per unit area– Metric is Bits Per Square Inch: Areal Density = BPI x TPI
Year Areal Density1973 2 1979 8 1989 63 1997 3,090 2000 17,100 2006 130,000
1
10
100
1,000
10,000
100,000
1,000,000
1970 1980 1990 2000 2010
Year
Are
al D
ensi
ty
![Page 22: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/22.jpg)
04/19/23 22
Historical Perspective
• 1956 IBM Ramac — early 1970s Winchester– Developed for mainframe computers, proprietary interfaces– Steady shrink in form factor: 27 in. to 14 in.
• Form factor and capacity drives market more than performance• 1970s developments
– 5.25 inch floppy disk formfactor (microcode into mainframe)– Emergence of industry standard disk interfaces
• Early 1980s: PCs and first generation workstations• Mid 1980s: Client/server computing
– Centralized storage on file server» accelerates disk downsizing: 8 inch to 5.25
– Mass market disk drives become a reality» industry standards: SCSI, IPI, IDE» 5.25 inch to 3.5 inch drives for PCs, End of proprietary interfaces
• 1900s: Laptops => 2.5 inch drives• 2000s: What new devices leading to new drives?
![Page 23: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/23.jpg)
04/19/23 23
Future Disk Size and Performance
• Continued advance in capacity (60%/yr) and bandwidth (40%/yr)
• Slow improvement in seek, rotation (8%/yr)• Time to read whole disk
Year Sequentially Randomly (1 sector/seek)
1990 4 minutes 6 hours2000 12 minutes 1 week(!)2006 56 minutes 3 weeks (SCSI)2006 171 minutes 7 weeks (SATA)
![Page 24: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/24.jpg)
04/19/23 24
Use Arrays of Small Disks?
14”10”5.25”3.5”
3.5”
Disk Array: 1 disk design
Conventional: 4 disk designs
Low End High End
•Katz and Patterson asked in 1987: •Can smaller disks be used to close gap in performance between disks and CPUs?
![Page 25: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/25.jpg)
04/19/23 25
Advantages of Small Formfactor Disk Drives
Low cost/MBHigh MB/volumeHigh MB/wattLow cost/Actuator
Cost and Environmental Efficiencies
![Page 26: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/26.jpg)
04/19/23 26
Replace Small Number of Large Disks with Large Number of Small Disks! (1988 Disks)
Capacity
Volume
Power
Data Rate
I/O Rate
MTTF
Cost
IBM 3390K
20 GBytes
97 cu. ft.
3 KW
15 MB/s
600 I/Os/s
250 KHrs
$250K
IBM 3.5" 0061
320 MBytes
0.1 cu. ft.
11 W
1.5 MB/s
55 I/Os/s
50 KHrs
$2K
x70
23 GBytes
11 cu. ft.
1 KW
120 MB/s
3900 IOs/s
??? Hrs
$150K
Disk Arrays have potential for large data and I/O rates, high MB per cu. ft., high MB per KW, but what about reliability?
9X
3X
8X
6X
![Page 27: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/27.jpg)
04/19/23 27
Array Reliability
• Reliability of N disks = Reliability of 1 Disk ÷ N
50,000 Hours ÷ 70 disks = 700 hours
Disk system MTTF: Drops from 6 years to 1 month!
• Arrays (without redundancy) too unreliable to be useful!
Hot spares support reconstruction in parallel with access: very high media availability can be achievedHot spares support reconstruction in parallel with access: very high media availability can be achieved
![Page 28: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/28.jpg)
04/19/23 28
Redundant Arrays of (Inexpensive) Disks
• Files are "striped" across multiple disks
• Redundancy yields high data availability
– Availability: service still provided to user, even if some components failed
• Disks will still fail• Contents reconstructed from data redundantly
stored in the array Capacity penalty to store redundant info
Bandwidth penalty to update redundant info
![Page 29: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/29.jpg)
04/19/23 29
Redundant Arrays of Inexpensive DisksRAID 1: Disk Mirroring/Shadowing
• Each disk is fully duplicated onto its “mirror” Very high availability can be achieved• Bandwidth sacrifice on write: Logical write = two physical writes
• Reads may be optimized• Most expensive solution: 100% capacity overhead
• (RAID 2 not interesting, so skip)
recoverygroup
![Page 30: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/30.jpg)
04/19/23 30
Redundant Array of Inexpensive Disks RAID 3: Parity Disk
P
100100111100110110010011
. . .logical record 1
0100011
11001101
10100011
11001101
P contains sum ofother disks per stripe mod 2 (“parity”)If disk fails, subtract P from sum of other disks to find missing information
Striped physicalrecords
![Page 31: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/31.jpg)
04/19/23 31
RAID 3
• Sum computed across recovery group to protect against hard disk failures, stored in P disk
• Logically, a single high capacity, high transfer rate disk: good for large transfers
• Wider arrays reduce capacity costs, but decreases availability
• 33% capacity cost for parity if 3 data disks and 1 parity disk
![Page 32: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/32.jpg)
04/19/23 32
Inspiration for RAID 4
• RAID 3 relies on parity disk to discover errors on Read
• But every sector has an error detection field• To catch errors on read, rely on error
detection field vs. the parity disk• Allows independent reads to different disks
simultaneously
![Page 33: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/33.jpg)
04/19/23 33
Redundant Arrays of Inexpensive Disks RAID 4: High I/O Rate Parity
D0 D1 D2 D3 P
D4 D5 D6 PD7
D8 D9 PD10 D11
D12 PD13 D14 D15
PD16 D17 D18 D19
D20 D21 D22 D23 P
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.Disk Columns
IncreasingLogicalDisk
Address
Stripe
Insides of 5 disksInsides of 5 disks
Example:small read D0 & D5, large write D12-D15
Example:small read D0 & D5, large write D12-D15
![Page 34: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/34.jpg)
04/19/23 34
Inspiration for RAID 5
• RAID 4 works well for small reads• Small writes (write to one disk):
– Option 1: read other data disks, create new sum and write to Parity Disk
– Option 2: since P has old sum, compare old data to new data, add the difference to P
• Small writes are limited by Parity Disk: Write to D0, D5 both also write to P disk
D0 D1 D2 D3 P
D4 D5 D6 PD7
![Page 35: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/35.jpg)
04/19/23 35
Redundant Arrays of Inexpensive Disks RAID 5: High I/O Rate Interleaved Parity
Independent writespossible because ofinterleaved parity
Independent writespossible because ofinterleaved parity
D0 D1 D2 D3 P
D4 D5 D6 P D7
D8 D9 P D10 D11
D12 P D13 D14 D15
P D16 D17 D18 D19
D20 D21 D22 D23 P
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.Disk Columns
IncreasingLogical
Disk Addresses
Example: write to D0, D5 uses disks 0, 1, 3, 4
![Page 36: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/36.jpg)
04/19/23 36
Problems of Disk Arrays: Small Writes
D0 D1 D2 D3 PD0'
+
+
D0' D1 D2 D3 P'
newdata
olddata
old parity
XOR
XOR
(1. Read) (2. Read)
(3. Write) (4. Write)
RAID-5: Small Write Algorithm
1 Logical Write = 2 Physical Reads + 2 Physical Writes
![Page 37: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/37.jpg)
04/19/23 37
RAID 6: Recovering from 2 failures
• Why > 1 failure recovery?– operator accidentally replaces the wrong disk during a
failure
– since disk bandwidth is growing more slowly than disk capacity, the MTT Repair a disk in a RAID system is increasing increases the chances of a 2nd failure during repair since takes longer
– reading much more data during reconstruction meant increasing the chance of an uncorrectable media failure, which would result in data loss
![Page 38: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/38.jpg)
04/19/23 38
RAID 6: Recovering from 2 failures• Network Appliance’s row-diagonal parity or RAID-DP
• Like the standard RAID schemes, it uses redundant space based on parity calculation per stripe
• Since it is protecting against a double failure, it adds two check blocks per stripe of data.
– If p+1 disks total, p-1 disks have data; assume p=5
• Row parity disk is just like in RAID 4 – Even parity across the other 4 data blocks in its stripe
• Each block of the diagonal parity disk contains the even parity of the blocks in the same diagonal
![Page 39: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/39.jpg)
39
RAID 10
• RAID Level 10 provides very high performance and redundancy. Data is simultaneously mirrored and striped. Can under circumstances support multiple drive failures.
• Advantages
– Highly fault tolerant
– High data availability
– Very good read / write performance
• Disadvantages
– Very expensive
– Drive spindles must be synchronized
– Not very scalable
• Applications
– Where high performance and redundancy are critical
![Page 40: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/40.jpg)
40
Two Types of RAID Implementations
• Hardware RAID– The hardware-based system manages the RAID
subsystem independently from the host and presents to the host only a single disk per RAID array
• Software RAID– Software RAID implements the various RAID levels in the
kernel disk (block device) code. It offers the cheapest possible solution, as expensive disk controller cards are not required.
![Page 41: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/41.jpg)
04/19/23 41
How Good is the Protection
![Page 42: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/42.jpg)
42
Orthogonal RAID
ArrayController
StringController
StringController
StringController
StringController
StringController
StringController
. . .
. . .
. . .
. . .
. . .
. . .
Option 1 Option 2
Error Correction Group Option
![Page 43: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/43.jpg)
43
EMC Symmetrix 8000
• 384 disks in total73 GB/drive28TB in total• channel director connects server host to buses• disk cache mem 90%-95% hit rate LRU with prefetch• each disk director has two Ultral SCSI strings• Rather than XOR engine only in the disk directors, RAID5 parity calculations are done in drives side.WHY?
![Page 44: RAID Storage EEL4768, Fall 2009 Dr. Jun Wang Slides Prepared based on D&P Computer Architecture textbook, etc](https://reader038.vdocument.in/reader038/viewer/2022110401/56649e105503460f94afc129/html5/thumbnails/44.jpg)
04/19/23 44
Summary: RAID Techniques: Goal was performance, popularity due to reliability of storage
• Disk Mirroring, Shadowing (RAID 1)
Each disk is fully duplicated onto its "shadow" Logical write = two physical writes
100% capacity overhead
• Parity Data Bandwidth Array (RAID 3)
Parity computed horizontally
Logically a single high data bw disk
• High I/O Rate Parity Array (RAID 5)
Interleaved parity blocks
Independent reads and writes
Logical write = 2 reads + 2 writes
10010011
11001101
10010011
00110010
10010011
10010011