IBM Systems Group

Nick Jones © 2004 IBM Corporation

What could happen to your data?

What can you do about it?

IBM Systems & Technology Group

The Plan

Introduction

Types of failure

The probability of failure

The true cost of failure

Addressing the problem

Putting it into perspective

Questions & summary

Types of failure

Consider a pessimist’s view of a hard disk

Two ways in which a drive can fail

– It reports the failure

– It lies (returns incorrect data without reporting an error)

The probability of failure

Mean Time Between Failures (MTBF) ≈ 1,200,000 hours (about 137 years)

Drive failure doesn't sound like too big a problem…

…but then consider the number of drives
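To see why the number of drives matters, here is a small sketch that turns the MTBF into expected failures per year for a whole installation. It assumes drives fail independently at a constant rate (the exponential model usually behind an MTBF figure); the function names and the installation sizes are hypothetical examples, not figures from the talk.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def expected_failures_per_year(mtbf_hours: float, drives: int) -> float:
    """Average number of drive failures per year across a population of drives."""
    return drives * HOURS_PER_YEAR / mtbf_hours


def prob_at_least_one_failure(mtbf_hours: float, drives: int) -> float:
    """Chance of one or more failures in a year, assuming independent exponential lifetimes."""
    return 1 - math.exp(-expected_failures_per_year(mtbf_hours, drives))


if __name__ == "__main__":
    for n in (1, 100, 1000):  # hypothetical installation sizes
        print(n,
              round(expected_failures_per_year(1_200_000, n), 4),
              round(prob_at_least_one_failure(1_200_000, n), 4))
```

A single drive has well under a 1% chance of failing in a year, but with 1,000 drives the model predicts about 7.3 failures a year, so a failure somewhere is close to certain.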

The true cost of failure

“It will never happen to me”

Larger disks mean more data is lost when a drive fails

A few statistics

Addressing the problem

Make backups

Add extra information to the disk

Add extra disks

Addressing the problem: Extra information

Error Correcting Code (ECC) on the disk drive

Data seen by the drive: [ Client data | Drive ECC ]
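Drive ECC works by storing extra check bits that let the drive locate a corrupted bit and flip it back. The slide does not say which code a drive uses (real drives apply much stronger codes over whole sectors, such as Reed-Solomon or LDPC codes); the sketch below swaps in the classic Hamming(7,4) code purely to show the principle of correcting a single-bit error. The helper names are hypothetical.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit Hamming codeword.

    List index 0 holds codeword position 1; parity bits sit at positions 1, 2 and 4.
    """
    d3, d5, d6, d7 = data
    p1 = d3 ^ d5 ^ d7  # covers positions 1, 3, 5, 7
    p2 = d3 ^ d6 ^ d7  # covers positions 2, 3, 6, 7
    p4 = d5 ^ d6 ^ d7  # covers positions 4, 5, 6, 7
    return [p1, p2, d3, p4, d5, d6, d7]


def hamming74_decode(code):
    """Correct up to one flipped bit and return the 4 data bits."""
    code = list(code)
    syndrome = 0
    for position, bit in enumerate(code, start=1):
        if bit:
            syndrome ^= position  # a valid codeword XORs to 0
    if syndrome:
        code[syndrome - 1] ^= 1  # the syndrome is the position of the bad bit
    return [code[2], code[4], code[5], code[6]]


stored = hamming74_encode([1, 0, 1, 1])
damaged = list(stored)
damaged[5] ^= 1  # simulate one bit going bad on the platter
assert hamming74_decode(damaged) == [1, 0, 1, 1]
```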

Addressing the problem: Extra information

Longitudinal Redundancy Check (LRC) in addition to ECC

Each block carries: client data, a block LRC, and the drive ECC
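An LRC is a check value computed over the whole block and stored with it, so data that comes back wrong can be caught even when the drive's own ECC reports no error (the drive that "lies"). The slide does not give the exact LRC format; this sketch assumes the simplest common definition, an XOR of every byte in the block, and `block_lrc` and `check_read` are hypothetical names.

```python
from functools import reduce


def block_lrc(block: bytes) -> int:
    """Longitudinal redundancy check: XOR of every byte in the block (one common definition)."""
    return reduce(lambda acc, byte: acc ^ byte, block, 0)


def check_read(block: bytes, stored_lrc: int) -> bytes:
    """Return the block if its LRC matches the value saved at write time, else raise."""
    if block_lrc(block) != stored_lrc:
        raise IOError("LRC mismatch: the drive returned bad data without reporting an error")
    return block


data = b"client data"
lrc = block_lrc(data)                            # saved alongside the block at write time
corrupted = bytes([data[0] ^ 0x04]) + data[1:]   # one bit silently flipped
```

Calling `check_read(corrupted, lrc)` raises `IOError`. A plain XOR LRC is weak on its own: two flips in the same bit position of different bytes cancel out, which is one reason it is layered on top of ECC rather than replacing it.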

Addressing the problem: Extra disks

The idea was published in 1988

“A Case for Redundant Arrays of Inexpensive Disks (RAID)” by Patterson, Gibson & Katz

RAID 0: Striping

Data: A B C D E F G …

RAID array:
Disk 1   Disk 2   Disk 3   Disk 4
A        B        C        D
E        F        G        H
I        J        K        L
M        N        O        P

Data striped across member disks
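The striping in the diagram is a round-robin mapping from logical block number to member disk. A minimal sketch, assuming a stripe unit of one block (real arrays usually stripe in units of many kilobytes); the function names are hypothetical.

```python
def raid0_locate(block: int, disks: int) -> tuple:
    """Map a logical block number to (disk index, stripe row) under round-robin striping."""
    return block % disks, block // disks


def raid0_layout(blocks, disks):
    """Lay blocks out across the member disks, one list per disk."""
    layout = [[] for _ in range(disks)]
    for number, block in enumerate(blocks):
        disk, _row = raid0_locate(number, disks)
        layout[disk].append(block)
    return layout


print(raid0_layout("ABCDEFGHIJKLMNOP", 4))
# [['A', 'E', 'I', 'M'], ['B', 'F', 'J', 'N'], ['C', 'G', 'K', 'O'], ['D', 'H', 'L', 'P']]
```

Each inner list is one disk column from the diagram. Because every block lives on exactly one disk, losing any disk loses part of every large file.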

RAID 1: Mirroring

Data: A B C D E F G …

RAID array:
Disk 1   Disk 2
A        A
B        B
C        C
D        D

Data mirrored across member disks
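Mirroring means every write goes to all copies and a read can be served by any surviving copy. The toy class below sketches that behaviour in memory; `Raid1` and its methods are hypothetical names, and a real controller would also resynchronise a replaced disk.

```python
class Raid1:
    """Toy mirrored array: every write goes to all copies, reads use any healthy copy."""

    def __init__(self, copies: int = 2):
        self.disks = [{} for _ in range(copies)]  # each disk maps block number -> data
        self.failed = set()

    def write(self, block: int, data: str) -> None:
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                disk[block] = data

    def read(self, block: int) -> str:
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                return disk[block]
        raise IOError("every mirror copy has failed")

    def fail(self, index: int) -> None:
        self.failed.add(index)


array = Raid1()
for number, data in enumerate("ABCD"):
    array.write(number, data)
array.fail(0)                 # lose the first disk
assert array.read(2) == "C"   # the surviving mirror still serves the data
```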

RAID 10: Striping & Mirroring

Data: A B C D E …

Pair 1            Pair 2            Pair 3
Disk 1   Disk 2   Disk 3   Disk 4   Disk 5   Disk 6
A        A        B        B        C        C
D        D        E        E        F        F
G        G        H        H        I        I
J        J        K        K        L        L

Data striped across mirrored pairs of disks
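RAID 10 combines the two previous mappings: blocks are striped round-robin across mirrored pairs, and each block is stored on both disks of its pair. The sketch below, with hypothetical function names and disks numbered from 0 (Disk 1 in the diagram is index 0), locates a block and checks whether a given set of failed disks still leaves every block readable.

```python
def raid10_locate(block: int, pairs: int):
    """Return (stripe row, (primary disk, mirror disk)) for a logical block.

    Disks are numbered from 0, so pair p occupies disks 2p and 2p + 1.
    """
    pair = block % pairs
    return block // pairs, (2 * pair, 2 * pair + 1)


def raid10_survives(failed_disks, pairs: int) -> bool:
    """Data survives as long as no mirrored pair has lost both of its disks."""
    failed = set(failed_disks)
    return all(not {2 * p, 2 * p + 1} <= failed for p in range(pairs))


# Block E (number 4) in the six-disk, three-pair array from the diagram.
print(raid10_locate(4, 3))            # (1, (2, 3)): second row, on Disk 3 and Disk 4
print(raid10_survives({0, 2, 4}, 3))  # True: one disk lost from every pair
print(raid10_survives({2, 3}, 3))     # False: both halves of pair 2 are gone
```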

XOR based parity

XOR is a bitwise operator

If the two inputs are the same, the output is 0

If the two inputs are different, the output is 1

Bit 1   Bit 2   XOR result
0       0       0
0       1       1
1       0       1
1       1       0
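The truth table can be generated directly with Python's `^` operator. The extra check shows the property parity relies on: XOR-ing a result with one input gives back the other input. `xor_truth_table` is a hypothetical helper name.

```python
def xor_truth_table():
    """Rows of (bit 1, bit 2, bit 1 XOR bit 2)."""
    return [(a, b, a ^ b) for a in (0, 1) for b in (0, 1)]


for row in xor_truth_table():
    print(*row)

# XOR undoes itself: combining a result with one input recovers the other input.
assert all((a ^ b) ^ b == a for a, b, _ in xor_truth_table())
```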

XOR based parity: An example

Data     0 1 1 0 0 1 0 1
Data     0 0 1 1 0 0 1 1
Parity   0 1 0 1 0 1 1 0

Each parity bit is the XOR of the two data bits above it.
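The same example as whole bytes: XOR the two data rows to get the parity row, then XOR the parity with the surviving row to rebuild a lost one. The variable names are illustrative only.

```python
data_1 = 0b01100101   # first data row
data_2 = 0b00110011   # second data row

parity = data_1 ^ data_2
print(f"{parity:08b}")  # prints 01010110, the parity row above

# If the first data row is lost, XOR the surviving row with the parity to rebuild it.
rebuilt = data_2 ^ parity
assert rebuilt == data_1
```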

RAID 4: Parity

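Data: A B C D E F G …

RAID array:
Disk 1   Disk 2   Disk 3   Parity disk
A        B        C        P1
D        E        F        P2
G        H        I        P3
J        K        L        P4

Data striped across disks, with one parity disk

Each parity block is the XOR of the data blocks in its row (P1 = A XOR B XOR C).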
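A sketch of how the parity column is produced: group the incoming blocks into stripes of three, XOR each stripe, and place the result on the dedicated parity disk. It assumes one byte per block and a block count that divides evenly into stripes; `xor_blocks` and `raid4_stripes` are hypothetical names.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def raid4_stripes(blocks, data_disks):
    """Group blocks into stripes and append one parity block per stripe.

    The last entry of every stripe lives on the dedicated parity disk.
    Assumes len(blocks) is a multiple of data_disks.
    """
    stripes = []
    for start in range(0, len(blocks), data_disks):
        row = list(blocks[start:start + data_disks])
        stripes.append(row + [xor_blocks(*row)])
    return stripes


blocks = [bytes([letter]) for letter in b"ABCDEFGHIJKL"]  # one byte per block
stripes = raid4_stripes(blocks, 3)  # 3 data disks + 1 parity disk, as in the diagram
```

Every write updates the single parity disk, which is why that disk can become a bottleneck; RAID 5 addresses this by rotating the parity.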

Coping with failure

Disk 1   Disk 2           Disk 3   Parity disk
A        B                C        P1
D        E (unreadable)   F        P2
G        H                I        P3
J        K                L        P4

Error reading E

– Read D, F & P2

– XOR them to reconstruct E

– Write reconstructed E
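The three steps above amount to one XOR over the rest of the stripe. A sketch with hypothetical four-byte contents for D, E and F; the parity P2 is computed the same way when the stripe is first written, and `xor_all` is a hypothetical helper.

```python
def xor_all(blocks):
    """Byte-wise XOR of a list of equal-length blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


# Hypothetical contents for the second stripe of the array.
D, E, F = b"dddd", b"eeee", b"ffff"
P2 = xor_all([D, E, F])  # parity written when the stripe was created

# E cannot be read: XOR the other data blocks in the stripe with the parity.
reconstructed_E = xor_all([D, F, P2])
assert reconstructed_E == E
```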

Coping with failure

Disk 1   Disk 2   Disk 3   Parity disk
A        B        C        P1
D        E        F        P2
G        H        I        P3
J        K        L        P4

Drive loss

– Replace the drive

– Rebuild the data

– Redundancy restored
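Rebuilding a replaced drive repeats the single-block reconstruction for every row. The sketch assumes one byte per block, hypothetical contents taken from the letters in the diagram, and that Disk 2 is the one lost (the slide does not say which); `rebuild_disk` is a hypothetical name.

```python
def rebuild_disk(disks, lost):
    """Recreate every block of disks[lost] by XORing the same row on all surviving disks."""
    survivor = disks[(lost + 1) % len(disks)]  # any surviving disk gives the row count
    rebuilt = []
    for row in range(len(survivor)):
        value = 0
        for index, disk in enumerate(disks):
            if index != lost:
                value ^= disk[row]
        rebuilt.append(value)
    return rebuilt


# The four-disk RAID 4 array from the diagram, one byte per block (hypothetical contents).
disk1 = [ord(c) for c in "ADGJ"]
disk2 = [ord(c) for c in "BEHK"]
disk3 = [ord(c) for c in "CFIL"]
parity = [a ^ b ^ c for a, b, c in zip(disk1, disk2, disk3)]

# Disk 2 is lost and replaced with a blank drive (None); rebuild its contents.
array = [disk1, None, disk3, parity]
assert rebuild_disk(array, 1) == disk2
```

Until the rebuild finishes the array has no redundancy, so a second drive failure during this window loses data.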

RAID 5: Rotate parity

Data: A B C D E F G …

RAID array:
Disk 1   Disk 2   Disk 3   Disk 4
P1       A        B        C
F        P2       D        E
H        I        P3       G
J        K        L        P4

Data striped across disks, with parity rotating across the disks
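The placement in the diagram follows a simple rule: in stripe s the parity sits on disk s mod N, and the data blocks fill the following disks, wrapping around. The sketch reproduces that layout with labels only (the parity values themselves are computed exactly as in RAID 4); real controllers use several different rotation schemes, and `raid5_layout` is a hypothetical name.

```python
def raid5_layout(blocks, disks):
    """Place data and rotating parity the way the diagram above does.

    Stripe s puts its parity on disk s % disks; data starts on the next disk and wraps.
    """
    data_per_stripe = disks - 1
    rows = []
    for s, start in enumerate(range(0, len(blocks), data_per_stripe)):
        row = [None] * disks
        parity_disk = s % disks
        for k, block in enumerate(blocks[start:start + data_per_stripe]):
            row[(parity_disk + 1 + k) % disks] = block
        row[parity_disk] = f"P{s + 1}"
        rows.append(row)
    return rows


for row in raid5_layout("ABCDEFGHIJKL", 4):
    print(*row)
```

Spreading parity over all disks means no single disk takes every parity update, which removes the RAID 4 parity-disk bottleneck.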

RAID 6: More parity

Data: A B C D E F G …

Disk 1   Disk 2   Disk 3   Disk 4   Disk 5   Disk 6
A        B        C        D        P1       Q1
P2       Q2       E        F        G        H
K        L        P3       Q3       I        J
M        N        O        P        P4       Q4

Data striped across disks, with 2 rotating parities (P and Q)
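The slide does not say how the second parity is computed. One widely used scheme (the one used by the Linux md driver, for example) keeps P as the plain XOR and computes Q as a weighted sum over the finite field GF(2^8), so that any two lost data blocks can be solved for. This sketch swaps in that scheme on one stripe, ignoring the rotation; all function names are hypothetical.

```python
# GF(2^8) log/antilog tables from generator g = 2 and polynomial x^8 + x^4 + x^3 + x^2 + 1.
EXP = [0] * 510
LOG = [0] * 256
value = 1
for power in range(255):
    EXP[power] = value
    LOG[value] = power
    value <<= 1
    if value & 0x100:
        value ^= 0x11D
for power in range(255, 510):
    EXP[power] = EXP[power - 255]


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    """Divide a by a non-zero b in GF(2^8)."""
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % 255]


def pq_parity(blocks):
    """P is the XOR of the data blocks; Q weights block i by g**i before XORing."""
    p = bytearray(len(blocks[0]))
    q = bytearray(len(blocks[0]))
    for i, block in enumerate(blocks):
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(EXP[i], byte)
    return bytes(p), bytes(q)


def recover_two(blocks, x, y, p, q):
    """Rebuild lost data blocks x and y (x != y) from the survivors plus P and Q."""
    pxy, qxy = bytearray(p), bytearray(q)
    for i, block in enumerate(blocks):
        if i in (x, y):
            continue  # lost blocks may be None
        for j, byte in enumerate(block):
            pxy[j] ^= byte                  # leaves D_x ^ D_y
            qxy[j] ^= gf_mul(EXP[i], byte)  # leaves g^x*D_x ^ g^y*D_y
    gx, gy = EXP[x], EXP[y]
    dx = bytes(gf_div(qxy[j] ^ gf_mul(gy, pxy[j]), gx ^ gy) for j in range(len(p)))
    dy = bytes(a ^ b for a, b in zip(pxy, dx))
    return dx, dy


data = [b"AE", b"BF", b"CG", b"DH"]  # hypothetical stripe: four data blocks
p, q = pq_parity(data)
assert recover_two([data[0], None, data[2], None], 1, 3, p, q) == (data[1], data[3])
```

Only the hardest case, losing two data blocks, is shown; losing a data block plus P or Q, or losing P and Q together, needs only XOR or a recomputation.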

Putting it into perspective

You cannot rely on RAID alone

Avoid a single point of failure

– Fire, flood, power loss

Split your array across two sites

Human error

Backups still have a place

Summary

Want to avoid any single point of failure

Disk drives do fail

RAID protects against drive failure

Mirroring & parity

RAID isn’t the ultimate solution

Nick.Jones@uk.ibm.com
