IBM Systems Group Nick Jones © 2004 IBM Corporation What could happen to your data? What can you do about it?


Page 1: What could happen to your data? What can you do about it?

IBM Systems Group

Nick Jones © 2004 IBM Corporation

Page 2: The Plan

IBM Systems & Technology Group

Introduction

Types of failure

The probability of failure

The true cost of failure

Addressing the problem

Putting it into perspective

Questions & summary

Page 3: Types of failure

Consider a pessimist’s view of a hard disk

Two ways in which a drive can fail

– It reports the failure

– It lies: it returns wrong data without reporting an error

Page 4: The probability of failure

Mean Time Between Failures (MTBF) ≈ 1,200,000 hours

Drive failure doesn’t sound like too big a problem…

…but then consider the number of drives
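To see why the number of drives matters, here is a rough sketch. It assumes a constant failure rate (exponentially distributed lifetimes) and takes the quoted MTBF at face value; the drive counts are made-up illustrations, not figures from the talk.

```python
import math

MTBF_HOURS = 1_200_000          # figure quoted on the slide
HOURS_PER_YEAR = 24 * 365       # 8,760; leap years ignored


def expected_failures_per_year(drives: int) -> float:
    """Average number of drive failures per year across the whole population."""
    return drives * HOURS_PER_YEAR / MTBF_HOURS


def p_any_failure_in_year(drives: int) -> float:
    """Chance of at least one failure in a year, assuming independent drives."""
    return 1.0 - math.exp(-expected_failures_per_year(drives))


# Hypothetical installation sizes, chosen only for illustration.
for n in (1, 100, 1000):
    print(n, expected_failures_per_year(n), p_any_failure_in_year(n))
```

Under these assumptions a single drive looks safe, but a population of 1,000 drives should expect about 7.3 failures a year.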

Page 5: The true cost of failure

“It will never happen to me”

Increased disk size means increased data loss

A few statistics

Page 6: Addressing the problem

Make backups

Add extra information to the disk

Add extra disks

Page 7: Addressing the problem: Extra information

Error Correcting Code (ECC) on the disk drive

[Diagram: one disk sector laid out as Client data | Drive ECC, with the label “Data seen by the drive”]

Page 8: Addressing the problem: Extra information

Longitudinal Redundancy Check (LRC) in addition to ECC

[Diagram: the sector now holds the Client data, a Block LRC and the Drive ECC]
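The slides do not say how the block LRC is calculated. A minimal sketch, assuming the common byte-wise definition (XOR of every byte in the block) and a one-byte check value; the function names are invented for this illustration:

```python
from functools import reduce
from operator import xor


def lrc(block: bytes) -> int:
    """XOR of every byte in the block (assumed LRC definition)."""
    return reduce(xor, block, 0)


def protect(block: bytes) -> bytes:
    """Append the check byte to the client data before it is written."""
    return block + bytes([lrc(block)])


def is_intact(stored: bytes) -> bool:
    """Data plus its check byte XORs to zero if nothing has changed."""
    return lrc(stored) == 0


stored = protect(b"client data")
corrupted = bytes([stored[0] ^ 0x01]) + stored[1:]   # flip one bit
print(is_intact(stored), is_intact(corrupted))
```

Because the check is made above the drive, it can catch a drive that returns altered data without reporting an error. A plain XOR is weak (two identical bit flips in the same position cancel out), so treat this as an illustration of the idea only.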

Page 9: Addressing the problem: Extra disks

The idea was published in 1988

A Case for Redundant Arrays of Inexpensive Disks by Patterson, Gibson & Katz

Page 10: RAID 0: Striping

Data: ABCDEFG

RAID array: data striped across member disks

Disk 1  Disk 2  Disk 3  Disk 4
A       B       C       D
E       F       G       H
I       J       K       L
M       N       O       P
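The striping in the diagram (A, B, C, D across four disks, then E, F, G, H and so on) can be sketched as a simple mapping from a logical block number to a disk and a row. The function name and 0-based numbering are assumptions for this example; real arrays stripe in larger units than one block.

```python
def locate(block: int, disks: int = 4) -> tuple[int, int]:
    """Return (disk index, row on that disk) for a logical block, both 0-based."""
    return block % disks, block // disks


# Blocks A..P from the slide, numbered 0..15.
for number in range(16):
    disk, row = locate(number)
    print(chr(ord("A") + number), "-> disk", disk + 1, "row", row + 1)
```

Nothing is stored twice, so losing any one disk loses part of every stripe.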

Page 11: RAID 1: Mirroring

Data: ABCDEFG

RAID array: data mirrored across member disks

Disk 1  Disk 2
A       A
B       B
C       C
D       D

Page 12: RAID 10: Striping & Mirroring

Data: ABCDE

Data striped across mirrored pairs of disks

Pair 1          Pair 2          Pair 3
Disk 1  Disk 2  Disk 3  Disk 4  Disk 5  Disk 6
A       A       B       B       C       C
D       D       E       E       F       F
G       G       H       H       I       I
J       J       K       K       L       L

Page 13: XOR based parity

Bitwise operator

If the two inputs are the same, the output is 0

If the two inputs are different, the output is 1

Bit 1  Bit 2  XOR result
0      0      0
0      1      1
1      0      1
1      1      0

Page 14: XOR based parity: An example

Data    0 1 1 0 0 1 0 1
Data    0 0 1 1 0 0 1 1
Parity  0 1 0 1 0 1 1 0
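The example can be checked in a few lines. This sketch uses the two data bytes from the slide; the variable names are only for illustration.

```python
data_1 = 0b01100101
data_2 = 0b00110011

# Parity is the bitwise XOR of the data bytes.
parity = data_1 ^ data_2
print(format(parity, "08b"))      # 01010110, matching the slide

# XOR-ing the parity with the surviving byte gives back the missing one.
recovered_1 = parity ^ data_2
print(format(recovered_1, "08b"))
```

The second step works because XOR-ing a value with itself cancels it out, which is the property the parity RAID levels rely on.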


Page 16: RAID 4: Parity

Data: ABCDEFG

RAID array: data striped across disks, with one parity disk

Disk 1  Disk 2  Disk 3  Parity
A       B       C       P1
D       E       F       P2
G       H       I       P3
J       K       L       P4

Page 17: Coping with failure

[Diagram: the RAID 4 array of three data disks and one parity disk, with block E unreadable]

Error reading E

– Read D, F & P2

– XOR them to reconstruct E

– Write reconstructed E
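The recovery steps for block E can be sketched on whole blocks rather than single bits. The block contents and the helper name xor_blocks are invented for this example; only the procedure (read D, F and P2, XOR them) comes from the slide.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Made-up contents for the second stripe of the array.
d, e, f = b"DDDD", b"EEEE", b"FFFF"
p2 = xor_blocks(d, e, f)          # parity written when the stripe was stored

# E cannot be read: rebuild it from the rest of the stripe.
rebuilt_e = xor_blocks(d, f, p2)
print(rebuilt_e == e)
```

Rebuilding a whole replaced drive is, in principle, the same calculation repeated for every stripe.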

Page 18: Coping with failure

[Diagram: the same RAID 4 array after the loss of a drive]

Drive loss

– Replace the drive

– Rebuild the data

– Redundancy restored

Page 19: RAID 5: Rotate parity

Data: ABCDEFG

RAID array: data striped across disks, with parity rotating

Disk 1  Disk 2  Disk 3  Disk 4
P1      A       B       C
F       P2      D       E
H       I       P3      G
J       K       L       P4
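The rotation can be sketched as a function that builds one stripe at a time. It reproduces the block placement in the slide's diagram (first stripe P1 A B C, second stripe F P2 D E), where data starts on the disk after the parity and wraps around. Real controllers offer several rotation schemes, so this is one possible layout, not the definition of RAID 5; the function name is invented here.

```python
def raid5_stripe(stripe: int, disks: int = 4) -> list[str]:
    """Labels held by each disk for the given 0-based stripe."""
    parity_disk = stripe % disks              # parity moves one disk per stripe
    row = [""] * disks
    row[parity_disk] = f"P{stripe + 1}"
    first_block = stripe * (disks - 1)        # each stripe holds disks - 1 data blocks
    for k in range(disks - 1):
        disk = (parity_disk + 1 + k) % disks  # start after the parity, wrap around
        row[disk] = chr(ord("A") + first_block + k)
    return row


for s in range(4):
    print(raid5_stripe(s))
```

Spreading the parity this way means no single disk has to absorb every parity update, unlike the dedicated parity disk in RAID 4.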

Page 20: RAID 6: More parity

Data: ABCDEFG

Data striped across disks, with 2 rotating parities

Disk 1  Disk 2  Disk 3  Disk 4  Disk 5  Disk 6
A       B       C       D       P1      Q1
P2      Q2      E       F       G       H
K       L       P3      Q3      I       J
M       N       O       P       P4      Q4

(P1 to P4 and Q1 to Q4 are the two parities; the plain P in the last row is a data block.)

Page 21: Putting it into perspective

Cannot survive on RAID alone

Avoid a single point of failure

– Fire, flood, power loss

Split your array across two sites

Human error

Backups still have a place

Page 22: Summary

Want to avoid any single point of failure

Disk drives do fail

RAID protects against drive failure

Mirroring & parity

RAID isn’t the ultimate solution
