Page 1:

Computing in the Reliable Array of Independent Nodes

Vasken Bohossian, Charles Fan, Paul LeMahieu, Marc Riedel, Lihao Xu, Jehoshua Bruck

May 5, 2000

IEEE Workshop on Fault-Tolerant Parallel and Distributed Systems

California Institute of Technology

Marc Riedel

Page 2:

RAIN Project

Collaboration:

• Caltech’s Parallel and Distributed Computing Group www.paradise.caltech.edu

• JPL’s Center for Integrated Space Microsystems www.csmt.jpl.nasa.gov

Page 3:

RAIN Platform

Heterogeneous network of nodes and switches

[Diagram: nodes connected through switches, a bus, and a network]

Page 4:

RAIN Testbed

www.paradise.caltech.edu

10 Pentium boxes w/ multiple NICs

4 eight-way Myrinet switches

Page 5:

Proof of Concept: Video Server

Video client & server on every node.

[Diagram: nodes A, B, C, D connected through two switches]

Page 6:

Limited Storage

Insufficient storage to replicate all the data on each node.

[Diagram: nodes A, B, C, D connected through two switches]

Page 7:

k-of-n Code

Erasure-correcting code: recover data from any k of n columns.

Column 1: a, d+c

Column 2: b, d+a

Column 3: c, a+b

Column 4: d, b+c

Example: from columns 1 and 3, b = (a+b) + a and d = (d+c) + c, which recovers a, b, c, d.
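The column layout on this slide can be tried out in a few lines. The following is a minimal sketch, assuming that "+" denotes bitwise XOR and that symbols are small integers; the names `LAYOUT`, `encode`, and `decode` are illustrative and do not come from the talk.

```python
from itertools import combinations

# Column layout from the slide: (data symbol, (parity operand 1, parity operand 2)).
LAYOUT = [("a", ("d", "c")), ("b", ("d", "a")), ("c", ("a", "b")), ("d", ("b", "c"))]


def encode(data):
    """Map {'a','b','c','d': int} to four (symbol, parity) columns."""
    return [(data[s], data[p] ^ data[q]) for s, (p, q) in LAYOUT]


def decode(columns):
    """Rebuild all four symbols from a dict {column index: (symbol, parity)}."""
    known = {LAYOUT[i][0]: sym for i, (sym, _) in columns.items()}
    progress = True
    while len(known) < 4 and progress:
        progress = False
        for i, (_, parity) in columns.items():
            p, q = LAYOUT[i][1]
            # A parity with exactly one unknown operand reveals that operand.
            if p in known and q not in known:
                known[q] = parity ^ known[p]
                progress = True
            elif q in known and p not in known:
                known[p] = parity ^ known[q]
                progress = True
    if len(known) < 4:
        raise ValueError("not enough columns to decode")
    return known


data = {"a": 0x12, "b": 0x34, "c": 0x56, "d": 0x78}
columns = encode(data)
# Any 2 of the 4 columns are enough to recover the data.
for i, j in combinations(range(4), 2):
    assert decode({i: columns[i], j: columns[j]}) == data
```

The decoder simply repeats the substitution shown on the slide until every symbol is known; a single column on its own is rejected with an error.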

Page 8:

Encoding

Encode video using 2-of-4 code.

Page 9:

Decoding

Retrieve data and decode.

[Diagram on both pages: nodes A, B, C, D connected through two switches]

Pages 10-12:

Node Failure

Dynamically switch to another node.

[Diagram sequence: nodes A, B, C, D connected through two switches; a node fails]

Pages 13-15:

Link Failure

Dynamically switch to another network path.

[Diagram sequence: nodes A, B, C, D connected through two switches; a link fails]

Pages 16-18:

Switch Failure

Dynamically switch to another network path.

[Diagram sequence: nodes A, B, C, D connected through two switches; a switch fails]

Pages 19-20:

Node Recovery

Continuous reconfiguration (e.g., load-balancing).

[Diagram sequence: nodes A, B, C, D connected through two switches]

Page 21:

Features

High availability:

• tolerates multiple node/link/switch failures

• no single point of failure

Efficient use of resources:

• multiple data paths

• redundant storage

• graceful degradation

Dynamic scalability/reconfigurability

(Stamp: Certified Buzz-Word Compliant)

Page 22:

RAIN Project: Goals

Efficient, reliable distributed computing and storage systems:

Key building blocks: Networks, Communication, Storage, Applications

Page 23:

Topics

Today’s Talk:

• Fault-Tolerant Interconnect Topologies

• Connectivity

• Group Membership

• Distributed Storage

[Diagram labels: Networks, Communication, Storage, Applications]

Page 24:

Interconnect Topologies

Goal: lose at most a constant number of nodes for a given network loss

[Diagram: computing/storage nodes (N) attached to a network]

Pages 25-26:

Resistance to Partitions

Large partitions are problematic for distributed services/computation

[Diagram: computing/storage nodes (N) attached to a network]

Page 27:

Related Work

Embedding hypercubes, rings, meshes, trees in fault-tolerant networks:

• Hayes et al., Bruck et al., Boesch et al.

Bus-based networks which are resistant to partitioning:

• Ku and Hayes, 1997. “Connective Fault-Tolerance in Multiple-Bus Systems”

Pages 28-30:

A Ring of Switches

a naïve solution

degree-2 compute nodes, degree-4 switches

easily partitioned

[Diagram: nodes (N) attached to a ring of switches (S)]

Pages 31-33:

Resistance to Partitioning

nodes on diagonals

degree-2 compute nodes, degree-4 switches

• tolerates any 3 switch failures (optimal)

• generalizes to arbitrary node/switch degrees.

Details: paper IPPS’98, www.paradise.caltech.edu

[Diagram: ring of switches numbered 1 to 8, with nodes numbered 1 to 8 placed on the diagonals]
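To make the contrast between the naïve ring and the diagonal placement concrete, here is a minimal sketch that builds both layouts for 8 switches and 8 degree-2 nodes and measures the largest group of nodes that can still reach each other after some switches fail. The attachment rule (node i on switches i and i + offset) and the diagonal offset n/2 + 1 are assumptions made for illustration; the IPPS’98 paper is the authority on the exact construction.

```python
def build(n, offset):
    """n switches in a ring; node i attaches to switches i and (i + offset) % n."""
    adj = {("S", i): set() for i in range(n)}
    adj.update({("N", i): set() for i in range(n)})

    def link(u, v):
        adj[u].add(v)
        adj[v].add(u)

    for i in range(n):
        link(("S", i), ("S", (i + 1) % n))       # ring of switches
        link(("N", i), ("S", i))                 # first node port
        link(("N", i), ("S", (i + offset) % n))  # second node port
    return adj


def largest_group(adj, failed_switches):
    """Number of compute nodes in the largest surviving connected component."""
    dead = {("S", i) for i in failed_switches}
    seen, best = set(), 0
    for start in adj:
        if start in dead or start in seen:
            continue
        seen.add(start)
        stack, count = [start], 0
        while stack:
            u = stack.pop()
            if u[0] == "N":
                count += 1
            for v in adj[u]:
                if v not in dead and v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, count)
    return best


naive = build(8, 1)              # each node sits between two adjacent switches
diagonal = build(8, 8 // 2 + 1)  # assumed "diagonal" offset
assert largest_group(naive, {0, 4}) == 4     # two failures split the ring in half
assert largest_group(diagonal, {0, 4}) == 8  # same failures, all nodes still connected
```

The check covers only one pair of failed switches; looping over every combination of failed switches would be the natural way to explore the slide's three-failure claim for a given offset.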

Page 34:

Resistance to Partitioning

Isomorphic

Details: paper IPPS’98, www.paradise.caltech.edu

[Diagram: two drawings of the construction, each with elements numbered 1 to 8, labeled as isomorphic]

Page 35:

Point-to-Point Connectivity

Is the path from A to B up or down?

[Diagram: nodes A and B, among other nodes, attached to a network]

Page 36:

Connectivity

Link is seen as up or down by each node.

Node A: {U,D}, Node B: {U,D}

Bi-directional communication.

Each node sends out pings. A node may time out, deciding the link is down.

Page 37:

Consistent History

[Diagram: timelines of the link state (U or D) as seen by node A and node B over time]

Page 38:

The Slack

[Diagram: timelines of the link state at nodes A and B, annotated "A is 1 ahead", "A is 2 ahead", "Now A will wait for B to transition"]

Slack n=2: at most 2 unacknowledged transitions before a node waits
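The slack rule can be illustrated with a small state machine. This is a simplified sketch, not the protocol from the IPPS’99 paper: it assumes each endpoint merely counts its own transitions and the peer transitions it has learned about, and the class and method names are hypothetical.

```python
class LinkEndpoint:
    """One node's view of a link, limited by a slack on unacknowledged transitions."""

    def __init__(self, slack=2):
        self.slack = slack
        self.state = "U"      # current view of the link: "U" (up) or "D" (down)
        self.own = 0          # transitions made locally
        self.peer = 0         # peer transitions learned so far
        self.history = ["U"]

    def ahead(self):
        """How many transitions this node is ahead of its peer."""
        return self.own - self.peer

    def try_transition(self):
        """Flip U<->D unless the slack is exhausted; return True if the flip happened."""
        if self.ahead() >= self.slack:
            return False      # must wait for the peer to transition
        self.state = "D" if self.state == "U" else "U"
        self.own += 1
        self.history.append(self.state)
        return True

    def on_peer_transition(self):
        """Record that the peer has made one more transition."""
        self.peer += 1


a = LinkEndpoint(slack=2)
assert a.try_transition()       # A is 1 ahead
assert a.try_transition()       # A is 2 ahead
assert not a.try_transition()   # now A waits for B to transition
a.on_peer_transition()
assert a.try_transition()       # B caught up by one, so A may move again
```

Bounding how far one side can run ahead is what keeps the two histories within a fixed distance of each other.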

Page 39:

Consistent History

Consistency in error reporting: If A sees channel error, B sees channel error.

Birman et al.: “Reliability Through Consistency”

Node A: {U,D}, Node B: {U,D}

Details: paper IPPS’99, www.paradise.caltech.edu

Page 40:

Group Membership

Consistent global view given local, point-to-point connectivity information

• link/node failures

• dynamic reconfiguration

[Diagram: nodes A, B, C, D with membership lists ABCD]

Page 41:

Related Work

Systems: Totem, Isis/Horus, Transis

Theory: Chandra et al., Impossibility of Group Membership in an Asynchronous Environment

Pages 42-48:

Group Membership

Token-Ring based Group Membership Protocol

Token carries:

• group membership list

• sequence number

[Diagram sequence: the token circulates among nodes A, B, C, D, shown as 1: ABCD, 2: ABCD, 3: ABCD, 4: ABCD, then 5]

Pages 49-56:

Group Membership

Node or link fails:

If a node is inaccessible, it is excluded and bypassed.

[Diagram sequence: B cannot be reached; the token continues as 5: ACD, 6: ACD, then 7]

Pages 57-64:

Group Membership

Node with token fails:

If the token is lost, it is regenerated.

Highest sequence number prevails.

[Diagram sequence: the token is lost; tokens 6: AD and 5: ACD appear, and the one with the highest sequence number prevails]

Pages 65-70:

Group Membership

Node recovers:

Recovering nodes are added.

[Diagram sequence: the token is shown as 7: ADC, 8: ADC, 9: ADC, then 10]

Page 71:

Group Membership

Features:

• Unicast messages

• Dynamic reconfiguration

• Mean time-to-failure > convergence time

Details: publication forthcoming.
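The token rules walked through on the preceding pages (the token carries a membership list and a sequence number, inaccessible nodes are excluded and bypassed, the highest sequence number prevails, recovering nodes are added) can be sketched as follows. This is an illustrative toy under stated assumptions, not the RAIN implementation: the names `Token`, `pass_token`, `prevailing`, and `admit` are hypothetical, and incrementing the sequence number once per hop is inferred from the numbering in the diagrams.

```python
from dataclasses import dataclass


@dataclass
class Token:
    seq: int        # sequence number, incremented on every hop (assumption)
    members: list   # group membership list, in ring order


def pass_token(token, holder, reachable):
    """Forward the token to the next reachable member, excluding unreachable ones."""
    members = list(token.members)
    i = members.index(holder)
    for candidate in members[i + 1:] + members[:i]:
        if candidate in reachable:
            return candidate, Token(token.seq + 1, members)
        members.remove(candidate)   # inaccessible: excluded and bypassed
    return holder, Token(token.seq + 1, members)   # holder is the only member left


def prevailing(tokens):
    """When several tokens exist after regeneration, the highest sequence number wins."""
    return max(tokens, key=lambda t: t.seq)


def admit(token, node):
    """Add a recovering node to the membership list carried by the token."""
    if node in token.members:
        return token
    return Token(token.seq, token.members + [node])


# A holds token 5 and cannot reach B, so the token goes on to C without B.
nxt, tok = pass_token(Token(5, list("ABCD")), "A", reachable={"A", "C", "D"})
assert (nxt, tok.seq, tok.members) == ("C", 6, ["A", "C", "D"])

# Two regenerated tokens: the one with the higher sequence number is kept.
assert prevailing([Token(6, list("AD")), Token(5, list("ACD"))]).members == ["A", "D"]

# A recovering node is appended to the list.
assert admit(Token(7, list("AD")), "C").members == ["A", "D", "C"]
```

Token loss detection and the timeouts that trigger regeneration are left out of the sketch.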

Pages 72-73:

Distributed Storage

Focus: reliability and performance.

[Diagram: a bit stream split across four disks]

Pages 74-76:

Array Codes

Ideally suited for distributed storage. Low encoding/decoding complexity.

“B-code” (one column per disk):

data: a, b, c, d

redundancy: d+c, d+a, a+b, b+c

Recover data a, b, c, d from any k of n columns, e.g. b = (a+b) + a and d = (d+c) + c.

B-Code and X-Code:

• optimally redundant

• optimal encoding/decoding complexity

Details: IEEE Trans. Info Theory, www.paradise.caltech.edu

Page 77:

Summary

Fault-tolerant Interconnect Topologies: [Diagram: ring numbered 1 to 8 with nodes on diagonals]

Connectivity: A {U,D}, B {U,D}

Group Membership: token 1: ABCD, 2: ABCD, 3: ABCD, 4: ABCD among nodes A, B, C, D

Distributed Storage: columns (a, d+c), (b, d+a), (c, a+b), (d, b+c)

Page 78:

Proof-of-Concept Applications

RAINVideo: High-availability video server

RAINCheck: Distributed checkpoint rollback/recovery system

SNOW: Stable Network of Webservers

Page 79:

Rainfinity

www.rainfinity.com

Start-up based on RAIN technology

Business Plan:

Clustered solutions for Internet data centers, focusing on:

• availability

• scalability

• performance

Page 80:

Rainfinity

www.rainfinity.com

Start-up based on RAIN technology

Company:

• Founded Sept. 1998

• Released first product April 1999

• Received $15 million funding in Dec. 1999

• Now over 50 employees

Page 81:

Future Research

• Development of APIs

• Fault-Tolerant Distributed Filesystem

• Fault-Tolerant MPI/PVM implementation

Page 82:

End of Talk

Material that was cut...

Pages 83-85:

Erasure Correcting Codes

Strategy: encode data with an erasure-correcting code.

[Diagram sequence: k data symbols are encoded into n symbols; up to m coordinates are lost; the k data symbols are reconstructed]

Page 86:

Erasure Correcting Codes

Code is optimally-redundant (MDS) if m = n - k. Example: Reed-Solomon code.

[Diagram: k data symbols are encoded into n symbols; up to m coordinates are lost; the k data symbols are reconstructed]
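A quick arithmetic check of what optimal redundancy buys, using the 2-of-4 code from earlier in the talk. This is a sketch for illustration only: the function names are hypothetical, and the comparison against plain replication is an added example rather than a figure from the slides.

```python
from fractions import Fraction


def mds_tolerated_losses(n, k):
    """An MDS code of length n with k data symbols tolerates m = n - k erasures."""
    return n - k


def coded_overhead(n, k):
    """Stored symbols per data symbol for an (n, k) code."""
    return Fraction(n, k)


def replication_overhead(m):
    """Full copies needed so that any m lost copies still leave one intact."""
    return m + 1


m = mds_tolerated_losses(4, 2)
assert m == 2                         # a 2-of-4 code survives any 2 lost columns
assert coded_overhead(4, 2) == 2      # at 2x the original storage
assert replication_overhead(m) == 3   # replication needs 3x for the same tolerance
```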

Page 87:

RAIN: Distributed Store

• Encode data with (n, k) array code

• Store one symbol per node

[Diagram: columns (a, d+c), (b, d+a), (c, a+b), (d, b+c) stored on four disks]

Pages 88-93:

RAIN: Distributed Retrieve

• Retrieve encoded data from any k nodes

• Reconstruct data

• Reliability (similar to RAID systems)

• Performance: load-balancing

[Diagram sequence: data a, b, c, d is reconstructed from the disks holding (a, d+c), (b, d+a), (c, a+b), (d, b+c); in later frames one disk is missing and another is marked "busy!"]
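Because any k nodes suffice for retrieval, a reader is free to skip nodes that are down or busy. The following is a minimal sketch of that choice, assuming each node reports a numeric load; the function name, the load values, and the node names are hypothetical and not part of the RAIN software.

```python
def choose_nodes(load, k, down=frozenset()):
    """Pick the k least-loaded nodes that are not down (ties broken by name)."""
    candidates = sorted((n for n in load if n not in down), key=lambda n: (load[n], n))
    if len(candidates) < k:
        raise RuntimeError("fewer than k nodes available; data cannot be reconstructed")
    return candidates[:k]


load = {"n0": 1, "n1": 9, "n2": 0, "n3": 2}   # n1 is the busy node
assert choose_nodes(load, k=2) == ["n2", "n0"]
assert choose_nodes(load, k=2, down={"n2"}) == ["n0", "n3"]
```

The same routine covers both bullets on the slide: excluding failed nodes gives the reliability, and preferring lightly loaded ones gives the load-balancing.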