Copyright 2004 Koren & Krishna ECE655/Koren Part.8 .1 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Fault Tolerant Computing ECE 655 Part 8 Networks - 3


Hypercube Networks

$H_n$ - an n-dimensional hypercube network - $2^n$ nodes

A 0-dimensional hypercube $H_0$ - a single node

$H_n$ is constructed by connecting the corresponding nodes of two $H_{n-1}$ networks

The edges added to connect corresponding nodes are called dimension-(n-1) edges

[Figure: $H_1$, two nodes joined by a dimension-0 edge]
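The recursive construction can be sketched in a few lines of Python. This is a minimal illustration rather than anything taken from the slides: the function name `hypercube_edges` and the choice of integer node labels 0 .. 2^n - 1 are assumptions made for the example.

```python
def hypercube_edges(n):
    """Return the edges of H_n as (node_a, node_b, dimension) tuples.

    Nodes are labeled 0 .. 2**n - 1 (an assumed labeling, chosen so that
    a dimension-k edge joins labels that differ only in bit k).
    """
    if n == 0:
        return []  # H_0 is a single node with no edges
    smaller = hypercube_edges(n - 1)
    offset = 1 << (n - 1)  # label shift for the second copy of H_(n-1)
    # Second copy: the same edges with every label shifted by `offset`.
    copy = [(a + offset, b + offset, k) for a, b, k in smaller]
    # Dimension-(n-1) edges connect corresponding nodes of the two copies.
    links = [(v, v + offset, n - 1) for v in range(offset)]
    return smaller + copy + links


edges = hypercube_edges(4)
print(len(edges))  # H_4 has 4 * 2**4 / 2 = 32 edges
```

With this labeling, every dimension-k edge joins two labels whose exclusive-or is exactly `1 << k`.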


Hypercube - Examples

[Figure: hypercubes $H_1$, $H_2$, $H_3$ and $H_4$ (built from two copies of $H_3$), with dimension-0, dimension-1, dimension-2 and dimension-3 edges marked]


Routing in Hypercubes

Specific numbering to simplify routing

Number expressed in binary - if nodes i and j are connected by a dimension-k edge, the names of i and j differ in only the k-th bit position

Example - nodes 0000 and 0010 differ in only the $2^1$ bit position - connected by a dimension-1 edge

Example - a packet needs to travel from node $14 = 1110_2$ to node $2 = 0010_2$ in an $H_4$ network

Possible routings -

1110 -> 0110 (dimension 3) -> 0010 (dimension 2)

1110 -> 1010 (dimension 2) -> 0010 (dimension 3)

[Figure: $H_4$ with dimension-0, dimension-1, dimension-2 and dimension-3 edges marked]


Routing - General

In general - the distance between source and destination is the number of bit positions in which their addresses differ

Going from X to Y can be accomplished by traveling once along each dimension in which they differ

$X = x_{n-1} \ldots x_0$ ; $Y = y_{n-1} \ldots y_0$

Define $z_i = x_i \oplus y_i$ - $\oplus$ is the exclusive-or operator

Packet must traverse an edge in every dimension $i$ for which $z_i = 1$
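The exclusive-or rule can be tried out with a short Python sketch. The function names `differing_dimensions` and `shortest_routes` are hypothetical, and nodes are assumed to be integers whose binary form is the node address; each ordering of the differing dimensions gives one shortest route.

```python
from itertools import permutations


def differing_dimensions(x, y, n):
    """Dimensions i for which z_i = x_i XOR y_i equals 1."""
    z = x ^ y
    return [i for i in range(n) if (z >> i) & 1]


def shortest_routes(x, y, n):
    """All shortest routes from x to y, one per ordering of the dimensions."""
    routes = []
    for order in permutations(differing_dimensions(x, y, n)):
        node, path = x, [x]
        for dim in order:
            node ^= 1 << dim  # traverse the dimension-`dim` edge
            path.append(node)
        routes.append(path)
    return routes


# The H_4 example: node 14 = 1110 to node 2 = 0010.
for route in shortest_routes(0b1110, 0b0010, 4):
    print([format(v, "04b") for v in route])
```

For the pair 1110 and 0010 the sketch lists the two routes through 1010 and through 0110, matching the routings given for the $H_4$ example.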


Fault Tolerance in Hypercubes

$H_n$ (for $n \geq 2$) can tolerate link failures - multiple paths from any source to any destination

Node failures can disrupt the operation

One way to tolerate node failures is to increase the number of communication ports of each node from n to n+1 and connect these extra ports through additional links to one or more spare nodes

Example - two spare nodes - each a spare for the $2^{n-1}$ nodes of an $H_{n-1}$ sub-cube

Spare nodes may require $2^{n-1}$ ports - can be reduced by using several crossbar switches whose outputs are connected to the corresponding spare node

Number of ports of the spare node is reduced to n+1 - same as for all other nodes


An H4 Hypercube with Two Spare Nodes

[Figure: $H_4$ hypercube with two spare nodes, each labeled S]


Different Method of Fault-Tolerance

Duplicating the processor in a few selected nodes

Each additional processor - spare also for any of the processors in the neighboring nodes

Example - nodes 0, 7, 8, 15 in H4 - modified to duplex nodes

Every node now has a spare at a distance no larger than 1

Replacing a faulty processor by a spare results in an additional communication delay
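The claim that nodes 0, 7, 8 and 15 put a spare within distance 1 of every node of $H_4$ can be checked mechanically. The sketch below is an illustration only; the helper names are hypothetical, and distance is taken to be the number of differing address bits.

```python
def hamming_distance(a, b):
    """Number of bit positions in which labels a and b differ."""
    return bin(a ^ b).count("1")


def max_distance_to_spare(n, duplex_nodes):
    """Largest distance from any node of H_n to its nearest duplex node."""
    return max(
        min(hamming_distance(v, s) for s in duplex_nodes)
        for v in range(1 << n)
    )


print(max_distance_to_spare(4, [0, 7, 8, 15]))  # prints 1
```

Each duplex node serves itself and its four neighbors, so four duplex nodes are enough to reach all 16 nodes of $H_4$.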


Routing in Injured Hypercubes

Routing algorithm must be modified to route around the faulty nodes or links

Basic idea - list the dimensions along which the packet must travel, and traverse them one by one

As edges are traversed, they are crossed off the list

If, due to a link or a node failure, the desired link is not available - another edge in the list, if any, is chosen for traversal

If the packet arrives at some node and finds all dimensions on its list down - it backtracks to the previous node and tries again


Formal Routing Algorithm - Notations

TD - list of dimensions that the message has traveled on - in order of traversal; $TD^R$ - TD in reversed order

$\bigoplus_{i=1}^{k}$ - exclusive-or operation carried out k times, sequentially

Example - $\bigoplus_{i=1}^{3} a_i$ means $(a_1 \oplus a_2) \oplus a_3$

D - destination, S - source, $d = D \oplus S$ ($\oplus$ - bitwise exclusive-or operation on corresponding bits of D and S)

SC(A) - set of nodes visited if we travel on each of the dimensions listed in set A

Example - at node 0010 - SC(1,3)={0000,1000}


Notations - Cont.

$e_i^n$ - n-bit vector consisting of a 1 in the i-th bit position and 0 everywhere else

Example - $e_2^3 = 100$

Packets are assumed to consist of

(I) $d$; $d = D \oplus S$

(II) Message being transmitted (the "payload")

(III) List of dimensions taken so far - TD

$\|$ - append operation; $TD \,\|\, x$ - append x to the list TD

transmit(j) - send packet ($d \oplus e_j^n$, message, $TD \,\|\, j$) along the dimension-j link from the present node
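The notation maps directly onto integer bit operations. The following Python sketch is one possible encoding, not the slides' own: the names `unit_vector`, `xor_fold`, `sc` and the tuple layout of a packet are assumptions made for illustration.

```python
from functools import reduce
from operator import xor


def unit_vector(i):
    """e_i: an integer with a 1 in bit position i and 0 everywhere else."""
    return 1 << i


def xor_fold(values):
    """Sequential exclusive-or: ((a1 ^ a2) ^ a3) ^ ..."""
    return reduce(xor, values, 0)


def sc(node, dims):
    """Nodes visited when travelling, in order, along each dimension in dims."""
    visited = []
    for dim in dims:
        node ^= unit_vector(dim)
        visited.append(node)
    return visited


def transmit(packet, j):
    """Packet as it leaves on dimension j: d loses bit j, TD gains j."""
    d, message, td = packet
    return (d ^ unit_vector(j), message, td + [j])


print([format(v, "04b") for v in sc(0b0010, [1, 3])])  # ['0000', '1000']
print(transmit((0b111, "payload", []), 0))  # (6, 'payload', [0])
```

The first print reproduces the SC(1,3) example at node 0010; the second shows d = 111 becoming 110 after a hop on dimension 0.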


Routing Algorithm for Injured Hypercubes


Example - H3

$H_3$ with faulty node 011

Node 000 wants to send a packet to 111

At 000, d=111 - sends the message out on dimension-0, to node 001

At 001, d = 110 and TD=(0) - attempts dimension-1 edge - impossible, since it leads to the faulty node 011

Bit 2 of d is also 1 - checks and finds that the dimension-2 edge to 101 is available - message is sent to 101 and then to 111

Exercise - What if both 011 and 101 are down?

[Figure: $H_3$ with dimension-0, dimension-1 and dimension-2 edges marked]
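The behavior in this example can be reproduced with a simplified Python sketch of the basic idea: try the dimensions still on the list, skip unavailable ones, and backtrack when none is left. It is not the formal algorithm from the slides. It assumes only node failures, tries dimensions in increasing order, and never leaves the set of shortest routes; the name `route` and its parameters are hypothetical.

```python
def route(source, destination, faulty_nodes, n):
    """Return the sequence of nodes visited, including backtracking hops,
    or None if no route along the listed dimensions avoids the faults."""
    path = [source]

    def visit(node, d):
        if d == 0:
            return True  # no dimensions left: we are at the destination
        for j in range(n):
            if not (d >> j) & 1:
                continue  # dimension j is not on the list
            neighbor = node ^ (1 << j)
            if neighbor in faulty_nodes:
                continue  # desired link unavailable, try another dimension
            path.append(neighbor)
            if visit(neighbor, d ^ (1 << j)):
                return True
            path.append(node)  # all dimensions down there: backtrack
        return False

    return path if visit(source, source ^ destination) else None


hops = route(0b000, 0b111, {0b011}, 3)
print([format(v, "03b") for v in hops])  # ['000', '001', '101', '111']
```

Passing a different `faulty_nodes` set, for instance the one in the exercise, shows where the packet is forced to backtrack.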


Reliability of Point-to-Point Networks

Not necessarily a regular structure - often more than one path between any two nodes

Terminal Reliability - the probability that there exists an operational path between two specific nodes, given the probabilities of link failures

Example - calculating the terminal reliability for the source-sink pair $N_1$ - $N_4$


Terminal Reliability - Example I

Three paths from $N_1$ to $N_4$ -

$P_1 = \{X_{1,2}, X_{2,4}\}$ ; $P_2 = \{X_{1,3}, X_{3,4}\}$ ; $P_3 = \{X_{1,2}, X_{2,3}, X_{3,4}\}$

$p_{i,j}$ ($q_{i,j}$) - probability that link $X_{i,j}$ is good (faulty)

Nodes are assumed fault-free - if not, their failure probability is incorporated into outgoing links

Set of paths must be modified to an equivalent set of mutually exclusive events - otherwise some events will be counted more than once

Mutually exclusive events - (I) $P_1$ up ; (II) $P_2$ up and $P_1$ down ; (III) $P_3$ up and both $P_1$ and $P_2$ down
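The decomposition into events (I), (II) and (III) can be checked numerically. The Python sketch below compares a brute-force enumeration of all link states with the sum of the three mutually exclusive terms. The closed-form terms are derived here from the path sets, assuming independent link failures; the link reliabilities are hypothetical values chosen only for illustration.

```python
from itertools import product

# Paths from N1 to N4, each a set of links (i, j).
PATHS = [
    {(1, 2), (2, 4)},
    {(1, 3), (3, 4)},
    {(1, 2), (2, 3), (3, 4)},
]


def terminal_reliability(paths, p):
    """Sum the probabilities of all link states with at least one path up."""
    links = sorted(set().union(*paths))
    total = 0.0
    for state in product([True, False], repeat=len(links)):
        up = {link for link, ok in zip(links, state) if ok}
        if any(path <= up for path in paths):
            prob = 1.0
            for link, ok in zip(links, state):
                prob *= p[link] if ok else 1.0 - p[link]
            total += prob
    return total


def disjoint_sum(p):
    """Sum of the three mutually exclusive events (I), (II), (III)."""
    q = {link: 1.0 - value for link, value in p.items()}
    term1 = p[1, 2] * p[2, 4]                               # P1 up
    term2 = p[1, 3] * p[3, 4] * (1.0 - p[1, 2] * p[2, 4])   # P2 up, P1 down
    term3 = p[1, 2] * p[2, 3] * p[3, 4] * q[2, 4] * q[1, 3]  # P3 up, P1 and P2 down
    return term1 + term2 + term3


# Hypothetical link reliabilities, for illustration only.
p = {(1, 2): 0.9, (2, 4): 0.8, (1, 3): 0.7, (3, 4): 0.95, (2, 3): 0.85}
print(terminal_reliability(PATHS, p), disjoint_sum(p))
```

The two printed values agree up to floating-point rounding, which is what the mutually exclusive decomposition is meant to guarantee.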


Calculating Terminal Reliability

Terminal reliability of a network with $m$ paths $P_1, \ldots, P_m$ from source to sink

$E_i$ ($\bar{E}_i$) - event in which path $P_i$ is operational (faulty)

$R = P(\text{Operational Path Exists}) = P\left(\bigcup_{i=1}^{m} E_i\right)$

Set of events can be decomposed into mutually exclusive events -

$\text{O. P. Exists} = E_1 \cup (E_2 \cap \bar{E}_1) \cup (E_3 \cap \bar{E}_1 \cap \bar{E}_2) \cup \ldots \cup (E_m \cap \bar{E}_1 \cap \ldots \cap \bar{E}_{m-1})$

$R = P(E_1) + P(E_2 \cap \bar{E}_1) + P(E_3 \cap \bar{E}_1 \cap \bar{E}_2) + \ldots + P(E_m \cap \bar{E}_1 \cap \ldots \cap \bar{E}_{m-1})$


Terminal Reliability - Cont.

The last expression can be rewritten using conditional probabilities

$R = P(E_1) + P(E_2)P(\bar{E}_1/E_2) + P(E_3)P(\bar{E}_1 \cap \bar{E}_2/E_3) + \ldots + P(E_m)P(\bar{E}_1 \cap \ldots \cap \bar{E}_{m-1}/E_m)$

The problem is calculating the probabilities $P(\bar{E}_1 \cap \ldots \cap \bar{E}_{i-1}/E_i)$

To identify the links which must fail so that $E_i$ occurs but not $E_1, \ldots, E_{i-1}$, conditional sets are used

$S_{j/i} = P_j - P_i = \{ x \mid x \in P_j \text{ and } x \notin P_i \}$

Identifying disjoint events in the general case is not always straightforward
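A conditional set is an ordinary set difference, so it is easy to compute. The sketch below applies it to the three paths of Example I; the function name `conditional_set` and the representation of a link as an `(i, j)` tuple are assumptions made for illustration.

```python
def conditional_set(path_j, path_i):
    """S_{j/i} = P_j - P_i: links of P_j that are not in P_i."""
    return path_j - path_i


# The three N1-to-N4 paths of Example I.
P1 = {(1, 2), (2, 4)}
P2 = {(1, 3), (3, 4)}
P3 = {(1, 2), (2, 3), (3, 4)}

print(conditional_set(P1, P3))  # {(2, 4)}
print(conditional_set(P2, P3))  # {(1, 3)}
```

With $P_3$ operational, $P_1$ can be down only if link (2,4) fails and $P_2$ only if link (1,3) fails, so the conditional probability in the third term reduces to $q_{2,4} q_{1,3}$.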


Terminal Reliability - Example II

This six-node network has 9 links - 6 uni-directional and 3 bi-directional

All paths from N1 to N6 -

Paths are ordered from shortest to longest


Calculating Terminal Reliability for Example II

The first term for the reliability equation is $P(E_1) = p_{1,3}\, p_{3,5}\, p_{5,6}$

To calculate the second term in the reliability equation - the conditional set is used

$S_{1/2} = P_1 - P_2 = \{x_{1,3}, x_{3,5}\}$

At least one of the links in this set must fail so that $P_1$ is faulty (while $P_2$ is operational)

The second term in the reliability equation - $p_{1,2}\, p_{2,5}\, p_{5,6}\, (1 - p_{1,3}\, p_{3,5})$


Example II - Cont.

For calculating other terms in the sum - intersection of several conditional sets must be considered

Calculating the fourth term - expression for $P_4$ - the conditional sets are: $S_{1/4} = \{x_{5,6}\}$; $S_{2/4} = \{x_{1,2}, x_{2,5}, x_{5,6}\}$; $S_{3/4} = \{x_{1,2}, x_{2,4}\}$

$S_{1/4}$ is included in $S_{2/4}$ - if a link in $S_{1/4}$ is faulty, a link in $S_{2/4}$ is faulty as well

$S_{2/4}$ can be ignored

The fourth term in the reliability equation - $p_{1,3}\, p_{3,5}\, p_{4,5}\, p_{4,6}\, (1 - p_{5,6})(1 - p_{1,2}\, p_{2,4})$


Example II - Cont.

Calculating the third term - $S_{1/3} = \{x_{1,3}, x_{3,5}, x_{5,6}\}$ ; $S_{2/3} = \{x_{2,5}, x_{5,6}\}$

The two conditional sets are not disjoint

The event "both $S_{1/3}$ and $S_{2/3}$ are faulty" needs to be divided into disjoint events:

(I) $x_{5,6}$ is faulty

(II) $x_{5,6}$ is operational and both $x_{1,3}$ and $x_{2,5}$ are faulty

(III) Both $x_{5,6}$ and $x_{1,3}$ are up, and both $x_{3,5}$ and $x_{2,5}$ are faulty

Resulting expression for third term - $p_{1,2}\, p_{2,4}\, p_{4,6}\, (q_{5,6} + p_{5,6}\, q_{1,3}\, q_{2,5} + p_{5,6}\, p_{1,3}\, q_{3,5}\, q_{2,5})$

Remaining terms - calculated similarly

Terminal reliability is the sum of all thirteen terms
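The first four terms of Example II can be cross-checked by brute force. In the Python sketch below, the four path sets are inferred from the conditional sets and term expressions quoted in the example, since the full path list is not reproduced in this transcript; treat them, the function names, and the link reliabilities as assumptions. Links are assumed to fail independently.

```python
from itertools import product

# First four N1-to-N6 paths, inferred from the conditional sets (assumption).
P1 = {(1, 3), (3, 5), (5, 6)}
P2 = {(1, 2), (2, 5), (5, 6)}
P3 = {(1, 2), (2, 4), (4, 6)}
P4 = {(1, 3), (3, 5), (4, 5), (4, 6)}
PATHS = [P1, P2, P3, P4]


def term_by_enumeration(paths, i, p):
    """P(path i up and all earlier paths down), by enumerating link states."""
    links = sorted(set().union(*paths[: i + 1]))
    total = 0.0
    for state in product([True, False], repeat=len(links)):
        up = {link for link, ok in zip(links, state) if ok}
        if paths[i] <= up and not any(path <= up for path in paths[:i]):
            prob = 1.0
            for link, ok in zip(links, state):
                prob *= p[link] if ok else 1.0 - p[link]
            total += prob
    return total


def closed_form_terms(p):
    """The first four terms, written out as in the example."""
    q = {link: 1.0 - value for link, value in p.items()}
    t1 = p[1, 3] * p[3, 5] * p[5, 6]
    t2 = p[1, 2] * p[2, 5] * p[5, 6] * (1.0 - p[1, 3] * p[3, 5])
    t3 = p[1, 2] * p[2, 4] * p[4, 6] * (
        q[5, 6]
        + p[5, 6] * q[1, 3] * q[2, 5]
        + p[5, 6] * p[1, 3] * q[3, 5] * q[2, 5]
    )
    t4 = (p[1, 3] * p[3, 5] * p[4, 5] * p[4, 6]
          * (1.0 - p[5, 6]) * (1.0 - p[1, 2] * p[2, 4]))
    return [t1, t2, t3, t4]


# Hypothetical link reliabilities, for illustration only.
p = {(1, 2): 0.9, (1, 3): 0.8, (2, 4): 0.85, (2, 5): 0.7,
     (3, 5): 0.95, (4, 5): 0.6, (4, 6): 0.75, (5, 6): 0.65}
for i, expected in enumerate(closed_form_terms(p)):
    print(i + 1, term_by_enumeration(PATHS, i, p), expected)
```

Each line prints a term number followed by the enumerated and closed-form values, which agree up to rounding. The same enumeration could be used to check the remaining nine terms once the full path list is available.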