CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Spring 2014, Prof. Jennifer Welch

CSCE 668 Set 12: Causality



Logical Clocks Motivation

In an asynchronous system, we often cannot tell which of two events occurred before the other:

[Figure: Example A and Example B, each showing processors p0 and p1 exchanging messages m0 and m1.]

In Example A, processors cannot tell which message was sent first. Probably not important.

In Example B, processors can tell which message was sent first. Might be important.

Let's try to determine relative ordering of some (not all) events.

Happens Before Partial Order

Given an execution, computation event a happens before computation event b, denoted a → b, if

Rule 1: a and b occur at the same processor and a precedes b, or

Rule 2: a results in sending m and b includes receipt of m, or

Rule 3: there exists a computation event c such that a → c and c → b (transitive closure).

Happens Before Partial Order

Happens before means that information can flow from a to b, i.e., that a might cause b.

[Figure: processors p0 and p1 with messages m0 and m1; events a and d on one timeline, events b and c on the other.]

b → c, a → b, a → c, c → d, a → d, b → d

Concurrent Events

If a does not happen before b, and b does not happen before a, then a and b are concurrent, denoted a || b.

Happens Before Example

Rule 1: a → b, c → d → e → f, g → h, h → i

Rule 2: a → d, g → e, f → i

Rule 3: a → e, c → i, …

h || e, …
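The three rules can be applied mechanically. The following is a minimal Python sketch, assuming the events and messages of the Happens Before Example: the per-processor chains are the Rule 1 chains and the message pairs are the Rule 2 pairs. The names `local_order`, `messages`, `happens_before` and `concurrent` are illustrative and do not come from the slides.

```python
from itertools import product

# Events of each processor in local order (the Rule 1 chains of the example).
local_order = [["a", "b"], ["c", "d", "e", "f"], ["g", "h", "i"]]
# (sending event, receiving event) pairs (the Rule 2 pairs of the example).
messages = [("a", "d"), ("g", "e"), ("f", "i")]


def happens_before(local_order, messages):
    """Return the set of pairs (x, y) such that x happens before y."""
    events = [e for proc in local_order for e in proc]
    hb = set(messages)                      # Rule 2
    for proc in local_order:                # Rule 1 (adjacent pairs suffice)
        hb.update(zip(proc, proc[1:]))
    # Rule 3: transitive closure, Warshall style (k is the outermost loop).
    for k, x, y in product(events, repeat=3):
        if (x, k) in hb and (k, y) in hb:
            hb.add((x, y))
    return hb


def concurrent(hb, x, y):
    """x || y: neither event happens before the other."""
    return x != y and (x, y) not in hb and (y, x) not in hb


hb = happens_before(local_order, messages)
print(("a", "e") in hb, ("c", "i") in hb, concurrent(hb, "h", "e"))
```

The cubic closure is chosen for brevity, not efficiency; it is adequate for an execution of this size.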

Logical Clocks

Logical clocks are values assigned to events to provide some information about the order in which events happen.

Goal is to assign an integer L(e) to each computation event e in an execution such that if a → b, then L(a) < L(b).

Logical Timestamps Algorithm

Each p_i keeps a counter (logical timestamp) L_i, initially 0.

Every message that p_i sends is timestamped with the current value of L_i.

L_i is incremented at each step by p_i to be greater than its current value and the timestamps on all messages received at this step.

If a is an event at p_i, then assign L(a) to be the value of L_i at the end of a.
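The update rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the execution is the one from the Happens Before Example, given as a list of steps in an order where every send precedes its receipt; the counter is set to exactly one more than the maximum, which is the smallest value the rule allows; and a message carries the timestamp of the step that sent it. The names `lamport_timestamps` and `EXAMPLE_STEPS` are hypothetical.

```python
def lamport_timestamps(steps):
    """Assign L(e) to every event.

    steps: list of (event, processor, senders), where senders lists the events
    whose messages are received in this step. Sends must precede receipts.
    """
    counter = {}   # L_i for each processor, implicitly 0 at the start
    L = {}
    for event, proc, senders in steps:
        received = [L[s] for s in senders]   # timestamps on received messages
        # Greater than the current value and than every received timestamp.
        counter[proc] = max([counter.get(proc, 0)] + received) + 1
        L[event] = counter[proc]             # value of L_i at the end of the step
    return L


EXAMPLE_STEPS = [
    ("a", 0, []), ("c", 1, []), ("g", 2, []),
    ("b", 0, []), ("d", 1, ["a"]), ("h", 2, []),
    ("e", 1, ["g"]), ("f", 1, []), ("i", 2, ["f"]),
]

L = lamport_timestamps(EXAMPLE_STEPS)
print(L)
```

Any other order of `EXAMPLE_STEPS` that keeps each processor's steps in sequence and each send before its receipt gives the same timestamps.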

Logical Timestamps Example

[Figure: the Happens Before Example annotated with logical timestamps: a = 1, b = 2; c = 1, d = 2, e = 3, f = 4; g = 1, h = 2, i = 5.]

a → b : L(a) = 1 < 2 = L(b)

f → i : L(f) = 4 < 5 = L(i)

a → e : L(a) = 1 < 3 = L(e)

etc.

Getting a Total Order

If a total order is required, break ties using ids.

In the example, L(a) = (1,0), L(c) = (1,1), etc.

Timestamps are ordered lexicographically.

In the example, L(a) < L(c).
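Tie-breaking by id amounts to comparing (timestamp, id) pairs. A small Python sketch, assuming the logical timestamps of the running example and processor ids 0, 1, 2 for the three processors (the values for d and h are inferred from their position in the example); the names `stamp` and `total_order` are illustrative:

```python
# Logical timestamps and processor ids for the events of the example.
L = {"a": 1, "b": 2, "c": 1, "d": 2, "e": 3, "f": 4, "g": 1, "h": 2, "i": 5}
proc = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1, "f": 1, "g": 2, "h": 2, "i": 2}

# Extended timestamp (counter, processor id); Python compares tuples
# lexicographically, which is exactly the ordering described above.
stamp = {e: (L[e], proc[e]) for e in L}
total_order = sorted(stamp, key=stamp.get)
print(stamp["a"] < stamp["c"], total_order)
```

No two events share an extended timestamp, because two events with the same counter value must be at different processors.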

Drawback of Logical Clocks

a → b implies L(a) < L(b), but L(a) < L(b) does not necessarily imply a → b.

In previous example, L(g) = 1 and L(b) = 2, but g does not happen before b.

Reason is that "happens before" is a partial order, but logical clock values are integers, which are totally ordered.

Vector Clocks

Generalize logical clocks to provide non-causality information as well as causality information.

Implement with values drawn from a partially ordered set instead of a totally ordered set.

Assign a value V(e) to each computation event e in an execution such that a → b if and only if V(a) < V(b).

Vector Timestamps Algorithm

Each p_i keeps an n-vector V_i, initially all 0's.

Entry j in V_i is p_i's estimate of how many steps p_j has taken.

Every msg p_i sends is timestamped with the current value of V_i.

At every step, increment V_i[i] by 1.

When receiving a message with vector timestamp T, update each of V_i's components j ≠ i so that V_i[j] = max(T[j], V_i[j]).

If a is an event at p_i, then assign V(a) to be the value of V_i at the end of a.
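The algorithm above can be sketched as follows. This is an illustrative Python version, assuming the execution of the Happens Before Example (three processors, messages a → d, g → e, f → i) and assuming that a message carries the vector of the step that sent it. The names `vector_timestamps` and `EXAMPLE_STEPS` are hypothetical.

```python
def vector_timestamps(n, steps):
    """Assign V(e) to every event.

    steps: list of (event, processor, senders), where senders lists the events
    whose messages are received in this step. Sends must precede receipts.
    """
    clock = [[0] * n for _ in range(n)]      # V_i for each processor
    V = {}
    for event, i, senders in steps:
        clock[i][i] += 1                     # one more step taken by p_i
        for s in senders:
            T = V[s]                         # timestamp on the received message
            for j in range(n):
                if j != i:
                    clock[i][j] = max(T[j], clock[i][j])
        V[event] = tuple(clock[i])           # value of V_i at the end of the step
    return V


EXAMPLE_STEPS = [
    ("a", 0, []), ("c", 1, []), ("g", 2, []),
    ("b", 0, []), ("d", 1, ["a"]), ("h", 2, []),
    ("e", 1, ["g"]), ("f", 1, []), ("i", 2, ["f"]),
]

V = vector_timestamps(3, EXAMPLE_STEPS)
print(V["d"], V["e"], V["i"])
```

Storing each V(e) as a tuple freezes the value at the end of the step, so later updates to the processor's vector do not alter timestamps already assigned.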

Manipulating Vector Timestamps

Let V and W be two n-vectors of integers.

Equality: V = W iff V[i] = W[i] for all i. Example: (3,2,4) = (3,2,4)

Less than or equal: V ≤ W iff V[i] ≤ W[i] for all i. Example: (2,2,3) ≤ (3,2,4) and (3,2,4) ≤ (3,2,4)

Less than: V < W iff V ≤ W but V ≠ W. Example: (2,2,3) < (3,2,4)

Incomparable: V || W iff !(V ≤ W) and !(W ≤ V). Example: (3,2,4) || (4,1,4)
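These definitions translate directly into code. A minimal Python sketch, assuming both vectors are tuples of the same length; the function names `leq`, `lt` and `incomparable` are illustrative:

```python
def leq(V, W):
    """V ≤ W: every component of V is at most the matching component of W."""
    return all(v <= w for v, w in zip(V, W))


def lt(V, W):
    """V < W: V ≤ W but V ≠ W."""
    return leq(V, W) and V != W


def incomparable(V, W):
    """V || W: neither V ≤ W nor W ≤ V."""
    return not leq(V, W) and not leq(W, V)


print(leq((2, 2, 3), (3, 2, 4)), lt((2, 2, 3), (3, 2, 4)),
      incomparable((3, 2, 4), (4, 1, 4)))
```

Note that Python's built-in `<` on tuples is lexicographic, so it must not be used in place of `lt`.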

Manipulating Vector Timestamps

The partial order on n-vectors just defined is not the same as lexicographic ordering.

Lexicographic ordering is a total order on vectors.

Consider (3,2,4) vs. (4,1,4) in the two approaches.
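The difference shows up directly in Python, whose built-in tuple comparison is lexicographic. A short sketch, with the componentwise test written out as a hypothetical helper `leq`:

```python
V, W = (3, 2, 4), (4, 1, 4)


def leq(X, Y):
    """Componentwise (partial-order) comparison of equal-length vectors."""
    return all(x <= y for x, y in zip(X, Y))


lexicographic = V < W                  # decided by the first component: 3 < 4
componentwise = leq(V, W) or leq(W, V)  # 3 ≤ 4 but 2 > 1, so neither holds
print(lexicographic, componentwise)
```

Lexicographic ordering ranks the two vectors, while the partial order leaves them incomparable.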

Vector Timestamps Example

[Figure: the same execution annotated with vector timestamps: a = (1,0,0), b = (2,0,0); c = (0,1,0), d = (1,2,0), e = (1,3,1), f = (1,4,1); g = (0,0,1), h = (0,0,2), i = (1,4,3).]

V(g) = (0,0,1) and V(b) = (2,0,0), which are incomparable.

Compare with logical clocks L(g) = 1 and L(b) = 2.

Correctness of Vector Timestamps

Theorem (6.5 & 6.6): Vector timestamps implement vector clocks.

Proof: First, show a → b implies V(a) < V(b).

Case 1: a and b both occur at p_i, with a first. Since V_i increases at each step, V(a) < V(b).

Correctness of Vector Timestamps

Case 2: a occurs at p_i and causes m to be sent, while b occurs at p_j and includes the receipt of m.

During b, p_j updates its vector timestamp in such a way that V(a) ≤ V(b).

p_i's estimate of the number of steps taken by p_j is never an over-estimate. Since m is not received before it is sent, p_i's estimate of the number of steps taken by p_j when a occurs is less than the number of steps taken by p_j when b occurs. So V(a)[j] < V(b)[j].

Thus V(a) < V(b).

Correctness of Vector Timestamps

Case 3: There exists c such that a → c and c → b. By induction (from Cases 1 and 2) and transitivity of the relation <, V(a) < V(b).

Next show V(a) < V(b) implies a → b. Equivalent to showing !(a → b) implies !(V(a) < V(b)).

Correctness of Vector Timestamps

Suppose a occurs at p_i, b occurs at p_j, and a does not happen before b.

Let V(a)[i] = k.

Since a does not happen before b, there is no chain of messages from p_i to p_j originating at p_i's k-th step or later and ending at p_j before b.

Thus V(b)[i] < k.

Thus !(V(a) < V(b)).

Size of Vector Timestamps

Vector timestamps are big:

n components in each one

values in the components grow without bound

Is there a more efficient way to implement vector clocks?

Answer is NO, at least under some conditions.

Vector Clock Size Lower Bound

Theorem (6.9): Any implementation of vector clocks using vectors of real numbers requires vectors of length n (number of processors).

Proof: For any value of n, consider this execution:

Example Bad Execution

For n = 4:

Vector Clock Size Lower Bound

Claim 1: a_{i+1} || b_i for all i (with wrap-around).

Proof: Since each proc. does all sends before any receives, there is no transitivity. Also p_{i+1} does not send to p_i.

Claim 2: a_{i+1} → b_j for all j ≠ i.

Proof: If j = i+1, obvious. If j ≠ i+1, then p_{i+1} sends to p_j.
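Both claims can be checked numerically for small n. The sketch below rests on assumptions, because the figure of the execution is not available here: following the proofs above, each p_s is modeled as sending, in a single event a_s, to every processor except itself and p_{s-1}, and each p_i as receiving everything sent to it in a single later event b_i. Happens-before is then tested through vector timestamps, relying on Theorem 6.5 & 6.6. All function names are hypothetical.

```python
def leq(V, W):
    return all(v <= w for v, w in zip(V, W))


def lt(V, W):
    return leq(V, W) and V != W


def incomparable(V, W):
    return not leq(V, W) and not leq(W, V)


def bad_execution_clocks(n):
    """Vector timestamps of a_0..a_{n-1} and b_0..b_{n-1} in the assumed execution."""
    # a_i is p_i's first step and receives nothing.
    Va = [tuple(1 if j == i else 0 for j in range(n)) for i in range(n)]
    Vb = []
    for i in range(n):
        v = [0] * n
        v[i] = 2                               # b_i is p_i's second step
        for s in range(n):
            # p_s sends to everyone except itself and p_{s-1} (indices mod n).
            if s != i and (s - 1) % n != i:
                v[s] = max(v[s], Va[s][s])
        Vb.append(tuple(v))
    return Va, Vb


def claims_hold(n):
    Va, Vb = bad_execution_clocks(n)
    claim1 = all(incomparable(Va[(i + 1) % n], Vb[i]) for i in range(n))
    claim2 = all(lt(Va[(i + 1) % n], Vb[j])
                 for i in range(n) for j in range(n) if j != i)
    return claim1 and claim2


print([claims_hold(n) for n in (3, 4, 5, 6)])
```

The check uses full n-vectors, so it only confirms the two claims; it says nothing about shorter vectors, which is what the rest of the proof addresses.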

Vector Clock Size Lower Bound

Suppose in contradiction, there is a way to implement vector clocks with k-vectors of reals, where k < n.

By Claim 1, a_{i+1} || b_i

=> V(a_{i+1}) and V(b_i) are incomparable

=> V(a_{i+1}) is larger than V(b_i) in some coordinate h(i)

=> h : {0,…,n-1} → {0,…,k-1}

Vector Clock Size Lower Bound

Since k < n, the function h is not 1-1. So there exist distinct i and j such that h(i) = h(j). Let r be this common value of h.

[Figure: for each i (with wrap-around), V(a_{i+1}) > V(b_i) in component h(i), shown for h(0), …, h(i), …, h(j), …, h(n-2), h(n-1). Two of these components are the same, say h(i) = h(j) = r.]

Vector Clock Size Lower Bound

[Figure: V(a_{i+1}) > V(b_i) in component r, and V(a_{j+1}) > V(b_j) in component r. V(a_{i+1}) ≤ V(b_j) in all components, since a_{i+1} → b_j. Hence V(a_{j+1}) > V(b_i) in component r, which contradicts a_{j+1} → b_i.]

Vector Clock Size Lower Bound

So V(a_{i+1}) is larger than V(b_i) in coordinate r and V(a_{j+1}) is larger than V(b_j) in coordinate r also.

V(a_{j+1})[r] > V(b_j)[r] by def. of r

≥ V(a_{i+1})[r] by Claim 2 (a_{i+1} → b_j) & correctness of V

> V(b_i)[r] by def. of r

Thus !(V(a_{j+1}) < V(b_i)), contradicting Claim 2 (a_{j+1} → b_i) and assumed correctness of V.

Application of Causality: Consistent Cuts

Consider an asynchronous message passing system with FIFO message delivery per channel and at most one msg received per computation step.

Number the computation steps of each processor 1,2,3,…

A cut of an execution is K = (k_0,…,k_{n-1}), where k_i indicates the number of computation steps taken by p_i.

Consistent Cuts

In a consistent cut K = (k_0,…,k_{n-1}), if step s of p_j happens before step k_i of p_i, then s ≤ k_j.

(1,3) and (2,4) are consistent.

(3,6) is inconsistent: step 4 by p0 happens before step 6 of p1, but 4 is greater than 3.

[Figure: an execution of p0 and p1 with some cuts.]
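With vector timestamps in which V_i[i] counts p_i's steps, the definition becomes a componentwise test: K is consistent when the timestamp of step k_i of p_i is ≤ K for every i. The Python sketch below uses a hypothetical two-processor execution, not the one in the figure: the only message is sent in step 4 of p0 and received in step 6 of p1, which is the single dependency stated above. The names `store` and `is_consistent` are illustrative.

```python
# store[i][x - 1] is the vector timestamp of step x of p_i.
store = [
    [(1, 0), (2, 0), (3, 0), (4, 0)],                   # p0, steps 1..4
    [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (4, 6)],   # p1, steps 1..6
]


def is_consistent(store, K):
    """True if no step inside the cut depends on a step outside it."""
    for i, k in enumerate(K):
        if k == 0:
            continue                      # p_i contributes no steps
        last = store[i][k - 1]            # timestamp of step k_i of p_i
        if any(v > kj for v, kj in zip(last, K)):
            return False
    return True


print(is_consistent(store, (1, 3)), is_consistent(store, (2, 4)),
      is_consistent(store, (3, 6)))
```

Checking only the last step of each processor inside the cut is enough, because a processor's vector never decreases from one step to the next.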

Finding a Recent Consistent Cut

Problem Version 1: Processors all given a cut K and must find a maximal consistent cut that is ≤ K.

Application: Logging-based crash recovery.

Procs periodically write their state to stable storage.

When a proc recovers from a crash, it tries to recover to the latest logged state, but needs to coordinate with other procs.

Vector Clocks Solution

Implement vector clocks using vector timestamps appended to application msgs.

Store the vector clock of each computation step in a local array store[1,…]

When p_i is given input cut K:

for x := K[i] downto 1 do
    if store[x] ≤ K then return x
return 0 // reached only if no step of p_i qualifies

The returned value is the entry for p_i of the global answer.
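The per-processor search can be written out as a short Python sketch. It assumes a hypothetical two-processor execution whose only message is sent in step 4 of p0 and received in step 6 of p1, and it assumes that a processor with no qualifying step answers 0. The names `store`, `local_answer` and `max_consistent_cut` are illustrative.

```python
def leq(V, W):
    return all(v <= w for v, w in zip(V, W))


def local_answer(store_i, K, i):
    """p_i's entry: the largest x ≤ K[i] whose stored timestamp is ≤ K."""
    for x in range(K[i], 0, -1):
        if leq(store_i[x - 1], K):       # store_i[x - 1] holds step x
            return x
    return 0                             # assumption: no step of p_i qualifies


def max_consistent_cut(store, K):
    """Combine the entries that the processors compute independently."""
    return tuple(local_answer(store[i], K, i) for i in range(len(K)))


# Hypothetical execution: step 6 of p1 receives the msg sent in step 4 of p0.
store = [
    [(1, 0), (2, 0), (3, 0), (4, 0)],
    [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (4, 6)],
]

print(max_consistent_cut(store, (3, 6)), max_consistent_cut(store, (4, 6)))
```

For the input cut (3,6), p1 rolls back to step 5 because its step 6 depends on step 4 of p0, which lies outside the cut.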

What About Channel State?

Processor states are not sufficient to capture entire system state.

Messages in transit must be calculated.

Solution here requires additional storage (number of messages) and additional computation at recovery time (involving replaying original execution to capture messages sent but not received).

Another Take on Recent Consistent State

Problem Version 2: A subset of procs initiate (at arbitrary times) trying to find a consistent cut that includes the state of at least one of the initiators when it started.

Called a distributed snapshot.

Snapshot info can be collected at one proc. and then analyzed.

Application: termination detection

Marker Algorithm

Instead of adding extra information on each application message, insert control messages ("markers") into the channels.

Code for p_i:

initially answer = -1 and num = 0

when application msg arrives:
    num++; do application action

when marker arrives or when initiating snapshot:
    if answer = -1 then
        answer := num // p_i's part of final answer
        send marker to all neighbors

What About Channel States?

p_i records the sequence of msgs received from p_j between the time p_i records its answer and the time p_i gets the marker from p_j.

These are the msgs in transit from p_j to p_i in the cut returned by the algorithm.
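The marker rule and the channel-state rule can be tried out in a small simulation. The following Python sketch makes several assumptions that are not in the slides: two processors, FIFO channels modeled as deques, a scripted schedule of sends and deliveries, and an application that does nothing except count messages. The class `Proc`, the helper `deliver` and the messages "x", "y", "z" are hypothetical.

```python
from collections import deque

MARKER = "MARKER"


class Proc:
    def __init__(self, pid, n):
        self.pid = pid
        self.neighbors = [j for j in range(n) if j != pid]
        self.answer = -1        # p_i's part of the cut; -1 until recorded
        self.num = 0            # application msgs received so far
        self.recording = set()  # neighbors whose marker is still awaited
        self.in_transit = {}    # neighbor -> msgs recorded for that channel

    def take_snapshot(self, chan):
        """Run on the first marker or when initiating; later calls do nothing."""
        if self.answer == -1:
            self.answer = self.num
            self.recording = set(self.neighbors)
            self.in_transit = {j: [] for j in self.neighbors}
            for j in self.neighbors:
                chan[(self.pid, j)].append(MARKER)

    def receive(self, src, msg, chan):
        if msg == MARKER:
            self.take_snapshot(chan)
            self.recording.discard(src)   # stop recording the channel from src
        else:
            self.num += 1                 # the application action would go here
            if src in self.recording:
                self.in_transit[src].append(msg)


def deliver(procs, chan, src, dst):
    """Deliver the oldest message on the FIFO channel src -> dst."""
    procs[dst].receive(src, chan[(src, dst)].popleft(), chan)


n = 2
procs = [Proc(i, n) for i in range(n)]
chan = {(i, j): deque() for i in range(n) for j in range(n) if i != j}

chan[(1, 0)].append("x")        # p1 sends x
chan[(0, 1)].append("y")        # p0 sends y
procs[0].take_snapshot(chan)    # p0 initiates with num = 0
deliver(procs, chan, 1, 0)      # p0 gets x after recording: in transit
deliver(procs, chan, 0, 1)      # p1 gets y (sent before p0's marker)
chan[(1, 0)].append("z")        # p1 sends z before seeing the marker
deliver(procs, chan, 0, 1)      # p1 gets the marker, records num = 1
deliver(procs, chan, 1, 0)      # p0 gets z: in transit
deliver(procs, chan, 1, 0)      # p0 gets p1's marker
print([p.answer for p in procs], procs[0].in_transit, procs[1].in_transit)
```

In this schedule the recorded cut is (0, 1): y was sent before p0 recorded and received before p1 recorded, so it is not in transit, while x and z were sent by p1 before it recorded and reached p0 after p0 recorded, so they form the state of the channel from p1 to p0.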