Upload: risa-yang

Post on 30-Dec-2015

Chapter 10 Time and Global States

• Clocks and Synchronization Algorithms

• Lamport Timestamps and Vector Clocks

• Distributed Snapshots and Termination

What Do We Mean By Time?

• Monotonically increasing

• Useful when everyone agrees on it

• UTC is Coordinated Universal Time.

• NIST operates the shortwave radio station WWV, which broadcasts UTC from Colorado.

Clock Synchronization

When each machine has its own clock, an event that occurred after another event may nevertheless be assigned an earlier time.

Time

• Time is complicated in a distributed system.

• Physical clocks run at slightly different rates – so they can ‘drift’ apart.

• Clock makers specify a maximum drift rate $\rho$ (rho).

• By definition,

$1 - \rho \le dC/dt \le 1 + \rho$, where C(t) is the clock’s time as a function of the real time t.

Clock Synchronization

The relation between clock time and UTC when clocks tick at different rates.

Clock Synchronization

• $1 - \rho \le dC/dt \le 1 + \rho$

• A perfect clock has dC/dt = 1

• Assume 2 clocks have the same maximum drift rate $\rho$. To keep them synchronized to within a time interval $\delta$ (delta), they must re-sync at least every $\delta/(2\rho)$ seconds.
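The re-sync bound follows from the drift definition: two clocks that each drift by at most rho can move apart at up to 2·rho seconds per second. A minimal sketch of the arithmetic, where the function name and the example values are assumptions chosen for illustration:

```python
def resync_interval(delta: float, rho: float) -> float:
    """Longest time two clocks may run between synchronizations.

    Two clocks that each drift by at most rho can move apart at a rate
    of up to 2 * rho, so after t seconds they may differ by 2 * rho * t.
    Requiring 2 * rho * t <= delta gives t <= delta / (2 * rho).
    """
    if delta <= 0 or rho <= 0:
        raise ValueError("delta and rho must be positive")
    return delta / (2 * rho)


# Assumed example values: drift of 10 microseconds per second,
# clocks must stay within 1 millisecond of each other.
interval = resync_interval(delta=1e-3, rho=1e-5)
print(f"re-sync at least every {interval:.0f} seconds")
```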

Cristian’s Algorithm

• One of the nodes (or processors) in the distributed system is a time server TS (presumably with access to UTC). How can the other nodes be sync’ed?

• Periodically, at least every $\delta/(2\rho)$ seconds, each machine sends a message to the TS asking for the current time and the TS responds.

Cristian's Algorithm

Getting the current time from a time server.

Cristian’s Algorithm

• Should the client node simply force its clock to the value in the message?

• Potential problem: if the client’s clock was fast, the new time may be less than its current time, and just setting the clock to the new time might make time appear to run backwards on that node.

• TIME MUST NEVER RUN BACKWARDS. Many applications depend on time always increasing, so the new time must be worked in gradually.

Cristian’s Algorithm

• Can we compensate for the delay between the TS sending the response and the client receiving it at T1? (T0 is the time the request was sent.)

• Add (T1 – T0)/2 if no outside info is available.

• Estimate, or ask the server, how long it takes to process a time request, say R. Then add (T1 – T0 – R)/2.

• Take several measurements and use the smallest, or an average after throwing out the large values.
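The delay compensation above can be sketched as follows. This is an illustrative sketch only: the function names and the tuple layout of a sample are assumptions, and the network request itself is left out.

```python
def cristian_time(t0: float, t1: float, server_time: float,
                  processing_time: float = 0.0) -> float:
    """Estimate the current time at the moment the reply arrives (local time t1).

    t0 and t1 are read from the client's clock; processing_time is R,
    the server's handling time (0 if unknown).
    """
    one_way_delay = (t1 - t0 - processing_time) / 2
    return server_time + one_way_delay


def best_estimate(samples):
    """Use the sample with the smallest round trip (least queueing delay).

    Each sample is a tuple (t0, t1, server_time, processing_time).
    """
    t0, t1, server_time, r = min(samples, key=lambda s: s[1] - s[0])
    return cristian_time(t0, t1, server_time, r)


# Assumed example: request sent at 100.000, reply received at 100.020,
# server reported 250.000 and took 4 ms to process the request.
estimate = cristian_time(100.0, 100.020, 250.0, processing_time=0.004)
```

Picking the sample with the shortest round trip is one of the options listed above; averaging after discarding large values would be an equally valid choice.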

The Berkeley Algorithm

• The server actively tries to sync the clocks of a DS. This algorithm is appropriate if no one has UTC and all must agree on the time.

• The server “polls” each machine by sending its current time and asking for the difference between its time and theirs. Each site responds with the difference.

• The server computes an ‘average’ with some compensation for transmission time.

• The server computes how each machine would need to adjust its clock and sends each machine instructions.
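The averaging step can be sketched as below. This is a simplified illustration: it assumes the reported differences have already been corrected for transmission time, and the function name and example values are hypothetical.

```python
def berkeley_adjustments(differences):
    """Return the amount each machine should add to its clock.

    differences[i] is (clock of machine i) - (server's clock); the server
    includes itself with a difference of 0. Transmission-time compensation
    is assumed to have been applied already.
    """
    average = sum(differences) / len(differences)
    return [average - d for d in differences]


# Assumed example in minutes: the server, one machine 10 minutes behind,
# and one machine 25 minutes ahead.
adjustments = berkeley_adjustments([0, -10, 25])
print(adjustments)
```

Each machine receives a relative adjustment rather than an absolute time, so a machine that is ahead can slow its clock gradually instead of setting it backwards.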

The Berkeley Algorithm

a) The time daemon asks all the other machines for their clock values

b) The machines answer

c) The time daemon tells everyone how to adjust their clock

Analysis of Sync Algorithms

• Cristian’s algorithm: N clients send and receive a message every $\delta/(2\rho)$ seconds.

• Berkeley algorithm: 3N messages every $\delta/(2\rho)$ seconds.

• Both assume a central time server or coordinator. More distributed algorithms exist in which each processor broadcasts its time at an agreed upon time interval and processors go through an agreement protocol to average the value and agree on it.

Analysis of Sync Algorithms

• In general, algorithms with no coordinator have greater message complexity (more messages for the same number of nodes). That’s the price you pay for equality and no-single-point-of-failure.

• With modern hardware, we can achieve “loosely synchronized” clocks. This forms the basis for many distributed algorithms in which logical clocks are used with physical clock timestamps to disambiguate when logical clocks roll over or servers crash and sequence numbers start over (which is inevitable in real implementations).

Logical Clocks

• What do we really need in a “clock”? For many applications, it is not necessary for nodes of a DS to agree on the real time, only that they agree on some value that has the attributes of time.

• Attributes of time: X(t) has the sense or attributes of time if it is strictly increasing.

• A real or integer counter can be used. A real number would be closer to reality; however, an integer counter is easier for algorithms and programmers. Thus, for convenience, we use an integer that is incremented any time an event of possible interest occurs.

Logical Clocks in a DS

• What is important is usually not when things happened but in what order they happened so the integer counter works well in a centralized system.

• However, in a DS, each system has its own logical clock, and you can run into problems if one “clock” gets ahead of others. (like with physical clocks)

• We need a rule to synchronize the logical clocks.

Lamport Clocks

Lamport defined the happens-before relation for a DS.

A → B means “A happens before B”.

• (1) If A and B are events in the same process and A occurs before B, then A → B is true.

• (2) If A is the event of a message being sent by one process and B is the event of that message being received by another process, then A → B is true. (A message must be sent before it is received.)

• (3) Happens-before is the transitive closure of (1) and (2). That is, if A → B and B → C, then A → C.

Any two events not ordered by happens-before are said to be concurrent.

Events at Three Processes

[Figure: events a, b at process p1, c, d at p2, and e, f at p3, plotted against physical time, with messages m1 and m2]

a → b, a → c, and b → f, but b and e are incomparable. Also e → f. Does e → b?

Lamport Clocks

• Desired properties:

• (1) whenever A → B, C(A) < C(B); that is, the logical clock value of the earlier event is smaller

• (2) the clock value C is increasing (never runs backwards)

Lamport Clocks Rules

• An event is an internal event or a message send or receive.

• The local clock is increased by one for each message sent, and the message carries that timestamp with it.

• The local clock is increased by one for an internal event.

• When a message is received, the current local clock value, C, is compared to the message timestamp, T. If T = C, set the local clock to C+1. If T > C, set the clock to T+1. If T < C, set the clock to C+1. In every case the new value is max(C, T) + 1.
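These rules fit in a few lines of code. The sketch below is illustrative (the class and method names are assumptions); the usage replays a three-process scenario with two messages, m1 from p1 to p2 and m2 from p2 to p3.

```python
class LamportClock:
    """Integer logical clock following the three rules above."""

    def __init__(self):
        self.time = 0

    def internal_event(self) -> int:
        self.time += 1
        return self.time

    def send(self) -> int:
        # Increment, then attach the new value to the outgoing message.
        self.time += 1
        return self.time

    def receive(self, msg_timestamp: int) -> int:
        # Covers all three cases (T = C, T > C, T < C) at once.
        self.time = max(self.time, msg_timestamp) + 1
        return self.time


p1, p2, p3 = LamportClock(), LamportClock(), LamportClock()
a = p1.internal_event()  # event a on p1
b = p1.send()            # event b: p1 sends m1
c = p2.receive(b)        # event c: p2 receives m1
d = p2.send()            # event d: p2 sends m2
e = p3.internal_event()  # event e on p3
f = p3.receive(d)        # event f: p3 receives m2
print(a, b, c, d, e, f)
```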

Lamport Clocks

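[Figure: events a, b at p1, c, d at p2, and e, f at p3 against physical time, with messages m1 and m2, labeled with Lamport clock values a = 1, b = 2, c = 3, d = 4, e = 1, f = 5]

Whenever A → B, C(A) < C(B). However, C(A) < C(B) does not mean A → B (e.g., C(e) < C(b), but it is not true that e → b).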

Total Order Lamport Clocks

• If you need a total ordering (to distinguish between event 3 on P2 and event 1 on P3), use Lamport timestamps.

• The Lamport timestamp of event A at node i is (C(A), i).

• For any 2 timestamps T1 = (C(A), I) and T2 = (C(B), J):

– If C(A) > C(B) then T1 > T2.

– If C(A) < C(B) then T1 < T2.

– If C(A) = C(B) then consider node numbers. If I > J then T1 > T2. If I < J then T1 < T2. If I = J then the two events occurred at the same node, so since their clock C is the same, they must be the same event.
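The comparison rule is a lexicographic order on (clock, node) pairs, which is how Python already compares tuples. A small sketch, with an assumed list of six example timestamps:

```python
def lamport_timestamp(clock_value: int, node_id: int):
    """Pair the logical clock value with the node number as a tie-breaker."""
    return (clock_value, node_id)


# Assumed example events, given as (C(A), node) pairs in arbitrary order.
events = [
    lamport_timestamp(3, 2),
    lamport_timestamp(1, 3),
    lamport_timestamp(2, 1),
    lamport_timestamp(5, 3),
    lamport_timestamp(1, 1),
    lamport_timestamp(4, 2),
]

# Tuples compare by first component, then by second, which matches the rule:
# clock value first, node number only when the clock values are equal.
ordered = sorted(events)
print(ordered)
```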

Total Order Lamport Timestamps

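The order will be (1,1), (1,3), (2,1), (3,2), etc.

[Figure: the events of the previous example labeled with total-order timestamps: (1,1) and (2,1) at p1; (3,2) and (4,2) at p2; (1,3) and (5,3) at p3]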

Why Total Order?

Database updates need to be performed in the same order at all sites of a replicated database.

Exercise: Lamport Clocks

Assuming the only events are message send and receive, what are the clock values at events a-g?

[Figure: space-time diagram of processes A, B, and C with send and receive events a to g]

Limitation of Lamport Clocks

• Total order Lamport clocks give us the property: if A → B then C(A) < C(B). But they do not give us the property: if C(A) < C(B) then A → B. (If C(A) < C(B), A and B may be concurrent or incomparable, but never B → A.)

[Figure: nodes A (1), B (2), and C (3) with events timestamped (2,1) and (5,1); (1,2) and (2,2); (3,3) and (4,3)]

The Lamport timestamp (2,1) < (3,3), but the events are unrelated.

Limitation

Also, Lamport timestamps do not detect causality violations. A causality violation can arise when one channel has a long communication delay that other channels do not, or when a channel is not FIFO.

[Figure: message exchange among nodes A, B, and C]

A and C will never know the messages were out of order.

Causality Violation

• Causality violation example: A gets a message from B that was sent to all nodes. A responds by sending an answer to all nodes. C gets A’s answer to B before it receives B’s original message.

• How can C tell that this message is out of order?

– Assume one send event for a set of messages

Causality: Solution

• The solution is vector timestamps: Each node maintains an array of counters.

• If there are N nodes, the array V has N integers, V(1) to V(N). V(I) = C, the local clock, if I is the designation of the local node.

• In general, V(X) is the latest info the node has on what X’s local clock is.

• Gives us the property e → f iff ts(e) < ts(f)

Vector Timestamps

Each site has a local clock that is incremented at each event (not according to the Lamport clock rules). The vector clock timestamp is piggybacked on each message sent. RULES:

• The local clock is incremented for a local event and for a send event. The message carries the vector timestamp.

• When a message is received, the local clock is incremented by one. Every other component of the vector is raised to the corresponding component of the received vector timestamp if its current value is less. That is, the new value is the maximum of the two components.
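A minimal sketch of these two rules follows. The class and method names are assumptions, node indices start at 0, and the usage replays a three-process scenario with messages m1 (p1 to p2) and m2 (p2 to p3).

```python
class VectorClock:
    """Vector clock for node `i` in a system of `n` nodes (0-based index)."""

    def __init__(self, n: int, i: int):
        self.v = [0] * n
        self.i = i

    def local_event(self):
        self.v[self.i] += 1
        return tuple(self.v)

    def send(self):
        # Increment the local component; the result travels with the message.
        self.v[self.i] += 1
        return tuple(self.v)

    def receive(self, ts):
        # Every other component becomes the maximum of the two values.
        for k, other in enumerate(ts):
            if k != self.i:
                self.v[k] = max(self.v[k], other)
        # The local component is incremented by one.
        self.v[self.i] += 1
        return tuple(self.v)


p1, p2, p3 = VectorClock(3, 0), VectorClock(3, 1), VectorClock(3, 2)
a = p1.local_event()  # event a on p1
b = p1.send()         # event b: p1 sends m1
c = p2.receive(b)     # event c: p2 receives m1
d = p2.send()         # event d: p2 sends m2
e = p3.local_event()  # event e on p3
f = p3.receive(d)     # event f: p3 receives m2
print(a, b, c, d, e, f)
```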

Vector Timestamps and Causal Violations

• C receives message (2,1,0) then (0,1,0).

• The later message causally precedes the first message, provided we define how to compare timestamps correctly.

Vector Clock Comparison

• VC1 > VC2 if for each component j, VC1[j] >= VC2[j], and for some component k, VC1[k] > VC2[k]

• VC1 = VC2 if for each j, VC1[j] = VC2[j]

• Otherwise, if VC2 > VC1 does not hold either, VC1 and VC2 are incomparable and the events they represent are concurrent

[Figure: nodes A, B, and C with four marked points, 1 to 4]

Clock at point:

1 = (2,1,0)

2 = (2,2,0)

3 = (2,1,1)

4 = (2,1,2)
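The comparison rule can be written as a short function. This is a sketch; the function name and the string results are assumptions made for readability.

```python
def compare(vc1, vc2) -> str:
    """Classify two vector timestamps as before, after, equal, or concurrent."""
    if len(vc1) != len(vc2):
        raise ValueError("vector clocks must have the same length")
    ge = all(x >= y for x, y in zip(vc1, vc2))
    le = all(x <= y for x, y in zip(vc1, vc2))
    if ge and le:
        return "equal"
    if ge:
        return "after"       # VC1 > VC2
    if le:
        return "before"      # VC1 < VC2
    return "concurrent"      # incomparable


print(compare((2, 1, 0), (2, 2, 0)))  # points 1 and 2
print(compare((2, 2, 0), (2, 1, 1)))  # points 2 and 3
print(compare((0, 1, 0), (2, 1, 0)))  # the two messages received by C
```

With this definition the message stamped (0,1,0) is classified as "before" the one stamped (2,1,0), so a receiver that gets them in the opposite order can tell that causal order was violated.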

Vector Clocks

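[Figure: events a, b at p1, c, d at p2, and e, f at p3 against physical time, with messages m1 and m2, labeled with vector clocks]

a = (1,0,0), b = (2,0,0)

c = (2,1,0), d = (2,2,0)

e = (0,0,1), f = (2,2,2)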

Vector Clock Exercise

Assuming the only events are send and receive:

What is the vector clock at events a-f?

Which events are concurrent?

[Figure: space-time diagram of processes A, B, and C with send and receive events a to f]

Matrix Timestamps

Matrix timestamps can be used to give each node more information about the state of the other nodes.

Each site keeps a 2-dimensional time table (TT).

If Ti[j,k] = v, then site i knows that site j is aware of all events at site k up to v.

Row x is the view of the vector clock at site x.

A’s TT:

    A  B  C
A   3  2  3
B   1  2  0
C   2  2  3

Matrix Timestamp Example

Node A from the previous slide starts with table

3 0 0
0 0 0
0 0 0

Node A receives a message from C with timestamp

2 0 0
1 2 0
2 2 3

To get A’s new time table:

• compare each row in the tables component-wise and take the maximum

• update A’s row by taking the max of each column

Result:

3 2 3
1 2 0
2 2 3
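The two update steps can be checked mechanically. A minimal sketch, where the function name is an assumption and rows and columns are indexed 0, 1, 2 for sites A, B, C:

```python
def merge_time_tables(own, received, i):
    """Merge a received time table into the table of site `i`.

    Step 1: take the component-wise maximum of the two tables.
    Step 2: replace row i with the maximum of each column.
    """
    n = len(own)
    merged = [[max(own[r][c], received[r][c]) for c in range(n)]
              for r in range(n)]
    merged[i] = [max(merged[r][c] for r in range(n)) for c in range(n)]
    return merged


a_table = [[3, 0, 0],
           [0, 0, 0],
           [0, 0, 0]]
from_c = [[2, 0, 0],
          [1, 2, 0],
          [2, 2, 3]]

new_table = merge_time_tables(a_table, from_c, i=0)  # site A is index 0
for row in new_table:
    print(row)
```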

Global State

• Matrix timestamps are one way of getting information about the distributed system. Another way is to sample the global state.

• The global state is the combination of the states of all the processors and channels at some time which could have occurred.

– Because there is no way of recording states at the exact same time at every node, we will have to be careful how we define this.

Global State

• There are many reasons for wanting to sample the global state (“take a snapshot”):

– deadlock detection

– finding lost token

– termination of a distributed computation

– garbage collection

• We must define what is meant by the state of a node or a channel.

Defining Global State

• There are N processes P1…Pn. The state of the process Pi is defined by the system and application being used.

• Between each pair of processors, Pi and Pj, there is a one-way communications channel Ci,j. Channels are reliable and FIFO, i.e., the messages arrive in the order sent. The contents of Ci,j are an ordered list of messages Li,j = (m1, m2, m3…). The state of the channel is the messages in the channel and their order.

• Li,j = (m1, m2, …) is the channel from Pi to Pj and m1 (head or front) is the next message to be delivered.

Defining Global State

It is not necessary for all processors to be interconnected, but each processor must have at least one incoming channel and one outgoing channel and it must be possible to reach each processor from any other processor (graph is strongly connected).

Defining Global State

• The Global state is the combination of the states of all the processors and channels.

• The state of all the channels, L, is the set of messages sent but not yet received.

• Defining the state was easy; getting the state is more difficult.

• Intuitively, we say that a consistent global state is a “snapshot” of the DS that looks to the processes as if it were taken at the same instant everywhere.

Defining Global State

• For a global state to be meaningful, it must be one that could have occurred.

• Suppose we observe processor Pi (getting state Si) and it has just received a message m from processor Pk. When we observe processor Pk to get Sk, it should have sent m to Pi in order for us to have a consistent global state. In other words, if we get Pk’s state before it sent message m and then get Pi’s state after it received m, we have an inconsistent global state.

Consistent Cut

• So we say that the global state must represent a consistent cut.

• One way of defining a consistent cut is that the observations resulting in the states Si should all occur concurrently (as defined using vector clocks).

• Also, a consistent cut is one where all the events before the cut happen-before the ones after the cut or are unrelated (uses “happens-before” relation).
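With vector clocks, consistency of a cut can be tested directly: no process in the cut may have seen more events of process i than process i itself has included. The sketch below assumes the cut is given as the vector clock of the last included event of each process; the function name and the example values are hypothetical.

```python
def is_consistent_cut(frontier) -> bool:
    """frontier[i] is the vector clock of the last event of process i in the cut.

    The cut is consistent when, for every pair (i, j), process j has not
    seen more events of process i than process i has included itself.
    """
    n = len(frontier)
    return all(frontier[i][i] >= frontier[j][i]
               for i in range(n) for j in range(n))


# Assumed two-process example. In the second cut, process 2 has already
# received a message sent at process 1's second event, which the cut omits.
print(is_consistent_cut([(2, 0), (2, 1)]))
print(is_consistent_cut([(1, 0), (2, 1)]))
```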

Global State

a) A consistent cut

b) An inconsistent cut

More Cuts

[Figure: events $e_1^0$ to $e_1^3$ at p1 and $e_2^0$ to $e_2^2$ at p2 against physical time, with messages m1 and m2; one cut is labeled inconsistent and another consistent]

Vector Clocks and Cuts

All events before a consistent cut happen before (or are concurrent with) all events after the cut

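[Figure: p1 events with vector clocks (1,0), (2,0), (3,0), (4,3) and p2 events with (2,1), (2,2), (2,3) against physical time, with messages m1 and m2 and cuts C1 and C2; annotated values x1 = 1, 100, 105, 90 and x2 = 100, 95, 90]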

Distributed Snapshot Algorithms

• Snapshot algorithms are used to record a consistent state of the DS.

• Snapshots can be used to detect stable states.

• Once the system enters a stable state, it will remain in that state (until there is some outside intervention).

• Examples of stable states: lost token, deadlock, termination.

Algorithm for Distributed Snapshot

• Well known algorithm by Chandy and Lamport

• Assumes:

– Communication channels are reliable, unidirectional and FIFO

– There are no failures

– The graph of processes is strongly connected.

Chandy and Lamport

• When instructed, each processor will stop other processing and record its state Pi, send out marker messages and record the sequence of messages arriving on each incoming channel until a marker comes in (this will enable us to get the channel state Ci,j).

• At end of algorithm, initiator or other coordinator collects local states and compiles global state.

Chandy Lamport Snapshot

a) Organization of a process and channels for a distributed snapshot

Chandy Lamport Snapshot

• One processor starts the snapshot by recording its own local state and immediately sending a marker message M on each of its outgoing channels. (This indicates the causal boundary between before the local state was recorded and after.) It begins to record all the messages arriving on all incoming channels. When it has received markers from all incoming channels, it is done.

• When a processor that was not the initiator receives the marker for the first time, it immediately records its local state and sends out markers on all outgoing channels. It begins recording the received message sequence on all incoming channels other than the one it just received the marker on. When a marker has been received on each incoming channel that is being recorded, the processor is done with its part of the snapshot.
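The marker rules can be exercised in a small single-threaded simulation. This is a toy sketch under stated assumptions: process state is an integer balance, application messages are transfers of an amount, and the class name, the `MARKER` sentinel, and the two-process scenario are all hypothetical.

```python
from collections import deque

MARKER = "MARKER"  # hypothetical sentinel standing in for the marker message M


class SnapshotSim:
    """Toy simulation of the marker rules; local state is an integer balance."""

    def __init__(self, balances, edges):
        self.state = dict(balances)               # pid -> current local state
        self.chan = {e: deque() for e in edges}   # (src, dst) -> FIFO channel
        self.recorded = {}                        # pid -> recorded local state
        self.chan_log = {}                        # (src, dst) -> recorded channel state
        self.recording = set()                    # channels still being recorded

    def transfer(self, src, dst, amount):
        """Application message: move `amount` from src to dst."""
        self.state[src] -= amount
        self.chan[(src, dst)].append(amount)

    def record_state(self, pid, marker_from=None):
        """Record pid's state, send markers, start recording incoming channels."""
        self.recorded[pid] = self.state[pid]
        for (src, dst) in self.chan:
            if dst == pid:
                self.chan_log[(src, dst)] = []
                if src != marker_from:   # the marker's own channel is recorded as empty
                    self.recording.add((src, dst))
            if src == pid:
                self.chan[(src, dst)].append(MARKER)

    def deliver(self, src, dst):
        """Deliver the message at the head of channel (src, dst)."""
        msg = self.chan[(src, dst)].popleft()
        if msg == MARKER:
            if dst not in self.recorded:
                self.record_state(dst, marker_from=src)  # first marker seen
            else:
                self.recording.discard((src, dst))       # this channel is done
        else:
            self.state[dst] += msg
            if (src, dst) in self.recording:
                self.chan_log[(src, dst)].append(msg)    # was in flight at the cut

    def done(self):
        return len(self.recorded) == len(self.state) and not self.recording


sim = SnapshotSim({1: 100, 2: 50}, [(1, 2), (2, 1)])
sim.transfer(2, 1, 10)   # 10 units are in flight from 2 to 1
sim.record_state(1)      # process 1 initiates the snapshot
sim.deliver(2, 1)        # the transfer arrives and is logged as channel state
sim.deliver(1, 2)        # process 2 sees its first marker and records its state
sim.deliver(2, 1)        # process 1 gets the marker back; channel (2, 1) closes
total = sum(sim.recorded.values()) + sum(sum(log) for log in sim.chan_log.values())
```

In this run the recorded local states plus the recorded channel contents add up to the 150 units the system started with, which is what a consistent snapshot should preserve.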

Chandy Lamport Snapshot

b) Process Q receives a marker for the first time and records its local state

c) Q records all incoming messages

d) Q receives a marker for its incoming channel and finishes recording the state of the incoming channel

Snapshot

[Figure sequence: example run of the snapshot on a four-node system (nodes 1 to 4) in which node 2 initiates the snapshot, with application messages a, b, c, d in transit and markers M propagating]

Recorded, in order of appearance:

• Node 2: State S2

• Nodes 1 and 4: State S1, State S4

• L3,2 = b

• L1,2 = empty, L4,2 = empty

• Node 3: State S3

• L3,1 = d

Chandy Lamport Snapshot

Uses O(|E|) messages, where |E| is the number of edges. The time bound depends on the topology of the graph.

Chandy Lamport is a Consistent Cut

[Figure: space-time diagram of nodes 1 to 4 showing application messages a, b, c, d and the markers M; the points at which states are recorded form a consistent cut]

Termination detection

Termination Detection

• Problem: Determine if a distributed computation has terminated. This is difficult because while some nodes look like they are done, a message from a node not yet queried could awaken them to more computations.

• Nodes can be organized in a ring – either physically or logically.

• Communications are reliable and FIFO.

Termination Detection

• Each node can be in either the active or the passive state.

• Only an active node can send messages to other nodes; each message sent is received after some period of time.

• After having received a message, a passive node becomes active; the receipt of a message is the only mechanism that triggers a passive node to become active.

• For each node, the transition from the active to the passive state may occur "spontaneously".

Termination Detection

• The state in which all nodes are passive and no messages are on their way is stable: the distributed computation is said to have terminated.

• The purpose of the algorithm is to enable one of the nodes, say node 0, to detect that this stable state has been reached.

Termination Detection

The problem is that a node may say it is finished, but then an incoming message “wakes it up” and it begins processing and perhaps sending out more messages, waking up more processes. We cannot query them “all at once” and even if we could, we might miss a message in transit.

[Figure: node 0 asks node 3 “Are you done?”; node 3 answers “Yes, I’m done” and then receives “Here’s more work”]

Dijkstra’s TD Algorithm

Every node maintains a counter c. Sending a message increases c by one; the receipt of a message decreases c by one. The sum of all counters thus equals the number of messages pending in the network.

Dijkstra’s TD Algorithm

When node 0 initiates a detection probe, it sends a token with a value 0 to node N-1. Every node i keeps the token until it becomes passive; it then forwards the token to node i-1 increasing the token value by c (the message count).

Dijkstra’s TD Algorithm

Every node and also the token has a color (initially all white). When a node receives a message, the node turns black. When a node forwards the token, the node turns white. If a black machine forwards the token, the token turns black; otherwise the token keeps its color.

Dijkstra’s TD Algorithm

When node 0 receives the token again, it can conclude termination if node 0 is passive and white, the token is white, and the sum of the token value and c is 0.

Otherwise, node 0 may start a new probe.
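The counter, color, and token rules can be put together in a small simulation. This is a sketch under simplifying assumptions: a probe runs in one step, every node other than 0 must already be passive when the token reaches it, and node 0 turns white when it sends out the token. The names `Node` and `probe` are hypothetical.

```python
class Node:
    def __init__(self):
        self.c = 0           # messages sent minus messages received
        self.black = False   # False = white, True = black
        self.active = False

    def send(self):
        self.c += 1

    def receive(self):
        self.c -= 1
        self.black = True    # receiving a message turns the node black
        self.active = True   # and wakes it up


def probe(nodes) -> bool:
    """One trip of the token: 0 -> N-1 -> ... -> 1 -> 0.

    Returns True if node 0 may conclude termination.
    """
    nodes[0].black = False                 # assumed: node 0 whitens on sending
    value, token_black = 0, False
    for i in range(len(nodes) - 1, 0, -1):
        node = nodes[i]
        if node.active:
            raise RuntimeError(f"node {i} is active and would hold the token")
        value += node.c                    # add the node's message count
        token_black = token_black or node.black
        node.black = False                 # forwarding the token whitens the node
    n0 = nodes[0]
    return (not n0.active and not n0.black
            and not token_black and value + n0.c == 0)


nodes = [Node(), Node(), Node()]
nodes[1].active = True
nodes[1].send()              # node 1 sends a message to node 2, then goes passive
nodes[1].active = False
first = probe(nodes)         # message still in flight: the counters sum to 1

nodes[2].receive()           # the message arrives; node 2 turns black
nodes[2].active = False      # node 2 finishes its work
second = probe(nodes)        # sum is 0 but the token comes back black

third = probe(nodes)         # all white, sum 0
print(first, second, third)
```

The second probe shows why the colors are needed: the counters already sum to zero, but node 0 cannot rule out that the receipt happened behind the token, so it needs one more clean round.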
