distributed systems fall 2009 logical time, global states, and debugging

33

Post on 19-Dec-2015

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Distributed Systems Fall 2009 Logical time, global states, and debugging
Page 2: Distributed Systems Fall 2009 Logical time, global states, and debugging

Distributed Systems Fall 2009

Logical time, global states,

and debugging

Page 3: Distributed Systems Fall 2009 Logical time, global states, and debugging

Fall 2009 5DV020 3

Outline

• Logical time– Lamport timestamps– Vector clocks

• Global states– Snapshot algorithm

• Distributed debugging• Summary

Page 4: Distributed Systems Fall 2009 Logical time, global states, and debugging

Fall 2009 5DV020 4

Logical time

• No global time! What do we do? What can we trust?– Causal relationships still hold: effect cannot be observed before the cause

• Processes maintain an ordering of events: e →i e'

• Send events must come before Receive events

Page 5: Distributed Systems Fall 2009 Logical time, global states, and debugging

Fall 2009 5DV020 5

Happened-before relation “→”

• HB1: If there exists a process pi: e →i e', then e → e'

• HB2: For any message m: send(m) → receive(m)

• HB3: If e, e', and e'' are events s.t. e → e' and e'' → e''', then e → e'''

Page 6: Distributed Systems Fall 2009 Logical time, global states, and debugging

Fall 2009 5DV020 6

Example (HB-relation)

• HB1: a → b, c → d, e → f• HB2: b → c, d → f• HB3: a → b → c → d → f

• No ordering for e.g. b and e! They are concurrent, denoted b || e

Page 7: Distributed Systems Fall 2009 Logical time, global states, and debugging

HB-relation pitfall

• Messages are not always sent as result of a causal relationship (e.g. scheduled messages)

Page 8: Distributed Systems Fall 2009 Logical time, global states, and debugging

Fall 2009 5DV020 8

Lamport logical clocks

• Monotonically increasing• Each process has a counter which is increased when an event occurs

• Counter is sent with messages– Recipient sets own clock to max(own, received) and then increases own counter

Page 9: Distributed Systems Fall 2009 Logical time, global states, and debugging

Fall 2009 5DV020 9

Example (Lamport timestamps)

• Evident that e → e' ⇒ L(e) < L(e')

• But, the opposite does not hold!– E.g. L(b) > L(e), but b || e

Page 10: Distributed Systems Fall 2009 Logical time, global states, and debugging

Fall 2009 5DV020 10

Vector clocks

• Keep track of known events at all processes (a vector of timestamps)

• Send vector clock with message– Receiver merges clocks by setting own values to the maximum of own values and received ones

Page 11: Distributed Systems Fall 2009 Logical time, global states, and debugging

Fall 2009 5DV020 11

Example (Vector clocks)

• Vector clocks can be ordered– V=V' if all values are the same– V≤V' if all values in V are ≤ those in V'

– V<V' if V≤V' and V and V' are non-equal

Page 12: Distributed Systems Fall 2009 Logical time, global states, and debugging

Fall 2009 5DV020 12

Vector clocks

• e → e' ⇒ V(e) < V(e')• and• V(e) < V(e') ⇒ e → e'

• Concurrent events (b || e):– Neither V(b) < V(e) nor V(e) < V(b)

• Nice properties, but expensive in terms of memory and bandwidth– O(N) in both cases

Page 13: Distributed Systems Fall 2009 Logical time, global states, and debugging

Fall 2009 5DV020 13

Global states

• We often need to know the state of the entire distributed system– Distributed garbage collection– Distributed deadlocks– Distributed termination detection

• Just process states are not enough!– Messages currently in the channel

Page 14: Distributed Systems Fall 2009 Logical time, global states, and debugging

Fall 2009 5DV020 14

Global states

• Simple with global time!– Just issue “report state at time X”– …we do not have this luxury

• Each process maintains own history– We could create global history by just taking union of all local histories

• We only want to consider such global states S that may have occurred at the same time

Page 15: Distributed Systems Fall 2009 Logical time, global states, and debugging

Fall 2009 5DV020 15

Cuts

• Denote the local prefix of the process history (first k events) by hi

k = <ei0, ei

1, ..., eik>

• A cut is a union of prefixes of process histories:C = h1

c1 ∪ h2c2 ∪ … ∪ hN

cN

• The state of pi in S is that which

it had after immediately the last of the process' event that is in the cut

Page 16: Distributed Systems Fall 2009 Logical time, global states, and debugging

Fall 2009 5DV020 16

Consistent cuts and global states

• According to the definition, we can make any cut we want, including ones that make no sense!

• We want to only consider consistent cuts– For all events e ∈ C, f → e ⇒ f ∈ C

• Consistent global states correspond to consistent global cuts– We only ever move between consistent global states during execution

Page 17: Distributed Systems Fall 2009 Logical time, global states, and debugging

Fall 2009 5DV020 17

Linearizations and runs

• Both are total orderings of all events in the global history

• A run is only consistent with the ordering of each process' own local history

• A linearization is consistent with the (global) happened-before relation

Page 18: Distributed Systems Fall 2009 Logical time, global states, and debugging

Linearizations and runs

• Runs do not have to pass through consistent global states, but all linearizations do– S' is reachable from S if ∃ a linearization from S to S'

Page 19: Distributed Systems Fall 2009 Logical time, global states, and debugging

Fall 2009 5DV020 19

Snapshot algorithm

• Chandy and Lamport• Distributed algorithm• Constructs a snapshot of the global state (both process and channel state)– Ensures that the global state is consistent

– Makes no guarantee that the system was actually in the recorded state!

Page 20: Distributed Systems Fall 2009 Logical time, global states, and debugging

Fall 2009 5DV020 20

Snapshot algorithm – assumptions

• Neither channel nor processes fail• Every sent message is eventually delivered intact

• Strongly connected graph of processes• Channels are unidirectional and provide FIFO message delivery

Page 21: Distributed Systems Fall 2009 Logical time, global states, and debugging

Snapshot algorithm – notes

• Any process may initiate a global snapshot at any time

• Processes may continue execution while the snapshot is in progress (important)

Page 22: Distributed Systems Fall 2009 Logical time, global states, and debugging

Fall 2009 5DV020 22

Snapshot algorithm

• When a marker is received on channel c– If own process state is not recorded• Record own process state• Send marker messages on all outbound channels before sending anything else

• Start recording the set of messages coming in on all other incoming channels (the set for c is the empty set)

Page 23: Distributed Systems Fall 2009 Logical time, global states, and debugging

Snapshot algorithm

– If own process state is already recorded• Record the state of c as the messages that have been received on it since recording own state

Page 24: Distributed Systems Fall 2009 Logical time, global states, and debugging

Fall 2009 5DV020 24

Snapshot algorithm

• Study the example in the book!• The example takes a few minutes to walk through and understand– If you want guidance, we can do it after class

Page 25: Distributed Systems Fall 2009 Logical time, global states, and debugging

Fall 2009 5DV020 25

Distributed debugging

• Different from regular debugging!• Let system run, and establish post-hoc (after the fact) whether the system worked as intended

• Rather than Snapshot, we shall study a centralized algorithm– Monitoring server not considered part of the group of processes in the system

Page 26: Distributed Systems Fall 2009 Logical time, global states, and debugging

Fall 2009 5DV020 26

Distributed debugging

• Stable and non-stable predicates

• Safety with respect to α– α evaluates to False for all states S reachable from the initial state

• Liveness with respect to β– For any linearization L starting in the initial state, β evaluates to True for some state reachable from the initial state

Page 27: Distributed Systems Fall 2009 Logical time, global states, and debugging

Fall 2009 5DV020 27

Possibly Φ

• Possibly Φ: there is a consistent global state S through which a linearization of H passes such that Φ(S) is True– “At some point, Φ is true”

Page 28: Distributed Systems Fall 2009 Logical time, global states, and debugging

Definitely Φ

• Definitely Φ: for all linearizations L of H, there is a consistent global state S through which L passes such that Φ(S) is True– “No matter which way the system is executed, Φ is always true”

Page 29: Distributed Systems Fall 2009 Logical time, global states, and debugging

Fall 2009 5DV020 29

Possibly and Definitely Φ

• Processes send their state to the monitor as soon as it changes (including own vector clock)

• State is consistent if:– px's events known at py when it sent

sy is no more than the events that

had occurred at px when it sent sx

• “Events that happened-before are already known and part of the global state”

Page 30: Distributed Systems Fall 2009 Logical time, global states, and debugging

Fall 2009 5DV020 30

Possibly and Definitely Φ

• The vector clocks allow us to construct a lattice (crystalline structure) of global states for an execution– Determining Possibly and Definitely Φ is simply a manner of traversing this structure

• Figures 11.14 and 11.15 from the book, © Pearson Education 2005

Page 31: Distributed Systems Fall 2009 Logical time, global states, and debugging

Fall 2009 5DV020 31

Summary

• Logical clocks are based on events in processes and the inter-event relationships– Happened-before relation

• Logical clocks do not capture everything– Out-of-band communication (phone calls, etc.)

Page 32: Distributed Systems Fall 2009 Logical time, global states, and debugging

Fall 2009 5DV020 32

Summary

• Lamport's logical clocks are simple, but give limited reasoning power

• Vector clocks are more powerful, but also more costly

• Global state encompasses both process and channel state– Snapshot algorithm

• Distributed debugging is quite different from regular debugging– Possibly Φ and Definitely Φ

Page 33: Distributed Systems Fall 2009 Logical time, global states, and debugging

Fall 2009 5DV020 33

Next lecture

• Coordination and Agreement– Distributed mutual exclusion– Election algorithms– Byzantine Generals problem

• Multicast and message ordering– Important for the assignment!