global predicate detection and event ordering
DESCRIPTION
Global Predicate Detection and Event Ordering. Our Problem. To compute predicates over the state of a distributed application. Model. Message passing No failures Two possible timing assumptions: Synchronous System Asynchronous System No upper bound on message delivery time - PowerPoint PPT PresentationTRANSCRIPT
Global Predicate Detection
and Event Ordering
Our Problem
To compute predicatesover the state of
a distributed application
Model
Message passing
No failures
Two possible timing assumptions:
Synchronous System
Asynchronous System
No upper bound on message delivery time
No bound on relative process speeds
No centralized clock
Asynchronous systems
Weakest possible assumptions
Weak assumptions ´ less vulnerabilities
Asynchronous slow
“Interesting” model w.r.t. failures
Client-Server
•Processes exchange messages using •Remote Procedure Call (RPC)
A client requests a service by sending the server a
message. The client blocks while waiting for a response
sc
Client-Server
• Processes exchange messages using • Remote Procedure Call (RPC)
The server computes the response (possibly asking
other servers) and returns it to the client
A client requests a service by sending the server a
message. The client blocks while waiting for a response
s
#!?%!
c
Deadlock!
Goal
•Design a protocol by which a processor can determine whether a global predicate (say, deadlock) holds
Draw arrow from pi to pj if pj has received a request but has not responded yet
Wait-For Graphs
Draw arrow from pi to pj if pj has received a request but has not responded yet
Cycle in WFG ) deadlock
Deadlock ) ¦ cycle in WFG
Wait-For Graphs
The protocol
p0 sends a message to p1 p3
On receipt of p0 ‘s message, pi replies with its state and wait-for info
An execution
An execution
An execution
Ghost Deadlock!
We have a problem...
Asynchronous system
no centralized clock, etc. etc.
Synchrony useful to
coordinate actions
order events
Events and HistoriesProcesses execute sequences of events
Events can be of 3 types: local, send, and receive
epi is the i-th event of process p
The local history hp of process p is the sequence of events executed by process
hpk : prefix that contains first k events
hp0 : initial, empty sequence
The history H is the set hp0 [ hp1
[ … hpn-1NOTE: In H, local histories are interpreted as sets, rather than sequences, of events
Ordering events
Observation 1:
Events in a local history are totally ordered
•
time
Ordering events
Observation 1:
Events in a local history are totally ordered
Observation 2:
For every message m, send(m) precedes receive(m)
time
time
time
Happened-before(Lamport[1978])
•A binary relation defined over events
1. if eik,ei
l2hi and k<l , then eik !ei
l
2. if ei=send (m) and ej=receive(m), then ei! ej
3. if e!e’ and e’! e‘’, then e! e‘’
Space-Time diagrams
•A graphic representation of a distributed execution
time
Space-Time diagrams
•A graphic representation of a distributed execution
time
Space-Time diagrams
•A graphic representation of a distributed execution
time
Space-Time diagrams
•A graphic representation of a distributed execution
time
Space-Time diagrams
•A graphic representation of a distributed execution
time
H and impose a partial order
Space-Time diagrams
•A graphic representation of a distributed execution
time
H and impose a partial order
Space-Time diagrams
•A graphic representation of a distributed execution
time
H and impose a partial order
Space-Time diagrams
•A graphic representation of a distributed execution
time
H and impose a partial order
Runs andConsistent Runs
A run is a total ordering of the events in H that is consistent with the local histories of the processors
Ex: h1,h2, … ,hn is a run
A run is consistent if the total order imposed in the run is an extension of the partial order induced by!
A single distributed computation may correspond to several consistent runs!
Cuts
•A cut C is a subset of the global history of H
•
•A cut C is a subset of the global history of H
•The frontier of C is the set of events
•
Cuts
Global states and cuts
The global state of a distributed computation is an tuple of n local states
= (1 ... n )
To each cut (1c1, ... n
cn) corresponds a global
state
Consistent cuts and consistent global states
A cut is consistent if
A consistent global state is one corresponding to a consistent cut
What sees
What sees
•Not a consistent global state: the cut contains the event corresponding to the receipt of the last message by p3 but not the corresponding send event
Our task
Develop a protocol by which a processor can build a consistent global state
Informally, we want to be able to take a snapshot of the computation
Not obvious in an asynchronous system...
Our approach
Develop a simple synchronous protocol
Refine protocol as we relax assumptions
Record:processor stateschannel states
Assumptions:FIFO channelsEach m timestamped with with T(send(m))
Snapshot I
• 1. p0 selects tss
• 2. p0 sends “take a snapshot at tss” to all processes
• 3. when clock of pi reads tss then a. records its local state i
b. starts recording messages received on each of incoming channels
• stops recording a channel when it receives first message with timestamp greater than or equal to tss
Snapshot I
• 1. p0 selects tss
• 2. p0 sends “take a snapshot at tss” to all processes
• 3. when clock of pi reads tss then a. records its local state i
b. sends an empty message along its outgoing channelsc. starts recording messages received on each of
incoming channelsd. stops recording a channel when it receives first
message with timestamp greater than or equal to tss
Correctness
Theorem: Snapshot I produces a consistent cut
< Assumption >
< Assumption >
< 0 and 1>
Proof: Need to prove
< Definition >
< Property of real time>
< 2 and 4>
< 5 and 3>
< Definition >
Clock Condition
< Property of real time>
Can the Clock Condition be implemented some other way?
Lamport Clocks
•Each process maintains a local variable LC•LC(e) = value of LC for event e
Increment Rules
Timestamp m with
Space-Time Diagrams and Logical
Clocks2
1
3
4 5 6
6
7
7
8
8
9
A subtle problem
• when LC=t do S
• doesn’t make sense for Lamport clocks!
• there is no guarantee that LC will ever be t• S is anyway executed after
• Fixes: • If e is internal/send and LC = t-2
• execute e and then S• If e = receive(m) Æ (TS(m) ¸ t) Æ (LC · t-1)
• put message back in channel• re-enable e; set LC=t-1; execute S
An obvious problem
No tss !
Choose large enough that it cannot be reached by applying the update rules of logical clocks
An obvious problem
No tss!
Choose large enough that it cannot be reached by applying the update rules of logical clocks
Doing so assumes
upper bound on message delivery time
upper bound relative process speeds
Better relax it
Snapshot II
p0 selects
p0 sends “take a snapshot at tss” to all processes; it waits for all of them to reply and then sets its logical clock to
when clock of pi reads then pi
records its local state i
sends an empty message along its outgoing channelsstarts recording messages received on each incoming channelstops recording a channel when receives first message with timestamp greater than or equal to
Relaxing synchrony
Process does nothing for the protocol during this time!
take a snapshot at
empty message:
monitorschannels records
local state
sends empty message:
Use empty message to announce snapshot!
Snapshot III
Processor p0 sends itself “take a snapshot “
when pi receives “take a snapshot” for the first time from pj:
records its local state i
sends “take a snapshot” along its outgoing channels
sets channel from pj to empty
starts recording messages received over each of its other incoming channels
when receives “take a snapshot” beyond the first time from pk:
pi stops recording channel from pk
when pi has received “take a snapshot” on all channels, it sends collected state to p0 and stops.
Snapshots: a perspective
The global state s saved by the snapshot protocol is a consistent global state
Snapshots: a perspective
The global state s saved by the snapshot protocol is a consistent global stateBut did it ever occur during the computation?
a distributed computation provides only a partial order of eventsmany total orders (runs) are compatible with that partial orderall we know is that s could have occurred
Snapshots: a perspective
The global state s saved by the snapshot protocol is a consistent global stateBut did it ever occur during the computation?
a distributed computation provides only a partial order of eventsmany total orders (runs) are compatible with that partial orderall we know is that s could have occurred
We are evaluating predicates on states that may have never occurred!
An Execution and its Lattice
An Execution and its Lattice
An Execution and its Lattice
An Execution and its Lattice
An Execution and its Lattice
An Execution and its Lattice
An Execution and its Lattice
An Execution and its Lattice
An Execution and its Lattice
An Execution and its Lattice
An Execution and its Lattice
An Execution and its Lattice
An Execution and its Lattice
An Execution and its Lattice
Reachability
• kl is reachable from ij if there is a path from kl to ij in the lattice
Reachability
• kl is reachable from ij if there is a path from kl to ij in the lattice
Reachability
• kl is reachable from ij if there is a path from kl to ij in the lattice
Reachability
• kl is reachable from ij if there is a path from kl to ij in the lattice
So, why do we care about s again?
Deadlock is a stable property
Deadlock Deadlock
If a run R of the snapshot protocol starts in i and terminates in f, then
So, why do we care about s again?
Deadlock is a stable property
Deadlock Deadlock
If a run R of the snapshot protocol starts in i and terminates in f, then
Deadlock in s implies deadlock in f
No deadlock in s implies no deadlock in i
Same problem, different approach
Monitor process does not query explicitly
Instead, it passively collects information and uses it to build an observation.
(reactive architectures, Harel and Pnueli [1985])
An observation is an ordering of event of the distributed computation based on the order in which the receiver is notified of the events.
Observations: a few observationsAn observation puts no constraint on the order in which the monitor receives notifications
Observations: a few observationsAn observation puts no constraint on the order in which the monitor receives notifications
Observations: a few observationsAn observation puts no constraint on the order in which the monitor receives notifications
Causal delivery
•FIFO delivery guarantees:
Causal delivery
•FIFO delivery guarantees:
•Causal delivery generalizes FIFO:
Causal delivery
•FIFO delivery guarantees:
•Causal delivery generalizes FIFO:
send event
receive event
deliver event
Causal delivery
•FIFO delivery guarantees:
•Causal delivery generalizes FIFO:
• send event
receive event
deliver event
Causal delivery
•FIFO delivery guarantees:
•Causal delivery generalizes FIFO:
send event
receive event
deliver event
Causal delivery
•FIFO delivery guarantees:
•Causal delivery generalizes FIFO:
send event
receive event
deliver event
1
Causal delivery
•FIFO delivery guarantees:
•Causal delivery generalizes FIFO:
send event
receive event
deliver event
1 2
Causal Deliveryin Synchronous Systems
•We use the upper bound on message delivery time
Causal Deliveryin Synchronous Systems
•We use the upper bound on message delivery time
•DR1: At time t , p0 delivers all messages it received with timestamp up to t- in increasing timestamp order
Causal Deliverywith Lamport Clocks
• DR1.1: Deliver all received messages in increasing (logical clock) timestamp order.
Causal Deliverywith Lamport Clocks
• DR1.1: Deliver all received messages in increasing (logical clock) timestamp order.
1
Causal Deliverywith Lamport Clocks
• DR1.1: Deliver all received messages in increasing (logical clock) timestamp order.
1 4
Should p0 deliver?
Causal Deliverywith Lamport Clocks
• DR1.1: Deliver all received messages in increasing (logical clock) timestamp order.
• Problem: Lamport Clocks don’t provide gap detection
1 4
Should p0 deliver?
Given two events e and e’ and their clock values LC(e) and LC(e’) — where LC(e) < LC(e’), determine whether some e’’ event
exists s.t. LC(e) < LC(e’’) < LC(e’)
Stability
•DR2: Deliver all received stable messages in increasing (logical clock) timestamp order.
•A message m received by p is stable at p if p will never receive a future message m s.t. TS(m’) < TS(m)
Implementing Stability
Real-time clocks
wait for time units
Implementing Stability
Real-time clocks
wait for time units
Lamport clocks
wait on each channel for m s.t. TS(m) > LC(e)
Design better clocks!
Clocks and STRONG Clocks
Lamport clocks implement the clock condition:
We want new clocks that implement the strong clock condition:
Causal Histories
The causal history of an event e in (H,!) is the set
Causal Histories
The causal history of an event e in (H,!) is the set
Causal Histories
The causal history of an event e in (H,!) is the set
How to build
•Each process : pi
initializes = 0
if eik is an internal or send event, then
• if eik is a receive event for message m,
then
Pruning causal histories
Prune segments of history that are known to all processes (Peterson, Bucholz and Schlichting)
Use a more clever way to encode (e)
Vector Clocks
Consider i(e), the projection of (e) on pi
i(e) is a prefix of hi : i(e) = hiki – it can be
encoded using ki
(e) = 1(e) [ 2(e) [ . . . [ n(e) can be encoded using
Represent using an n-vector VC such that
Update rules
Message m is timestamped with
Example
[1,0,0]
[0,1,0]
[2,1,0]
[1,0,1] [1,0,2] [1,0,3]
[3,1,2]
[1,2,3]
[4,1,2] [5,1,2]
[4,3,3]
[5,1,4]
Operational interpretation
•=
•=
[1,0,0]
[0,1,0]
[2,1,0]
[1,0,1] [1,0,2] [1,0,3]
[3,1,2]
[1,2,3]
[4,1,2] [5,1,2]
[4,3,3]
[5,1,4]
Operational interpretation
´ no. of events executed pi by up to and including ei
´
[1,0,0]
[0,1,0]
[2,1,0]
[1,0,1] [1,0,2] [1,0,3]
[3,1,2]
[1,2,3]
[4,1,2] [5,1,2]
[4,3,3]
[5,1,4]
Operational interpretation
´ no. of events executed pi by up to and including ei
´ no. of events executed by pj that happen before ei of pi
[1,0,0]
[0,1,0]
[2,1,0]
[1,0,1] [1,0,2] [1,0,3]
[3,1,2]
[1,2,3]
[4,1,2] [5,1,2]
[4,3,3]
[5,1,4]
VC properties:event ordering
1. Given two vectors V and V+, less than is defined as: V < V+ ´ (V V+) Æ (8 k : 1· k· n : V[k] · V+[k])
Strong Clock Condition:
Simple Strong Clock Condition: Given ei of pi and ej of pj, where i j
1. Concurrency: Given ei of pi and ej of pj, where i j
VC properties: consistency
Pairwise inconsistency
Events ei of pi and ej of pj (i j) are pairwise inconsistent (i.e. can’t be on the frontier of the same consistent cut) if and only if
Consistent Cut
A cut defined by (c1, . . ., cn) is consistent if and only if
VC properties:weak gap detection
Weak gap detection
Given ei of pi and ej of pj, if VC(ei)[k] < VC(ej)[k] for some k j, then there exists ek s.t
[2,2,2]
[2,0,1]
[0,0,2]
VC properties:weak gap detection
Weak gap detection
Given ei of pi and ej of pj, if VC(ei)[k] < VC(ej)[k] for some k j, then there exists ek s.t
[2,2,2]
[2,0,1]
[0,0,2]
[2,1,1]
[0,0,1]
[1,0,1]
VC properties:strong gap detection
Weak gap detection
Given ei of pi and ej of pj, if VC(ei)[k] < VC(ej)[k] for some k j, then there exists ek s.t
Strong gap detection
Given ei of pi and ej of pj, if VC(ei)[i] < VC(ej)[i] for some k j, then there exists ei
’ s.t
VCs for Causal Delivery
Each process increments the local component of its VC only for events that are notified to the monitor
Each message notifying event e is timestamped with VC(e)
The monitor keeps all notification messages in a set M
Stability
•Suppose p0 has received mj from pj.
•When is it safe for p0 to deliver mj ?
Stability
•Suppose p0 has received mj from pj
•When is it safe for p0 to deliver mj ?
There is no earlier message in M
Stability
•Suppose p0 has received mj from pj
•When is it safe for p0 to deliver mj ?
There is no earlier message in M
There is no earlier message from pj
no. of pj messages delivered by p0
Stability
•Suppose p0 has received mj from pj
•When is it safe for p0 to deliver mj ?
There is no earlier message in M
There is no earlier message from pj
There is no earlier message mk’’ from pk (k j) …
?
no. of pj messages delivered by p0
Checking for .
Let mk’ be the last message p0 delivered
from pk
By strong gap detection, mk’’ exists only if
Hence, deliver mj as soon as
The protocol
p0 maintains an array D[1, . . ., n] of counters
D[i] = TS(mi)[i] where mi is the last message delivered from pi
• DR3: Deliver m from pj as soon as both of the following conditions are satisfied:
1.
2.