Distributed Computing 5. Snapshot, Shmuel Zaks, zaks@cs.technion.ac.il ©
The snapshot algorithm (Chandy and Lamport)
Goal: design a snapshot (=global-state-detection) algorithm that:
will record a collection of states of all system components (which forms a global system state),
will not change the underlying computation,
will not freeze the underlying computation
A process can: record its own state, send and receive messages, record the messages it sends and receives, and cooperate with other processes.
Processes do not share clocks or memory.
Processes cannot record their states at precisely the same instant.
Motivation
Many problems in distributed systems can be stated in terms of the problem of detecting global states:
Stable property detection problems : termination detection, deadlock detection etc.
Checkpointing
Stable Property Detection Problem
D – a distributed system
y – a predicate function defined on the set of global states of D
S, S’ – global states of D
y is stable if y(S) implies y(S’) for all S’ reachable from S
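The definition can be checked mechanically on a small example. The following is a minimal sketch in Python, assuming a finite system whose transitions are given as a list of (S, S’) pairs; the names `is_stable` and `edges` and the four-state example are illustrative and do not come from the slides.

```python
def is_stable(y, edges):
    """Return True if y(S) implies y(S') for every transition S -> S'.

    Holding on every single step is equivalent to holding for all
    reachable S', by induction on the length of the path from S.
    """
    return all(y(t) for (s, t) in edges if y(s))


# Hypothetical 4-state system: 0 -> 1 -> 2 -> 3, and 3 -> 3 (terminated).
edges = [(0, 1), (1, 2), (2, 3), (3, 3)]

terminated = lambda S: S == 3     # once true, it stays true
in_state_one = lambda S: S == 1   # true in state 1, false afterwards
```

Here `is_stable(terminated, edges)` holds, while `in_state_one` is not stable because the step from 1 to 2 falsifies it.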
many distributed algorithms are structured as a sequence of phases
A phase: transient part, then a stable part
phase termination vs. computation termination
Our view of the problem:
i. detect the termination of a phase
ii. initiate a new phase
Notice that “the kth phase has terminated” is a stable property
Model
Distributed system D is a finite, labeled, directed graph.
[Figure: processes p and q connected by channels C1 and C2]
Channels have infinite buffers, are error-free, and preserve FIFO order.
Message delay is bounded, but unknown
State of a Channel
[Figure: p has sent messages 1, 2, 3 to q along C1; q has received message 1]
[1, 2, 3] – sequence X of messages that were sent
[1] – sequence Y of received messages ( prefix of X )
[2, 3] – state of C1: X \ Y
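The channel state X \ Y can be computed directly. This is a small sketch in Python; the function name `channel_state` is an assumption made for illustration, and it relies on the FIFO property of the model (the received sequence is a prefix of the sent sequence).

```python
def channel_state(sent, received):
    """State of a FIFO channel: the sent sequence X minus its received prefix Y."""
    if list(sent[:len(received)]) != list(received):
        raise ValueError("received messages must be a prefix of the sent messages")
    return list(sent[len(received):])
```

For the example on this slide, `channel_state([1, 2, 3], [1])` gives `[2, 3]`.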
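Example: System
Distributed system: [Figure: processes p and q connected by channels C1 and C2]
Initial global state: B A, with both channels empty (Ø)
State transitions (same for p and q): [Figure: two states A and B, with transitions labeled send and receive]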
Global state transition diagram
[Figure: the global states B A, A A, A B and A A, with their channel contents, linked by the events p sends, q receives, q sends, p receives]
The system is deterministic.
A computation corresponds to a path in the diagram.
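Example: System
Distributed system: [Figure: processes p and q connected by channels C1 and C2]
State transitions: p has states A and B, and q has states C and D, each with transitions labeled send and receive.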
Global state transition diagram
[Figure: the global states A C, B C, A D and B D, with their channel contents, linked by the events p sends, q sends, p receives, q receives]
The system is non-deterministic.
We look at the following sequence of events: p sends, q sends, p receives.
[Figure: this sequence on the global state transition diagram]
In the snapshot algorithm:
Each process records its own state.
p and q cooperate to record the state of C.
[Figure: channel C from p to q]
Example: System
[Figure: the token system with processes p and q, channels C1 and C2, and states A and B; recording points marked Record C, Record q, Record p]
Recorded state of p, C and q: no token
Example: System
[Figure: the token system with processes p and q, channels C1 and C2, and states A and B; recording points marked Record p, Record C, Record q]
Recorded state of p, C1 and q: two tokens
[Figure: timelines for the two recording orders]
Record C, record q, record p: C’s state is recorded, then p sends a message on C, then p’s state is recorded.
Record p, record C, record q: p’s state is recorded, then p sends a message on C, then C’s state is recorded.
[Figure: channel C from p to q]
q will record the state of C.
q starts recording C after it records its own state.
p and q have to coordinate, using a special marker.
q stops recording when it receives the marker from p.
But: how does q know when to record its state?
The snapshot algorithm
Who starts?
We assume one process.
Homework: extend the discussion and the proof to any number of starters.
(Intuition for the Algorithm)
[Figure: channel C from p to q]
Who will record the state of channel C? q.
q starts recording after it records its own state.
How does q know when to stop recording? p sends a marker right after it records its state, and before sending any other message.
The snapshot algorithm
Channel recording
[Figure: channel C from p to q]
Starts when q records its own state.
Ends when q receives the marker along C.
Note: for any q ≠ p0, the channel along which the marker arrived first is recorded as empty (Ø).
The snapshot algorithm
State recording
p0 starts.
p0 records its state, and then broadcasts the marker.
(Shout algorithm = PI (propagation of information) = hot potato = …)
When q receives the marker for the first time, it records its own state.
The snapshot algorithm
Marker-Sending Rule for a process p
1. record the state of p
2. send a marker along each outgoing channel c before sending any other message along c
Marker-Receiving Rule for a process q
On receiving a marker along channel c:
if q’s state is not recorded: 1. record q’s state; 2. record c’s state = Ø
else: c’s state is the sequence of messages received along c since q recorded its state
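The two rules can be sketched as a small simulation. This is an illustrative Python sketch, not the authors' code: the `System` class, the `MARKER` sentinel, the `demo` function and the reading of state B as "holds the token" are all assumptions. Application state changes are applied by hand in `demo`, and a process that records on its first marker also sends markers on its outgoing channels, as in the original algorithm.

```python
from collections import deque

MARKER = "MARKER"  # hypothetical sentinel standing in for the marker message


class System:
    def __init__(self, states, channels):
        self.state = dict(states)                    # current process states
        self.chan = {c: deque() for c in channels}   # (src, dst) -> FIFO queue
        self.rec_state = {}                          # recorded process states
        self.rec_chan = {}                           # recorded channel states
        self.open = set()                            # channels still being recorded

    def send(self, src, dst, msg):
        self.chan[(src, dst)].append(msg)

    def record(self, p):
        """Record p's state, then send a marker on every outgoing channel."""
        self.rec_state[p] = self.state[p]
        for c in self.chan:
            if c[1] == p:                 # start recording each incoming channel
                self.rec_chan[c] = []
                self.open.add(c)
            if c[0] == p:                 # marker precedes any later message
                self.chan[c].append(MARKER)

    def deliver(self, src, dst):
        """Deliver the message at the head of channel (src, dst)."""
        c = (src, dst)
        msg = self.chan[c].popleft()
        if msg == MARKER:
            if dst not in self.rec_state: # first marker: record state, c is empty
                self.record(dst)
            self.open.discard(c)          # c's recorded state is now final
        elif c in self.open:
            self.rec_chan[c].append(msg)  # received after recording, before marker
        return msg


def demo():
    # Token example; the assumption here is that B means "holds the token".
    s = System({"p": "B", "q": "A"}, [("p", "q"), ("q", "p")])
    s.state["p"] = "A"; s.send("p", "q", "token")    # p sends the token
    s.record("q")                                    # q starts the snapshot
    s.deliver("p", "q"); s.state["q"] = "B"          # q receives the token
    s.deliver("q", "p")                              # p receives q's marker
    s.deliver("p", "q")                              # q receives p's marker
    return s.rec_state, s.rec_chan
```

In this run both processes are recorded in state A and the token is recorded in transit on the channel from p to q, so the recorded global state contains exactly one token.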
Termination
Assumption: no marker remains forever in an input channel.
Claim: If the graph is strongly connected and at least one process records its state, then all processes will record their state in finite time
Proof: by induction
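The claim can be illustrated by following the markers through the graph. The sketch below is a simplification in Python that ignores timing and assumes every marker is eventually delivered; the function name `processes_that_record` and the example graphs are hypothetical.

```python
from collections import deque


def processes_that_record(out_neighbors, starter):
    """Return the set of processes that eventually record their state."""
    recorded = {starter}
    queue = deque([starter])
    while queue:
        p = queue.popleft()
        for q in out_neighbors.get(p, ()):   # p sends a marker to each q
            if q not in recorded:            # the first marker makes q record
                recorded.add(q)
                queue.append(q)
    return recorded


ring = {"a": ["b"], "b": ["c"], "c": ["a"]}   # strongly connected
chain = {"a": ["b"], "b": []}                 # not strongly connected
```

In the ring every process records regardless of the starter; in the chain, a snapshot started at b never reaches a.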
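The Recorded Global State
Example: System
[Figure: processes p and q connected by channels C1 and C2]
State transitions: p has states A and B, and q has states C and D, each with transitions labeled send and receive.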
[Figure: global state transition diagram with the states A D, B C, B D and A C, and the event sequence p sends, q sends, p receives]
What did we get?
Event e in process p is an atomic action: it can change the state of p, and the state of at most one channel c incident on p (by sending or receiving a message M along c).
e is defined by <p, s, s’, M, c>.
e = <p, s, s’, M, c> may occur in global state S if:
1. the state of p in S is s;
2. a. if c is directed towards p: c’s state has M at its head, and M is deleted after applying e;
   b. if c is directed away from p: c’s state has M at its tail after applying e;
3. the state of p after applying e is s’.
Process State and Global State
A process: a set of states, an initial state, and a set of events.
A global state S: a collection of process states and channel states.
Initially, each process is in its initial state and all channels are empty.
next(S, e) is the global state after event e is applied to global state S.
Process State and Global State
seq = (ei : 0 ≤ i ≤ n) is a computation of the system iff, for all 0 ≤ i ≤ n:
ei may occur in Si, and
Si+1 = next(Si, ei)
(S0 is the initial global state)
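The definitions of events, next(S, e) and computations can be made concrete as follows. This is a minimal Python sketch under stated assumptions: a global state is a pair (process states, channel contents), channels are named by (source, destination) pairs, and the names `Event`, `may_occur`, `next_state` and `is_computation` are illustrative. The example assumes that p moves from A to B on a send and back on a receive, and q from C to D likewise.

```python
from typing import NamedTuple, Optional, Tuple


class Event(NamedTuple):
    p: str                               # process in which the event occurs
    s: str                               # state of p before the event
    s2: str                              # state of p after the event (s' in the text)
    M: Optional[str] = None              # message sent or received, if any
    c: Optional[Tuple[str, str]] = None  # channel as (source, destination)


def may_occur(S, e):
    procs, chans = S
    if procs[e.p] != e.s:
        return False
    if e.c is not None and e.c[1] == e.p:   # receive: M must be at the head of c
        return chans[e.c][:1] == (e.M,)
    return True


def next_state(S, e):
    procs, chans = S
    procs = {**procs, e.p: e.s2}
    if e.c is not None:
        old = chans[e.c]
        # A receive removes M from the head; a send appends M at the tail.
        new = old[1:] if e.c[1] == e.p else old + (e.M,)
        chans = {**chans, e.c: new}
    return procs, chans


def is_computation(S0, seq):
    S = S0
    for e in seq:
        if not may_occur(S, e):
            return False
        S = next_state(S, e)
    return True


# Assumed encoding of the two-process example: C1 = (p, q), C2 = (q, p).
S0 = ({"p": "A", "q": "C"}, {("p", "q"): (), ("q", "p"): ()})
seq = [
    Event("p", "A", "B", "M", ("p", "q")),    # p sends
    Event("q", "C", "D", "M'", ("q", "p")),   # q sends
    Event("p", "B", "A", "M'", ("q", "p")),   # p receives
]
```

With these definitions `is_computation(S0, seq)` holds, while the receive event alone is not a computation from S0 because p is not yet in state B.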
The Recorded Global State
seq = (ei : i ≥ 0) – a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of the algorithm
S* – the recorded global state
Definition
Event ej is called pre-recording if ej is in a process p and p records its state after ej in seq.
Event ej is called post-recording if ej is in a process p and p records its state before ej in seq.
Assume that ej-1 is a post-recording event that comes immediately before a pre-recording event ej in seq.
Lemma: If $S_1 \xrightarrow{e_{j-1}} S \xrightarrow{e_j} S_2$, then
a. $e_j$ can be applied in $S_1$, say $S_1 \xrightarrow{e_j} S_3$,
b. $e_{j-1}$ can be applied in $S_3$, say $S_3 \xrightarrow{e_{j-1}} S_4$, and
c. $S_2 = S_4$.
Proof: ej-1 occurs in p and ej in q, and q ≠ p (since ej-1 is post-recording and ej is pre-recording).
The only scenario that might prevent interchanging the two events is that a message M is sent at ej-1 and received at ej.
But this is not possible: if M is sent at ej-1, then M is sent by a post-recording event, so a marker was sent to q before M. So when M is received at ej, q has already recorded its state, and ej is post-recording, a contradiction.
Hence, event ej can occur in global state Sj-1. The state of process p is not altered by ej, hence ej-1 can occur after ej.
We have to show that the states of all processes and channels are the same in S2 and S4. This clearly holds for processes and channels that do not take part in ej-1 and ej.
States: the states of p and q in S2 and in S4 are the same.
Channels: whether ej-1 and ej send a message, receive a message, or do neither along a channel, the same is done in both scenarios, so the states of the channels in S2 and S4 are the same. (End of proof.)
(The Recorded Global State)
Theorem: Given an execution seq, and an output of the snapshot algorithm S*, there exists a computation $seq' = (e'_j : j \ge 0)$, where
1. For all $j$, $j < i$ or $j \ge t$: $e'_j = e_j$.
2. The subsequence $(e'_j \mid i \le j < t)$ is a permutation of the subsequence $(e_j \mid i \le j < t)$.
3. For all $j$, $j < i$ or $j \ge t$: $S'_j = S_j$.
4. There exists $k$, $i \le k \le t$, such that $S^* = S'_k$.
Proof: Using the lemma, swap events until all post-recording events appear after all pre-recording events. The resulting computation is seq’.
All that is left to show: S* is the global state after all pre-recording events and before all post-recording events.
1. Process states
2. Channel states
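The swapping argument can be sketched as a loop over adjacent pairs. This Python sketch only reorders labels; it takes for granted, as the lemma guarantees, that each swap of a post-recording event with the pre-recording event right after it yields a valid computation. The name `reorder` and the event labels are illustrative.

```python
def reorder(events):
    """Swap adjacent (post, pre) pairs until every pre-recording event
    precedes every post-recording event.

    events: list of (name, kind) pairs, with kind "pre" or "post".
    """
    seq = list(events)
    swapped = True
    while swapped:
        swapped = False
        for j in range(1, len(seq)):
            if seq[j - 1][1] == "post" and seq[j][1] == "pre":
                seq[j - 1], seq[j] = seq[j], seq[j - 1]
                swapped = True
    return seq


execution = [("p sends", "post"), ("q sends", "pre"), ("p receives", "post")]
```

Applied to `execution`, one swap moves "q sends" to the front; the relative order of events within each group is unchanged.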
Claim: The state of a channel in S* is (the sequence of messages corresponding to pre-recording sends) − (the sequence of messages corresponding to pre-recording receives).
Proof: The state of channel c from process p to process q recorded in S* is the sequence of messages received on c by q after q records its state and before q receives a marker on c. The sequence of messages sent by p before the marker is the sequence corresponding to pre-recording sends on c.
[Figure: the execution p sends (post-recording), q sends (pre-recording), p receives (post-recording), shown on the global state transition diagram]
(Another execution)
[Figure: the execution q sends (pre-recording), p sends (post-recording), p receives (post-recording), shown on the global state transition diagram]
What did we get?
A configuration that could have happened
seq = (ei: i ≥ 0) a distributed computation
Si – the state of the system right before ei occurs
S0 – the initial state of the system
St – the state of the system at the termination of the algorithm
S* - the recorded global state
Stable Detection
D – a distributed system
y – a predicate function defined on the set of global states of D
S, S’ – global states of D
y is a stable property of D if y(S) implies y(S’) for all S’ reachable from S
Algorithm
Input: a stable property y
Output: a boolean value b with the property: y(S0) → b and b → y(St)
Algorithm:
begin
record a global state S*
b := y(S*)
end
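The detection step can be sketched as a short function. This Python sketch treats the snapshot as a black box: `record_global_state` is a hypothetical callable that returns S*, and the integer-valued states and the predicate `y` are invented for illustration.

```python
def detect(y, record_global_state):
    """Evaluate the stable property y on a recorded global state S*."""
    s_star = record_global_state()
    return y(s_star)


def implies(a, b):
    return (not a) or b


# Hypothetical computation S0, ..., St in which the state only grows.
path = [0, 1, 2, 3]
y = lambda S: S >= 2        # stable along this path: once true, it stays true


def guarantee_holds(s_star):
    """Check y(S0) -> b and b -> y(St) for a recorded state s_star."""
    b = detect(y, lambda: s_star)
    return implies(y(path[0]), b) and implies(b, y(path[-1]))
```

Whichever state of `path` is returned as S*, `guarantee_holds` is true, which is the output property stated above.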
Correctness
1. S* is reachable from S0
2. St is reachable from S*
3. y(S) → y(S’) for all S’ reachable from S
S0 → S* → St
y(S*) = true implies y(St) = true
y(S*) = false implies y(S0) = false
References
K. M. Chandy and L. Lamport, Distributed Snapshots: Determining Global States of Distributed Systems