chapter 4 causality, time, and global states
TRANSCRIPT
Chapter 4
Causality, Time,and Global States
Distributed Systems
SS 2019
Fabian Kuhn
Distributed Systems, SS 2019 Fabian Kuhn 2
Time in Distributed Systems
Goal: Establish a notion of time in (partially) asynchronous systems
Physical time:
β’ Establish an approximation of real time in a network
β’ Synchronize local clocks in a network
β’ Timestamp events (email, sensor data, file access times etc.)
β’ Synchronize audio and video streams
β’ Measure signal propagation delays (Localization)
β’ Wireless (TDMA, duty cycling)
β’ Digital control systems (ESP, airplane autopilot etc.)
Logical time:
β’ Determine an order on the events in a distributed system
β’ Establish a global view on the system
Distributed Systems, SS 2019 Fabian Kuhn 3
Logical Clocks
Goal: Assign a timestamp to all events in an asynchronous message-passing system
β’ Allows to give the nodes some notion of timeβ which can be used by algorithms
β’ Logical clock values: numerical values that increase over time and which are consistent with the observable behavior of the system
β’ The objective here is not to do clock synchronization:
Clock Synchronization: compute logical clocks at all nodes which simulate real time and which are tightly synchronized.β We will briefly talk about clock synchronization later...
Distributed Systems, SS 2019 Fabian Kuhn 4
Observable Behavior
Recall Executions / Schedules
β’ An exec. is an alternating sequence of configurations and events
β’ A schedule π is the sequence of events of an executionβ Possibly including node inputs
β’ Schedule restriction for node π£:π|π£ β "sequence of events seen by π£"
Causal Shuffles
We say that a schedule πΊβ² is a causal shuffle of schedule πΊ iff
βπ β π½: πΊ π = πΊβ² π.
Observation: If πβ² is a causal shuffle of π, no node/process can distinguish between π and πβ².
Distributed Systems, SS 2019 Fabian Kuhn 5
Causal Order
Logical clocks are based on a causal order of the events
β’ In the order, event π should occur before event πβ² if event π provablyoccurs before event πβ²β In that case, the clock value of π should be smaller than the one of πβ²
For a given schedule πΊ:
β’ The distributed system cannot distinguish π from another schedule πβ² ifand only if πβ² is a causal shuffle of π.β causal shuffle βΉ no node can distinguish
β no causal shuffle βΉ some node can distinguish
Event π provably occurs before πβ² if and only ifπ appears before πβ² in all causal shuffles of πΊ
Distributed Systems, SS 2019 Fabian Kuhn 6
Causal Shuffles / Causal Order Example
Schedule πΊ
π£1
π£2
π£3
π 1
π 2
π 3
π 4
π 5
π 6
π 7
π 8
π 9
π 10
π1
π2
π3 π4
π6
π7
π8
π5
π10
π9
Distributed Systems, SS 2019 Fabian Kuhn 7
Causal Shuffles / Causal Order Example
Schedule πΊ
Some Causal Shuffle πΊβ²
π£1
π£2
π£3
π 1
π 2π 3
π 4
π 5π 6
π 7
π 8
π 9
π 10
π1
π2
π3 π4
π6
π7
π8
π5
π10
π9
π£1
π£2
π£3
π 1
π 2π 3
π 4
π 5π 6
π 7
π 8
π 9
π 10
π1
π2
π3 π4
π6
π7
π8
π5
π10
π9
Distributed Systems, SS 2019 Fabian Kuhn 8
Lamportβs Happens-Before Relation
Assumption: message passing system, only send and receive events
Consider two events π and πβ² occurring at nodes π and πβ²
β send event occurs at sending node, recv. event at receiving node
β letβs define π‘ and π‘β² be the (real) times when π and πβ² occur
We know that π provably occurs before πβ² if
1. The events occur at the same node and π occurs before πβ²
2. Event π is a send event, πβ² the recv. event of the same message
3. There is an event πβ²β² for which we know that provably,π occurs before πβ²β² and πβ²β² occurs before πβ²
Distributed Systems, SS 2019 Fabian Kuhn 9
Lamportβs Happens-Before Relation
Definition: The happens-before relation βπΊ on a schedule π is a pairwise relation on the send/receive events of π and it contains
1. All pairs π, πβ² where π precedes πβ² in π and π and πβ² are events of the same node/process.
2. All pairs (π, πβ²) where π is a send event and πβ² the receive event for the same message.
3. All pairs π, πβ² where there is a third event πβ²β² such thatπ βπ π
β²β² β§ πβ²β² βπ πβ²β Hence, we take the transitive closure of the relation defined by 1. and 2.
Distributed Systems, SS 2019 Fabian Kuhn 10
Happens-Before Relation: Example
Schedule πΊ
π£1
π£2
π£3
π 1
π 2π 3
π 4
π 5π 6
π 7
π 8
π 9
π 10
π1
π2
π3 π4
π6
π7
π8
π5
π10
π9
Distributed Systems, SS 2019 Fabian Kuhn 11
Happens-Before and Causal Shuffles
Theorem: For a schedule π and two (send and/or receive) events π and πβ², the following two statements are equivalent:
a) Event π happens-before πβ², i.e., π βπΊ πβ².
b) Event π precedes πβ² in all causal shuffles πβ² of π.
Some remarks before proving the theoremβ¦
β’ Shows that the happens-before relation is exactly capturing what we need about the causality between eventsβ It captures exactly what is observable about the order of events
β’ To prove the theorem, we show that1. a) βΆ b)
2. b) βΆ a)
Distributed Systems, SS 2019 Fabian Kuhn 12
Happens-Before and Causal Shuffles
If π βπΊ πβ², then π precedes πβ² in all causal shuffles πΊβ² of πΊ.
Distributed Systems, SS 2019 Fabian Kuhn 13
If π precedes πβ² in all causal shuffles πΊβ² of πΊ, then π βπΊ πβ².
Proof:
β’ Show: π βπ πβ², there is a shuffle πβ² such that πβ² precedes π in π
β’ W.l.o.g., assume that π precedes πβ² in πβ Consequently, π and πβ² happen at different nodes
(otherwise, the order remains the same in all causal shuffles)
β’ Events in red part can be shifted by fixed amount π«
Happens-Before and Causal Shuffles
π
πβ²
Distributed Systems, SS 2019 Fabian Kuhn 14
If π precedes πβ² in all causal shuffles πΊβ² of πΊ, then π βπΊ πβ².
Proof:
β’ Show: π βπ πβ², there is a shuffle πβ² such that πβ² precedes π in π
β’ Events in red part can be shifted by fixed amount π«β Consider some message π with send/receive events π π, ππβ If π π and ππ or only ππ are shifted, message delay gets larger OK
β It is not possible to only shift π πβ Choose Ξ large enough to move π past πβ²
Happens-Before and Causal Shuffles
π
πβ²
Distributed Systems, SS 2019 Fabian Kuhn 15
Lamport Clocks
Basic Idea:
1. Each event π gets a clock value π π β β
2. If π and πβ² are events at the same node and π precedes πβ², thenπ π < π πβ²
3. If π π and ππ are the send and receive events of some msg. π,π π π < π ππ
Observation:
β’ For clock values π π of events π satisfying 1., 2., and 3., we have
π βπΊ πβ² βΆ π π < π πβ²
β because < relation (on β) is transitive
β’ Hence, the partial order defined by π(π) is a superset of βπ
Distributed Systems, SS 2019 Fabian Kuhn 16
Lamport Clocks
Algorithm:
β’ Each node π’ keeps a counter ππ’ which is initialized to 0
β’ For any non-receive event π at node π’, node π’ computes
ππ’ β ππ’ + 1; π π β ππ’
β’ For any send event π at node π’, node π’ attaches the value of π π to the message
β’ For any receive event π at node π’ (with corresponding send event π ), node π’ computes
ππ’ β max{ ππ’, π(π )} + 1; π(π) β ππ’
Distributed Systems, SS 2019 Fabian Kuhn 17
Lamport Clocks: Example
Schedule πΊ
π£1
π£2
π£3
π 1
π 2
π 3
π 4
π 5
π 6
π 7
π 8
π 9
π 10
π1
π2
π3 π4
π6
π7
π8
π5
π10
π9
Distributed Systems, SS 2019 Fabian Kuhn 18
Neiger-Toueg-Welch Clocks
Discussion Lamport Clocks:
β’ Advantage: no changes in the behavior of the underlying protocol
β’ Disadvantage: clocks might make huge jumps (when recv. a msg.)
Idea by Neiger, Toueg, and Welch:
β’ Assume nodes have some approximate knowledge of real timeβ e.g., by using a clock synchronization algorithm
β’ Nodes increase their clock value periodically
β’ Combine with Lamport clock ideas to ensure safety
β’ When receiving a message with a time stamp which is larger than the current local clock value, wait with processing the message.
Distributed Systems, SS 2019 Fabian Kuhn 19
Fidge-Mattern Vector Clocks
β’ Lamport clocks give a superset of the happens-before relation
β’ Can we compute logical clocks to get βπ exactly?
Vector Clocks:
β’ Each node π’ maintains an vector VC π’ of clock valuesβ one entry VCπ£(π’) for each node π£ β π
β’ In the vector VC π assigned (by π’) to some event π happening at node π’, the component π₯π£ corresponding to π£ β π refers to the
number of events at node π, π knows about when π occurs
Distributed Systems, SS 2019 Fabian Kuhn 20
Vector Clocks Algorithm
β’ All Nodes π’ keep a vector VC(π’) with an entry for all nodes in πβ all components are initialized to 0
β component corresponding to node π£: VCπ£(π’)
β’ For any non-receive event π at node π’, node π’ computes
VCπ’ π’ β VCπ’ π’ + 1; VC π β VC(π’)
β’ For any send event π at node π’, node π’ attaches the value of VC π to the message
β’ For any receive event π at node π’ (with corresponding send event π ), node π’ computes
βπ£ β π’: VCπ£ π’ β max VCπ£ π , VCπ£ π’ ;VCπ’ π’ β VCπ’ π’ + 1;VC π β VC(π’)
Distributed Systems, SS 2019 Fabian Kuhn 21
Vector Clocks Example
Schedule πΊ
π£1
π£2
π£3
π 1
π 2
π 3
π 4 π 5
π 6
π 7
π1
π2
π3
π5
π6
π7
π4
Distributed Systems, SS 2019 Fabian Kuhn 22
Vector Clocks and Happens-Before
Definition: ππ π < ππ πβ² β
βπ β π½:πππ π β€ πππ πβ² β§ ππ π β π½πͺ(πβ²)
Theorem: Given a schedule π, for any two events π and πβ²,
ππ π < ππ πβ² β· π βπ πβ²
Distributed Systems, SS 2019 Fabian Kuhn 23
Vector Clocks and Happens-Before
Definition: ππ π < ππ πβ² β
βπ β π½:πππ π β€ πππ πβ² β§ ππ π β π½πͺ(πβ²)
Theorem: Given a schedule π, for any two events π and πβ²,
ππ π < ππ πβ² β· π βπ πβ²
Distributed Systems, SS 2019 Fabian Kuhn 24
Logical Clocks vs. Synchronizers
Synchronizer:
β’ Algorithm that generates clock pulses that allow to run an synchronous algorithm in an asynchronous networkβ We will discuss synchronizers later
The clock pulses (local round numbers) generated by a synchronizer can also used as logical clocks
β’ Send events of round π get clock value 2π β 1
β’ Receive events of round π get clock value 2π
β’ superset of the happens-before relation
β’ requires to drastically change the protocol and its behaviorβ synchronizer determines when messages can be sent
β’ a very heavy-weight method to get logical clock valuesβ requires a lot of messages
Distributed Systems, SS 2019 Fabian Kuhn 25
Application of Logical Times
Replicated State Machine
β’ main application suggested by Lamport in his original paper
β’ a shared state machine where every node can issue operations
β’ state machine is simulated by several replicas
Solution:
β’ add current clock value (and issuer node ID) to every operation
β’ operations have to be carried out in order of clock values / IDs
β’ Safety:β all replicas use same order of operations
β order of operations is a possible actual order (consistent with local views)
β’ Liveness:β progress is guaranteed if nodes regularly send messages to each other
Distributed Systems, SS 2019 Fabian Kuhn 26
Global States
β’ Sometimes the nodes of a distributed system need to figure out the global state of the systemβ e.g., to find out if some property about the system state is true
β’ Executions/schedules which lead to the same happens-before relation (i.e., causal shifts) cannot be distinguished by the system.
β’ Generally not possible to record the global state at any given time of the execution
β’ Best solution: A global state which is consistent with all local viewsβ i.e., a state which could have been true at some time
β’ Called a consistent or global snapshot of the system and based on consistent cuts of the schedule
Distributed Systems, SS 2019 Fabian Kuhn 27
Consistent Cut
Cut
Given a schedule π, a cut is a subset πΆ of the events of π such that for all nodes π£ β π, the events in πΆ happening at π£ form a prefix of thesequence of events in π|π£.
π£1
π£2
π£3
π 1
π 2π 3
π 4
π 5π 6
π 7
π 8
π 9
π 10
π1
π2
π3 π4
π6
π7
π8
π5
π10
π9
Distributed Systems, SS 2019 Fabian Kuhn 28
Consistent Cut
Consistent Cut
Given a schedule π, a consistent cut πΆ is cut such that for all events π β πΆand all events π in π, it holds that
π βπΊ π βΆ π β πͺ
π£1
π£2
π£3
π 1
π 2π 3
π 4
π 5π 6
π 7
π 8
π 9
π 10
π1
π2
π3 π4
π6
π7
π8
π5
π10
π9
Distributed Systems, SS 2019 Fabian Kuhn 29
Consistent Cut
Schedule πΊ
Some Causal Shuffle πΊβ²
π£1
π£2
π£3
π 1
π 2π 3
π 4
π 5π 6
π 7
π 8
π 9
π 10
π1
π2
π3 π4
π6
π7
π8
π5
π10
π9
π£1
π£2
π£3
π 1
π 2π 3
π 4
π 5π 6
π 7
π 8
π 9
π 10
π1
π2
π3 π4
π6
π7
π8
π5
π10
π9
Distributed Systems, SS 2019 Fabian Kuhn 30
Consistent Cuts
Claim: Given a schedule π, a cut πΆ is a consistent cut if and only if for eachmessage π with send event π π and receive event ππ, if ππ β πΆ, then italso holds that π π β πΆ.
Distributed Systems, SS 2019 Fabian Kuhn 31
Consistent Snapshot
Consistent Snapshot = Global Snapshot = Consistent Global State
β’ A consistent snapshot is a global system state which is consistent with all local views.
Global System State (for schedule πΊ)
β’ A vector of intermediate states (in π) of all nodes and a description of the messages currently in transitβ Remark: If nodes keep logs of messages sent and received, the local states
contain the information about messages in transit.
Consistent Snapshot
β’ A global system state which is an intermediate global state for some causal shuffle of π (consistent with all local views)