
Distributed Algorithms (CAS 769)
Week 1: Introduction, Logical clocks, Snapshots

Dr. Borzoo Bonakdarpour

Department of Computing and Software
McMaster University

Dr. Borzoo Bonakdarpour Distributed Algorithms (CAS 769) - McMaster University 1/44

Presentation outline

Introduction

Logical Clocks

Snapshots (Global States)


Acknowledgments

Most of the contents of these slides are obtained from the following books:

- Distributed Algorithms: An Intuitive Approach, Wan Fokkink
- Elements of Distributed Computing, Vijay K. Garg

Distributed Systems

Some Definitions

There is no universally accepted definition of a distributed system.

What makes a system distributed?

One man’s constant is another man’s variable.

- Alan Perlis

A distributed system is a system where I can’t get my work done because a computer has failed that I’ve never even heard of.

A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.

- Leslie Lamport

Distributed Systems

Some Definitions

A distributed system is one that

- has multiple machines,
- is connected by a network, and
- is cooperating on some task.

Communication in Distributed Systems

- Message passing
- Shared memory

Distributed Systems

We begin with message passing systems.


Preliminaries

Message Passing Framework

In a message passing framework, a distributed system has the following structure:

- It consists of a finite graph of N processes (a process is a running program, and each process has its own local state).
- Each process carries a unique ID.
- Processes communicate through FIFO channels.

Characteristics of Communication

- Communication is asynchronous; i.e., sending a message and receiving it are distinct events.
- Delay in channels is arbitrary but finite.
- There are no garbled, duplicated, or lost messages.

Preliminaries

Other Assumptions

- Absence of a shared clock
- Absence of shared memory
- Absence of accurate failure detection

Example

{x1=0}
Process P1() {
    e^1_0: send(P2, m1);
    e^1_1: x1 = 5;
    e^1_2: x1 = 10;
    e^1_3: recv(m2);
}

{x2=0}
Process P2() {
    e^2_0: recv(m1);
    e^2_1: x2 = 15;
    e^2_2: x2 = 20;
    e^2_3: send(P1, m2);
}

Preliminaries

Transition Systems

The behavior of a distributed algorithm, which runs on a distributed system, is often captured by a transition system, which consists of:

- a set $C$ of configurations (i.e., the composition of the local states of its processes plus the messages in transit),
- a binary transition relation $\to$ on $C$, and
- a set $I \subseteq C$ of initial configurations.

A configuration $\gamma$ is terminal if there does not exist $\gamma' \in C$ such that $\gamma \to \gamma'$.

An execution of the distributed system is a sequence $\bar\gamma = \gamma_0 \gamma_1 \gamma_2 \cdots$ such that:

- $\gamma_0 \in I$, and
- for all $i \ge 0$, we have $\gamma_i \to \gamma_{i+1}$.

A configuration $\delta$ is reachable if there is a $\gamma_0 \in I$ and a finite execution $\gamma_0 \gamma_1 \cdots \gamma_k$ such that $\gamma_k = \delta$.

Example

For example, in the distributed algorithm of the earlier example (processes P1 and P2), writing a configuration as the pair of local variables and showing only the transitions that change x1 or x2:

- Configuration (x1 = 0, x2 = 0) is the only initial configuration.
- Configuration (x1 = 10, x2 = 20) is the only terminal configuration.
- (x1 = 0, x2 = 0) → (x1 = 5, x2 = 0) → (x1 = 10, x2 = 0) → (x1 = 10, x2 = 15) → (x1 = 10, x2 = 20) is a valid execution.
- And so is (x1 = 0, x2 = 0) → (x1 = 5, x2 = 0) → (x1 = 5, x2 = 15) → (x1 = 10, x2 = 15) → (x1 = 10, x2 = 20).
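The claims above can be checked mechanically. The following is a minimal sketch, not part of the original slides: it models the two example processes as a transition system and explores it breadth-first. The encoding of a configuration as `(pc, xs, chans)` and the names `successors` and `reachable` are assumptions made for illustration.

```python
from collections import deque

# Programs from the example: each step is (kind, argument).
P1 = [("send", "m1"), ("set", 5), ("set", 10), ("recv", "m2")]
P2 = [("recv", "m1"), ("set", 15), ("set", 20), ("send", "m2")]
PROGRAMS = (P1, P2)

# A configuration is (pc, xs, chans): program counters, local variables,
# and messages in transit; chans[i] is the FIFO channel into process i.
INITIAL = ((0, 0), (0, 0), ((), ()))

def successors(config):
    """Yield every configuration reachable from config in one transition."""
    pc, xs, chans = config
    for i, prog in enumerate(PROGRAMS):
        if pc[i] == len(prog):
            continue  # process i has finished
        kind, arg = prog[pc[i]]
        new_pc, new_xs, new_chans = list(pc), list(xs), list(chans)
        other = 1 - i
        if kind == "send":
            new_chans[other] = chans[other] + (arg,)
        elif kind == "recv":
            if not chans[i] or chans[i][0] != arg:
                continue  # the receive event is not enabled yet
            new_chans[i] = chans[i][1:]
        else:  # "set" is an internal event
            new_xs[i] = arg
        new_pc[i] += 1
        yield (tuple(new_pc), tuple(new_xs), tuple(new_chans))

def reachable(initial):
    """Breadth-first exploration of all reachable configurations."""
    seen, queue = {initial}, deque([initial])
    while queue:
        for nxt in successors(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

configs = reachable(INITIAL)
terminal = [c for c in configs if not list(successors(c))]
print(len(configs))                 # 17 reachable configurations
print([c[1] for c in terminal])     # [(10, 20)]
```

Because this system is finite, reachability can be decided by exhaustive search; the sketch says nothing about the general case raised in the question on the next slide.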


Preliminaries

Question: Is configuration reachability decidable?


Preliminaries

A transition between two configurations is associated with an event.

A process can perform an internal event (i.e., a change of its local state), a send event, or a receive event.

A process is called an initiator if its first event is either internal or send.

An assertion is a predicate on the configurations of an algorithm (e.g., $x \ge y + 1$). We use assertions to define safety properties.

An assertion $P$ is an invariant if:

- $P(\gamma)$ for all $\gamma \in I$, and
- if $\gamma \to \gamma'$ and $P(\gamma)$, then $P(\gamma')$.
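The two conditions of the invariant definition translate directly into a check over a finite transition system. The sketch below is illustrative only: it assumes a simplified model of the two-process example in which a configuration is just the pair (x1, x2) and only the internal events are modelled; `successors` and `is_invariant` are hypothetical helper names.

```python
from itertools import product

X1_STEPS = {0: 5, 5: 10}      # internal events of P1
X2_STEPS = {0: 15, 15: 20}    # internal events of P2
CONFIGS = list(product([0, 5, 10], [0, 15, 20]))   # the set C
INITIAL = [(0, 0)]                                  # the set I

def successors(config):
    """All configurations reachable from config in one transition."""
    x1, x2 = config
    result = []
    if x1 in X1_STEPS:
        result.append((X1_STEPS[x1], x2))
    if x2 in X2_STEPS:
        result.append((x1, X2_STEPS[x2]))
    return result

def is_invariant(assertion):
    """Check both conditions: holds initially, and is preserved by every transition."""
    holds_initially = all(assertion(c) for c in INITIAL)
    is_preserved = all(
        assertion(d)
        for c in CONFIGS if assertion(c)
        for d in successors(c)
    )
    return holds_initially and is_preserved

print(is_invariant(lambda c: c[0] <= 100 and c[1] <= 50))   # True
print(is_invariant(lambda c: c[0] < 10))                    # False
```

The second assertion holds initially but is not preserved by the transition from (5, 0) to (10, 0), so it is not an invariant.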


Example

For example, in the distributed algorithm of the earlier example (processes P1 and P2):

- Instruction x1 = 5 is an internal event.
- Process P1 is an initiator.
- (x1 ≤ 100 ∧ x2 ≤ 50) is an invariant.

Preliminaries

Properties

A property is a set of executions.

Safety Properties

A safety property typically expresses that something bad will never happen. For example:

- The temperature of a boiler never reaches 100 degrees.
- If an interrupt occurs, a message will be printed within one second.

Formally, a safety property is a set $S$ of infinite executions where:

$$\forall \bar\gamma \notin S : \exists \alpha \le \bar\gamma : \forall \bar\gamma' : \alpha \le \bar\gamma' \Rightarrow \bar\gamma' \notin S$$

where $\alpha \le \bar\gamma$ denotes the fact that the finite execution $\alpha$ is a prefix of $\bar\gamma$.

Preliminaries

Liveness Properties

A liveness property typically expresses that something good will eventually happen. Formally, if $L$ is a liveness property, then the following holds:

$$\forall \alpha : \exists \bar\gamma : \alpha\bar\gamma \in L$$

where $\alpha$ is a finite execution and $\bar\gamma$ is an infinite execution.

Examples of liveness properties:

- Non-starvation.
- If an interrupt occurs, a message will be printed.

Presentation outline

Introduction

Logical Clocks

Snapshots (Global States)


Causal Order

In an asynchronous distributed system, in each configuration, different events can occur at different processes.

Such occurrences of events are independent.

The causal order $\prec$ is a binary relation on the events of an execution such that $a \prec b$ iff event $a$ happened before event $b$; i.e., the events of the execution cannot be reordered so that $a$ happens after $b$.

Causal Order

Causal Order (Happened Before)

Formally, the causal order (also called happened before) $\prec$ is the smallest binary relation where

- if $a$ and $b$ are events at the same process and $a$ occurs before $b$, then $a \prec b$,
- if $a$ is a send event and $b$ the corresponding receive event, then $a \prec b$, and
- if $a \prec b$ and $b \prec c$, then $a \prec c$.

Notice that the happened before relation is a partial order.

We write $a \preceq b$ if either $a \prec b$ or $a = b$.

If $a \not\prec b$ and $b \not\prec a$, then we say $a$ and $b$ are concurrent events.
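The three rules above can be turned into a small computation: start from the process-order and message pairs, then close the relation under transitivity. This is a sketch under stated assumptions, not the slides' own code: events are written `(process, index)`, the message pairs are those of the two-process example, and `happened_before` and `concurrent` are illustrative names.

```python
from itertools import product

# (process, index) names the event e^process_index of the two-process example.
PROCESS_ORDER = {
    1: [(1, 0), (1, 1), (1, 2), (1, 3)],
    2: [(2, 0), (2, 1), (2, 2), (2, 3)],
}
MESSAGES = [
    ((1, 0), (2, 0)),   # m1: sent at e^1_0, received at e^2_0
    ((2, 3), (1, 3)),   # m2: sent at e^2_3, received at e^1_3
]

def happened_before(process_order, messages):
    """Return the smallest relation satisfying the three rules, as a set of pairs."""
    events = [e for evs in process_order.values() for e in evs]
    hb = set(messages)                       # rule 2: send before receive
    for evs in process_order.values():
        hb.update(zip(evs, evs[1:]))         # rule 1: consecutive events at one process
    # Rule 3: transitive closure (Warshall's algorithm, intermediate event k outermost).
    for k, i, j in product(events, repeat=3):
        if (i, k) in hb and (k, j) in hb:
            hb.add((i, j))
    return hb

HB = happened_before(PROCESS_ORDER, MESSAGES)

def concurrent(a, b):
    return (a, b) not in HB and (b, a) not in HB

print(((1, 0), (2, 2)) in HB)       # True: e^1_0 precedes e^2_2 via m1
print(concurrent((1, 1), (2, 1)))   # True: no causal chain in either direction
```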


Computation

A permutation of concurrent events in an execution does not affect the result of the execution.

[Figure: space-time diagram of processes P1 and P2, with events $e^1_0, \dots, e^1_3$ at P1 and $e^2_0, \dots, e^2_3$ at P2.]

Computation

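The set of all such permutations forms the computation lattice.

[Figure: computation lattice. Each node is labelled with the latest event of each process; grouped by the number of events executed, from the empty computation to the complete one, the nodes are:]

{}
(e^1_0)
(e^1_1)  (e^1_0, e^2_0)
(e^1_2)  (e^1_1, e^2_0)  (e^1_0, e^2_1)
(e^1_2, e^2_0)  (e^1_1, e^2_1)  (e^1_0, e^2_2)
(e^1_2, e^2_1)  (e^1_1, e^2_2)  (e^1_0, e^2_3)
(e^1_2, e^2_2)  (e^1_1, e^2_3)
(e^1_2, e^2_3)
(e^1_3, e^2_3)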
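The nodes of the lattice can be enumerated programmatically. The sketch below assumes the lattice belongs to the two-process example (P1 sends m1 as its first event and receives m2 as its last; P2 receives m1 first and sends m2 last). A node is encoded as a pair (a, b) meaning that P1 has executed its first a events and P2 its first b events; the function names are illustrative.

```python
from itertools import product

N_EVENTS = 4   # events per process in the example

def consistent(a, b):
    """True if the prefix pair (a, b) is closed under the happened before relation."""
    if b >= 1 and a < 1:
        return False   # P2's first event receives m1, which P1's first event sends
    if a == N_EVENTS and b < N_EVENTS:
        return False   # P1's last event receives m2, which P2's last event sends
    return True

cuts = [(a, b) for a, b in product(range(N_EVENTS + 1), repeat=2) if consistent(a, b)]

def count_executions(cut=(0, 0)):
    """Number of paths through the lattice from cut to the complete computation."""
    if cut == (N_EVENTS, N_EVENTS):
        return 1
    a, b = cut
    total = 0
    for nxt in ((a + 1, b), (a, b + 1)):
        if max(nxt) <= N_EVENTS and consistent(*nxt):
            total += count_executions(nxt)
    return total

print(len(cuts))             # 17 nodes in the lattice
print(count_executions())    # 15 executions (permutations of the events)
```

Each path from the bottom to the top of the lattice is one execution, so all 15 executions belong to the same computation.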


Happened before Vs. Physical Time

Question: If a safety property holds in the happened before relation, does it hold in physical time as well?

Logical Clocks

Since a physical shared clock does not exist in a distributed system, we use logical clocks.

A logical clock $C$ maps occurrences of events in a computation to a partially ordered set such that

$$a \prec b \Rightarrow C(a) < C(b)$$

Lamport’s clock $LC$ assigns to each event $a$ the length $k$ of a longest causality chain $a_1 \prec \cdots \prec a_k = a$ in the computation.

Obviously,

$$a \prec b \Rightarrow LC(a) < LC(b)$$

Logical Clocks

Algorithm for Handling Lamport’s Clocks

Consider an event $a$, and let $k$ be the clock value of the previous event at the same process ($k = 0$ if there is no such previous event).

- If $a$ is an internal or send event, then

  $$LC(a) = k + 1$$

- If $a$ is a receive event and $b$ the corresponding send event, then

  $$LC(a) = \max\{k, LC(b)\} + 1$$
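The two rules above can be sketched in a few lines. The code below is an illustration rather than a reference implementation: it replays one execution of the two-process example, given as a list of `(event, process, kind, message)` tuples in which every send precedes its receive. The event names and the `lamport_clocks` function are assumptions made for this sketch.

```python
def lamport_clocks(schedule):
    """Assign a Lamport clock value to every event of the given execution."""
    last = {}    # process -> clock value of its previous event (k in the rules)
    sent = {}    # message -> clock value of its send event (LC(b) in the rules)
    clock = {}
    for event, proc, kind, msg in schedule:
        k = last.get(proc, 0)
        if kind == "recv":
            value = max(k, sent[msg]) + 1
        else:                          # internal or send event
            value = k + 1
            if kind == "send":
                sent[msg] = value
        clock[event] = last[proc] = value
    return clock

SCHEDULE = [
    ("e1_0", 1, "send", "m1"),
    ("e1_1", 1, "internal", None),
    ("e2_0", 2, "recv", "m1"),
    ("e1_2", 1, "internal", None),
    ("e2_1", 2, "internal", None),
    ("e2_2", 2, "internal", None),
    ("e2_3", 2, "send", "m2"),
    ("e1_3", 1, "recv", "m2"),
]

LC = lamport_clocks(SCHEDULE)
print(LC["e2_0"], LC["e2_3"], LC["e1_3"])   # 2 5 6
```

Note that `e1_1` and `e2_0` both receive the value 2 although they are concurrent, and `LC["e1_2"] < LC["e2_2"]` holds although neither event happened before the other: the implication in the definition goes in one direction only.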


Vector Clocks

The vector clock $VC$ has the property

$$a \prec b \Leftrightarrow VC(a) < VC(b)$$

Let a distributed system consist of processes $p_0, \dots, p_{N-1}$. The vector clock assigns to the events of a computation values in $\mathbb{N}^N$, where this set is equipped with the partial order defined by:

$$(k_0, \dots, k_{N-1}) \le (l_0, \dots, l_{N-1}) \Leftrightarrow k_i \le l_i \text{ for all } i \in \{0, \dots, N-1\}$$

As usual, $u < v$ means $u \le v$ and $u \ne v$.

The vector clock is defined as follows: $VC(a) = (k_0, \dots, k_{N-1})$, where $k_i$ is the length of a longest causality chain $a^i_1 \prec \cdots \prec a^i_{k_i}$ of events at process $p_i$ with $a^i_{k_i} \preceq a$.
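The definition above does not prescribe how to maintain the vectors at run time. A common way to do it, assumed here and not stated on the slides, is for each process to increment its own component on every event, to piggyback its vector on every message, and to take a component-wise maximum on receipt. The sketch below applies this to the two-process example, with P1 and P2 indexed 0 and 1; all names are illustrative.

```python
def vector_clocks(schedule, n):
    """Assign a vector clock to every event; processes are numbered 0..n-1."""
    current = {p: (0,) * n for p in range(n)}   # latest vector at each process
    sent = {}                                   # message -> vector piggybacked on it
    vc = {}
    for event, proc, kind, msg in schedule:
        v = list(current[proc])
        if kind == "recv":
            v = [max(a, b) for a, b in zip(v, sent[msg])]
        v[proc] += 1                            # count this event at its own process
        current[proc] = vc[event] = tuple(v)
        if kind == "send":
            sent[msg] = tuple(v)
    return vc

def leq(u, v):
    return all(a <= b for a, b in zip(u, v))

def less(u, v):
    return leq(u, v) and u != v

SCHEDULE = [
    ("e1_0", 0, "send", "m1"),
    ("e1_1", 0, "internal", None),
    ("e2_0", 1, "recv", "m1"),
    ("e1_2", 0, "internal", None),
    ("e2_1", 1, "internal", None),
    ("e2_2", 1, "internal", None),
    ("e2_3", 1, "send", "m2"),
    ("e1_3", 0, "recv", "m2"),
]

VC = vector_clocks(SCHEDULE, 2)
print(VC["e2_0"], VC["e1_3"])            # (1, 1) (4, 4)
print(less(VC["e1_0"], VC["e2_2"]))      # True: e1_0 happened before e2_2
print(less(VC["e1_1"], VC["e2_1"]), less(VC["e2_1"], VC["e1_1"]))   # False False
```

Unlike the Lamport clock, the vectors of the concurrent events `e1_1` and `e2_1` are incomparable, which is what the equivalence in the vector clock property requires.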


Example

Demonstrate the evolution of the vector clock for this computation:

[Figure: space-time diagram of processes P1 and P2, with events $e^1_0, \dots, e^1_3$ at P1 and $e^2_0, \dots, e^2_3$ at P2.]

Presentation outline

Introduction

Logical Clocks

Snapshots (Global States)


Snapshot (Global State)

Definitions

A snapshot cannot be defined based on physical time (e.g., as the composition of all local states at the same time instant).

We use the happened before relation to compute concurrent local states and, hence, snapshots.

A (global) snapshot of an execution of a distributed algorithm is a configuration of this execution, consisting of the local states of the processes and the messages in transit.

Intuitively, a snapshot is consistent if it represents a configuration of the current execution or a configuration of an execution in the same computation.

Snapshots are useful to determine stable properties of a distributed system (i.e., properties that, once they become true, remain true), e.g., deadlock, termination, and loss of a token.

Snapshot

The Challenge

Why is it difficult to compute a snapshot of a distributed system at run time?

Taking a global snapshot is like taking a picture of the sky: the scene is so big that it cannot be captured by a single photograph. The challenge is that taking multiple photographs at exactly the same time is not quite possible.

Snapshot

Terminology

Suppose we design an algorithm that takes a snapshot of another distributed algorithm. We call the messages of the underlying algorithm basic messages and the messages of the snapshot algorithm control messages.

An event is called presnapshot if it occurs at a process before the local snapshot at this process is taken.

Otherwise it’s called postsnapshot.

Consistent Snapshot

A snapshot is consistent if

- for each presnapshot event $a$, all events that are causally before $a$ are also presnapshot, and
- a basic message is included in a channel state iff the corresponding send event is presnapshot while the corresponding receive event is postsnapshot.
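Both conditions can be checked directly once a snapshot is described by how many events of each process are presnapshot. The sketch below is illustrative and uses the two-process example (messages m1 and m2) as assumed input; `cut`, `is_pre`, `is_consistent`, and `channel_state` are hypothetical names. Since the presnapshot events of a process always form a prefix of its local order, the first condition only needs to be checked across messages.

```python
# Each message maps to (send event, receive event); an event is (process, index).
MESSAGES = {
    "m1": ((1, 0), (2, 0)),
    "m2": ((2, 3), (1, 3)),
}

def is_pre(event, cut):
    """cut[p] is the number of events of process p that are presnapshot."""
    proc, index = event
    return index < cut[proc]

def is_consistent(cut, messages):
    """No message may be received presnapshot but sent postsnapshot."""
    return all(
        is_pre(send, cut)
        for send, recv in messages.values()
        if is_pre(recv, cut)
    )

def channel_state(cut, messages):
    """Messages whose send is presnapshot and whose receive is postsnapshot."""
    return sorted(
        name
        for name, (send, recv) in messages.items()
        if is_pre(send, cut) and not is_pre(recv, cut)
    )

print(is_consistent({1: 2, 2: 0}, MESSAGES), channel_state({1: 2, 2: 0}, MESSAGES))
# True ['m1']: m1 has been sent but not yet received, so it is in transit
print(is_consistent({1: 0, 2: 1}, MESSAGES))
# False: m1 would be received before it is sent
```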


Example

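[Figure: space-time diagram of processes P1, P2, and P3 exchanging messages m1, m2, and m3, with two cuts labelled G1 and G2.]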

G1 is not a consistent snapshot, but G2 is.


Chandy-Lamport Algorithm

Assumption

All channels are FIFO.

Challenges

- All recorded local states must be mutually concurrent.
- The states of all channels must be captured correctly.

Chandy-Lamport Algorithm

Solution

- We associate with each process a variable called color that is either red or white. All processes are initially white.
- Intuitively, the computed global snapshot corresponds to the state of the system just before the processes turn red.
- The algorithm relies on special control messages called markers.
- Once a process turns red, it sends a marker along all its outgoing channels before it sends out any other message.
- A process turns red on receiving a marker if it has not already done so.
- No white process receives a basic message sent by a red process. Why?
- This guarantees that local states are mutually concurrent. Why?
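The marker rules can be exercised in a small simulation. The sketch below is not the slides' code: it assumes a fully connected network of three processes that pass amounts of money around (a hypothetical basic algorithm chosen because the total is conserved), FIFO channels modelled as `deque`s, and process 0 as the initiator. A consistent snapshot must account for the full total of 300.

```python
import random
from collections import deque

MARKER = "mkr"

class Process:
    def __init__(self, pid, peers, balance):
        self.pid, self.peers, self.balance = pid, peers, balance
        self.red = False             # color: white until the local snapshot is taken
        self.recorded_state = None
        self.recording = {}          # incoming channels that are still being recorded
        self.channel_state = {}      # sender -> recorded state of channel sender -> self

    def take_snapshot(self, channels):
        """Turn red: record the local state, then send a marker on every outgoing channel."""
        self.red = True
        self.recorded_state = self.balance
        self.recording = {q: [] for q in self.peers}
        for q in self.peers:
            channels[(self.pid, q)].append(MARKER)

    def receive(self, sender, msg, channels):
        if msg == MARKER:
            if not self.red:
                self.take_snapshot(channels)
            # The marker closes the recording of channel sender -> self.
            self.channel_state[sender] = self.recording.pop(sender)
        else:
            self.balance += msg
            if self.red and sender in self.recording:
                self.recording[sender].append(msg)   # a wr message

def run(seed, n=3, steps=200):
    rng = random.Random(seed)
    pids = list(range(n))
    channels = {(p, q): deque() for p in pids for q in pids if p != q}
    procs = [Process(p, [q for q in pids if q != p], 100) for p in pids]
    for step in range(steps):
        if step == 20:
            procs[0].take_snapshot(channels)         # process 0 initiates
        p = rng.choice(procs)
        incoming = [c for c in channels if c[1] == p.pid and channels[c]]
        if incoming and rng.random() < 0.5:
            src, _ = rng.choice(incoming)
            p.receive(src, channels[(src, p.pid)].popleft(), channels)
        elif p.balance > 0:
            amount = rng.randint(1, p.balance)
            p.balance -= amount
            channels[(p.pid, rng.choice(p.peers))].append(amount)
    while any(channels.values()):                    # deliver everything still in transit
        src, dst = rng.choice([c for c in channels if channels[c]])
        procs[dst].receive(src, channels[(src, dst)].popleft(), channels)
    return procs

def snapshot_total(procs):
    """Money in the recorded local states plus money in the recorded channel states."""
    in_channels = sum(sum(msgs) for p in procs for msgs in p.channel_state.values())
    return sum(p.recorded_state for p in procs) + in_channels

print(snapshot_total(run(seed=1)))   # 300
```

A process that turns red on receiving a marker records the channel that delivered it as empty, because FIFO order guarantees that every message sent before the marker has already arrived.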


Chandy-Lamport Algorithm

Classification of Basic Messages

- (ww messages) These messages are sent by a white process to a white process. They correspond to the messages sent and received before the global snapshot.
- (rr messages) These messages are sent by a red process to a red process. They correspond to the messages sent and received after the global snapshot.
- (rw messages) These messages are sent by a red process and received by a white process, so they cross the global snapshot in the backward direction. Such a message would make the snapshot inconsistent. It is not possible to have such messages if markers are used. Why?
- (wr messages) These messages are sent by a white process and received by a red process, so they cross the global snapshot in the forward direction. They are part of the state of the channel in the snapshot, because they are in transit when the snapshot is taken.

Chandy-Lamport Algorithm

[Figure: space-time diagram of processes P1 and P2 showing one message of each class: ww, wr, rw, and rr.]

Chandy-Lamport Algorithm (Example)

[Figure: network of processes A, B, and C, with the local state of A recorded; the channel labels read "m1, ⟨mkr⟩" and "⟨mkr⟩".]

Chandy-Lamport Algorithm (Example)

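[Figure: network of processes A, B, and C, with the local states of A and B recorded; the channel labels read "m1, ⟨mkr⟩", "⟨mkr⟩", and "m2".]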

B computes the state of channel AB as {}.


Chandy-Lamport Algorithm (Example)

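[Figure: network of processes A, B, and C, with the local states of A, B, and C recorded; the labels read "m1" and "⟨mkr⟩, m2".]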

C computes the state of channel AC as {}.


Chandy-Lamport Algorithm (Example)

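[Figure: network of processes A, B, and C, with the local states of A, B, and C recorded; the labels read "m1" and "{m2}".]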

B computes the state of channel CB as {m2}.


Chandy-Lamport Algorithm (Example)

Question: Is the computed snapshot a configuration of the actual execution?

Lai-Yang Algorithm

Assumptions

This algorithm does not assume FIFO channels.

But it assumes message piggybacking.

Lai-Yang Algorithm

The Algorithm

- Any initiator can decide to take a local snapshot.
- As long as a process has not taken a local snapshot, it appends false to its outgoing basic messages.
- When a process has taken its local snapshot, it appends true to each outgoing basic message.
- When a process that hasn’t yet taken a snapshot receives a message with true or a control message (see next slide) for the first time, it takes a local snapshot of its state before reception of this message.
- A process q computes as channel state of pq the basic messages without the tag true that it receives via pq after its local snapshot.

Lai-Yang Algorithm

The Algorithm

Question: How does q know when it can determine the channel state of pq?

p sends a control message to q, informing q how many basic messages without the tag true p sent into pq.

These control messages also ensure that all processes eventually take a local snapshot.
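The rules of the algorithm can be sketched as a simulation with channels that deliver messages in arbitrary order. As with any such sketch, the details are assumptions made for illustration: three fully connected processes pass amounts of money around, a basic message is the tuple `("basic", amount, tag)`, a control message is `("control", count)`, and process 0 initiates. The method `channel_done` implements the counting argument above.

```python
import random

class Process:
    def __init__(self, pid, peers, balance):
        self.pid, self.peers, self.balance = pid, peers, balance
        self.snapshot = None                        # recorded local state
        self.sent_false = {q: 0 for q in peers}     # messages tagged false sent into each channel
        self.recv_false = {q: 0 for q in peers}     # messages tagged false received per channel
        self.expected = {}                          # counts announced by control messages
        self.channel_state = {q: [] for q in peers}

    def take_snapshot(self, net):
        self.snapshot = self.balance
        for q in self.peers:
            net[(self.pid, q)].append(("control", self.sent_false[q]))

    def send(self, q, amount, net):
        self.balance -= amount
        tag = self.snapshot is not None             # true once the local snapshot is taken
        if not tag:
            self.sent_false[q] += 1
        net[(self.pid, q)].append(("basic", amount, tag))

    def receive(self, p, msg, net):
        kind = msg[0]
        if self.snapshot is None and (kind == "control" or msg[2]):
            self.take_snapshot(net)                 # snapshot precedes reception of msg
        if kind == "control":
            self.expected[p] = msg[1]
            return
        _, amount, tag = msg
        self.balance += amount
        if not tag:
            self.recv_false[p] += 1
            if self.snapshot is not None:
                self.channel_state[p].append(amount)   # in transit at snapshot time

    def channel_done(self, p):
        """The channel state of p -> self is complete once all false messages arrived."""
        return self.expected.get(p) == self.recv_false[p]

def run(seed, n=3, steps=200):
    rng = random.Random(seed)
    pids = list(range(n))
    net = {(p, q): [] for p in pids for q in pids if p != q}
    procs = [Process(p, [q for q in pids if q != p], 100) for p in pids]

    def deliver():
        src, dst = rng.choice([c for c in net if net[c]])
        channel = net[(src, dst)]
        msg = channel.pop(rng.randrange(len(channel)))   # any position: non-FIFO
        procs[dst].receive(src, msg, net)

    for step in range(steps):
        if step == 20:
            procs[0].take_snapshot(net)             # process 0 initiates
        p = rng.choice(procs)
        if any(net.values()) and rng.random() < 0.5:
            deliver()
        elif p.balance > 0:
            p.send(rng.choice(p.peers), rng.randint(1, p.balance), net)
    while any(net.values()):
        deliver()
    return procs

def snapshot_total(procs):
    in_channels = sum(sum(msgs) for p in procs for msgs in p.channel_state.values())
    return sum(p.snapshot for p in procs) + in_channels

print(snapshot_total(run(seed=1)))   # 300
```

The counter sent in the control message stays valid because a process tags every basic message true after its local snapshot, so the number of messages tagged false that it sent into a channel can no longer change.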
