synchronization chapter 5. - 1 - contents qclock synchronization qlogical clocks qglobal state...

Synchronization

Chapter 5

- 2 -

Contents

Clock SynchronizationLogical ClocksGlobal StateElection AlgorithmsMutual ExclusionDistributed TransactionsConclusionCritical Idea

- 3 -

Clock synchronization

A simple question: Is it possible to synchronize all the clocks in a distributed system ?

- 4 -

Physical Clocks (1)

Some concept: Timer; counter, holding register; clock tick; clock skew.

Prob: How do we synchronize them with real-world clocksHow do we synchronize the clocks with each other

Mean solar second: measuring a large numbers of day taking average dividing by 86400

TAI: the mean number of sticks of the cesium 133 clocks (since 1/1/1958) divided by 9,192,631,770

- 5 -

Physical Clocks (2)

TAI: highly stable but late leap secondBy the way: raising their frequency from 60Hz or

50Hz 61Hz or 51Hz

- 6 -

Cristian’s Algorithm No more than δ/2ρ each machine sends a message to the

time server (which has a WWV receiver) asking for the current time

Probs: time must never run backward and it’s take a nonzero amount of time for the time server’s reply to get back to the sender

- 7 -

The Berkeley Algorithm

a)The time daemon asks all the other machines for their clock values

b)The machines answer

c)The time daemon tells everyone how to adjust their clock

- 8 -

Averaging Algorithms

At the beginning of each interval, every machine broadcasts the current time according to its clock.

Then it starts a local timer to collect all other broadcasts that arrive during some interval S.

The simplest algorithm is just to average the values from all other machines.

One of the most widely used algorithms in the Internet is the Network Time Protocol (NTP).

- 9 -

Assign time to dist. Sys.

If a happens before b in the same process, C(a) < C(b).

If a and b represent the sending and receiving of a message, respectively, C(a) < C(b).

For all distinctive events a and b, C(a) ≠ C(b).

- 10 -

Totally order Multicasting

Timestamp can be used to implement totally ordered multicast

- 11 -

Vector timestamps

VT(a) < VT(b) a: causally precede event bProperties of vector timestamps

Vi[i] is the number of events that have occurred so far at P i

If Vi[j] = k then Pj knows that k events have occurred at P j

Message r (from PJ): reaction of message a (PI)

PK process message r if:vt(r)[j] = Vk[j] + 1

vt(r)[i] ≤ Vk[i] for all i≠j

- 12 -

Global State (1)

a) A consistent cutb) An inconsistent cut

- 13 -

Global State (2)

a) Organization of a process and channels for a distributed snapshot

- 14 -

Global State (3)

b) Process Q receives a marker for the first time and records its local state

c) Q records all incoming messaged) Q receives a marker for its incoming channel and finishes

recording the state of the incoming channel

- 15 -

Election AlgorithmsElection algorithms: algorithms for electing a

coordinator (using this as a generic name for the special process).

Election algorithms attempt to locate the process with the highest process number and designate it as coordinator.

Goal: to ensure that when an election starts, it concludes with all processes agreeing on who the new coordinator is to be.

- 16 -

The Bully Algorithm (1)

The bully election algorithm Process 4 holds an election Process 5 and 6 respond, telling 4 to stop Now 5 and 6 each hold an election

- 17 -

The Bully Algorithm (2)

d) Process 6 tells 5 to stope) Process 6 wins and tells everyone

Ring Algorithm

We assume that the processes are physically or logically

ordered, so that each process knows who is successor is. When any process notices that the coordinator is not functioning,

it builds ELECTION message containing its own process num

and sends message to its successor. If successor is down, the sender skips over the successor and

goes to the next number along the ring, or the one after that,

until a running process is located. At each step , the sender adds its own process num to the list in

the message effectively making itself a candidate to be elected

as a coordinator.

- 18 -

- 19 -

Ring Algorithm

- 20 -

a) Process 1 asks the coordinator for permission to enter a critical region. Permission is grantedb) Process 2 then asks permission to enter the same critical region. The coordinator does not reply.c) When process 1 exits the critical region, it tells the coordinator, when then replies to 2

Mutual ExclusionCentralized Algorithm

Distributed Algorithm

When a process wants to enter a critical region, it builds a

message containing the name of the critical region it wants to

enter , its process number, and the current time. Sends the message to all other processes, conceptually including

itself. The sending of message is assumed to be reliable. When a process receives a request message from other process,

the action it takes depends on its state with respect to the

critical region named in the message. Three cases have to be distinguished.

- 21 -

Distributed Algorithm contd

1. If the receiver is not in the critical region and does not want

to enter it, it sends back an OK message to the sender. 2. If the receiver is already in the critical region, it does not

reply. Instead, it queues the request. 3. If the receiver wants to enter the critical region but has not

yet done so, it compares the timestamp in the incoming

message, the lowest one wins. If the incoming message is

lower, the receiver sends back an OK message. If its own

message has a lowest timestamp, the receiver queues the

incoming request and sends nothing.

- 22 -

- 23 -

Distributed Algorithm example

a) Two processes want to enter the same critical region at the same moment.b) Process 0 has the lowest timestamp, so it wins.c) When process 0 is done, it sends an OK also, so 2 can now enter the

critical region.

- 24 -

Token Ring Algorithm

a) An unordered group of processes on a network.

b) A logical ring constructed in software.

- 25 -

Comparison

A comparison of three mutual exclusion algorithms.

AlgorithmMessages per entry/exit

Delay before entry (in message times)

Problems

Centralized 3 2 Coordinator crash

Distributed 2 ( n – 1 ) 2 ( n – 1 )

Crash of any process

Group communication

Token ring 1 to 0 to n – 1Lost token, process crash

- 26 -

The Transaction Model (1)

Updating a master tape is fault tolerant.

- 27 -

The Transaction Model (2)

Examples of primitives for transactions.

Primitive Description

BEGIN_TRANSACTION Make the start of a transaction

END_TRANSACTION Terminate the transaction and try to commit

ABORT_TRANSACTION Kill the transaction and restore the old values

READ Read data from a file, a table, or otherwise

WRITE Write data to a file, a table, or otherwise

- 28 -

Four Characteristics

Atomic:to the outside world, the transaction happens indivisibly

Consistent: the transaction does not violate system invariants

Isolated: concurrent transactions do not interfere with each other

Durable: once a transaction commits, the changes are permanent

- 29 -

Limitations of Flat Transactions

Main limitation: do not allow partial results to be committed or aborted

In the case of updating all of the hyperlinks to a webpage W, which moved to a new location

BEGIN_TRANSACTION reserve WP -> JFK; reserve JFK -> Nairobi; reserve Nairobi -> Malindi;END_TRANSACTION

(a)

BEGIN_TRANSACTION reserve WP -> JFK; reserve JFK -> Nairobi; reserve Nairobi -> Malindi full =>ABORT_TRANSACTION (b)

- 30 -

Classification of Transactions

a) A nested transactionb) A distributed transaction

- 31 -

ImplementationPrivate Workspace

A file is only for read not modify there is no need for a private copy

a) The file index and disk blocks for a three-block fileb) The situation after a transaction has modified block 0 and appended block 3b) After committing

- 32 -

Writeahead Log

a) A transactionb) – d) The log before each statement is executed

x = 0;

y = 0;

BEGIN_TRANSACTION;

x = x + 1;

y = y + 2

x = y * y;

END_TRANSACTION;

(a)

Log

[x = 0 / 1]

(b)

Log

[x = 0 / 1]

[y = 0/2]

(c)

Log

[x = 0 / 1]

[y = 0/2]

[x = 1/4]

(d)

- 33 -

Concurrency Control (1)

General organization of managers for handling transactions

- 34 -

Concurrency Control (2) General organization of

managers for handling distributed transactions

- 35 -

Serializability

The whole idea behind concurrency control is to properly schedule conflicting operations (two read operations never conflict )

Synchronization can take place either through mutual exclusion mechanisms on shared data (i.e locking)

Or explicitly ordering operations using timestamps

- 36 -

Two-phase locking

A transaction T is granted a lock if there is no conflict The scheduler will never release a lock for data item x, until the data

manager acknowledges it has performed the operation for which the lock was set

Once the scheduler has released a lock on behalf of a transaction T, it will never grant another lock on behalf of T

- 37 -

Strict two-phase locking

In centralized 2PL: a single site is responsible for granting and releasing locks

In primary 2PL: each data item is assigned a primary copy

In distributed 2PL: the schedulers on each machine not only take care that locks are granted and released, but also that the operation is forwarded to the (local) data manager

- 38 -

Pessimistic Timestamp Ordering

Concurrency control using timestamps.

- 39 -

Conclusion

Lamport timestamps: if a happen before b C(a) < C(b).

Determining the global state can be done by synchronizing all processes so that each collects its own local state, along with the messages that are currently in transit.

Synchronization between processes choose a coordinator election algorithms

Mutual Exclusion algorithms: can be centralized or distributed

- 40 -

Conclusion & Critical Idea

A transaction consists of a series of operationsA transaction is durable, meaning that if it

completes, its effects are permanent

Two-phase locking can lead to dead lockAcquiring all locks in some canonical order to

prevent hold-and-wait cyclesUsing deadlock detection by maintaining an

explicit graph for cyclesInheritance Priority Protocol

- 41 -

42

Consistency and Replication

Chapter 6

43

Content1. Definition of Consistency and Replication

2. Understand Replication

Reason for Replication & Problem of Replication

3. The only solution for Replication Problem

4. Consistency Model

Data-centric models & Client-centric models

5. Distribution Protocols

Distributing updates to replicas

6. Consistency Protocols

Implementation of consistency models

44

Consistency and ReplicationIntroduction:

Replication:

Replication of data

Reason for Replication:

Enhance reliability or improve performance

Consistency:

Consistency of replicated data

Reason for Consistency:

Keep replicas to be the same

45

Understanding ReplicationIntroduction:

Object Replication

Purpose: managing data in distributed system

Consider objects instead of data alone

Benefit of encapsulating and operating data

Consistency Problem:

Whenever a replica is updated, that replica becomes different from the others. Synchronization Problem

Two Approaches focus on who will deal with it:

Object-specific Replication

Middleware Replication

46

Understanding Replication

a) A distributed system for replication-aware distributed objects.b) A distributed system responsible for replica management

Two Approaches

47

Only solution for Consistency ProblemIntroduction:

Synchronous replication consistency

Key idea: a single atomic operation, or transaction

Difficulties:

need to synchronize all replicas

a lot of communication time

expensive in terms of performance

Only solution:

Loosen the consistency constraints

need not to be executed as atomic operations

copies may not always same

48

Consistency ModelsIntroduction:

Consistency Model:

A contract between processes and the data store.

Two kind of Models:

Data-centric Consistency Models

Guarantee for a number of processes

Simultaneously update

Sequential consistency

Client-centric Consistency Models

Guarantee for a single processes

Lack simultaneous updates

49

Data-centric Consistency Models

The general organization of a logical data store, physically distributed and replicated across multiple processes.

50

Data-centric Consistency Models Data-centric Consistency Models (7 kinds)

Strict Consistency

Linearizability and Sequential Consistency

Causal Consistency

FIFO Consistency

Weak Consistency

Release Consistency

Entry Consistency

Loosen constraints

51

Data-centric Consistency ModelsConsistency Description

Strict Absolute time ordering of all shared accesses matters.

LinearizabilityAll processes must see all shared accesses in the same order. Accesses are furthermore ordered according to a (nonunique) global timestamp

SequentialAll processes see all shared accesses in the same order. Accesses are not ordered in time

Causal All processes see causally-related shared accesses in the same order.

FIFOAll processes see writes from each other in the order they were used. Writes from different processes may not always be seen in that order

(a)

Consistency Description

Weak Shared data can be counted on to be consistent only after a synchronization is done

Release Shared data are made consistent when a critical region is exited

Entry Shared data pertaining to a critical region are made consistent when a critical region is entered.

(b)

a) Consistency models not using synchronization operations.b) Models with synchronization operations.

52

FIFO Consistency (1)Necessary Condition:

“Any read on a data item x returns a value corresponding to the result of the most recent write x”

Writes done by a single process are seen by all other processes in the order in which they were issued

but writes from different processes may be seen

in a different order by different processes.

53

FIFO Consistency (2)

A valid sequence of events of FIFO consistency

54


Statement execution as seen by the three processes from the previous slide. The statements in bold are the ones that generate the output shown.

x = 1;

print (y, z);

y = 1;

print(x, z);

z = 1;

print (x, y);

Prints: 00

(a)

x = 1;

y = 1;

print(x, z);

print ( y, z);

z = 1;

print (x, y);

Prints: 10

(b)

y = 1;

print (x, z);

z = 1;

print (x, y);

x = 1;

print (y, z);

Prints: 01

(c)

55


Two concurrent processes.

Process P1 Process P2

x = 1;

if (y == 0) kill (P2);

y = 1;

if (x == 0) kill (P1);

56

Weak Consistency (1)Properties: Accesses to synchronization variables

associated with a data store are sequentially consistent

No operation on a synchronization variable is allowed to be performed until all previous writes have been completed everywhere

No read or write operation on data items are allowed to be performed until all previous operations to synchronization variables have been performed.

57

Weak Consistency (2)

A program fragment in which some variables may be kept in registers.

int a, b, c, d, e, x, y; /* variables */int *p, *q; /* pointers */int f( int *p, int *q); /* function prototype */

a = x * x; /* a stored in register */b = y * y; /* b as well */c = a*a*a + b*b + a * b; /* used later */d = a * a * c; /* used later */p = &a; /* p gets address of a */q = &b /* q gets address of b */e = f(p, q) /* function call */

58

Weak Consistency (3)

a) A valid sequence of events for weak consistency.b) An invalid sequence for weak consistency.

59

Client-centric Consistency Models Different types of Client-centric Consistency Models:

Eventual Consistency

Monotonic-Read Consistency

Monotonic-Write Consistency

Read-Your-Writes Consistency

Writes-Follow-Reads Consistency

60

Eventual Consistency

The principle of a mobile user accessing different replicas of a distributed database.

61

Monotonic Reads

The read operations performed by a single process P at two different local copies of the same data store.

a) A monotonic-read consistent data storeb) A data store that does not provide monotonic reads.

62

Monotonic Writes

The write operations performed by a single process P at two different local copies of the same data store

a) A monotonic-write consistent data store.b) A data store that does not provide monotonic-write consistency.

63

Read Your Writes

a) A data store that provides read-your-writes consistency.b) A data store that does not.

64

Writes Follow Reads

a) A writes-follow-reads consistent data storeb) A data store that does not provide writes-follow-reads consistency

65

Distribution Protocols Purpose: Solve the following problem

What is exactly propagated?

Where updates are propagated?

By whom propagation is initiated?

Three Distribution Protocols:

Replica Placement

Update Propagation

Epidemic Protocols

66

Replica Placement

The logical organization of different kinds of copies of a data store into three concentric rings.

67

Replica Placement Replica Placement

Permanent Replicas

The initial set of replicas

Constitute a distributed data store

Server-Initiated Replicas

Copies of a data store

Exist to enhance performance

Created at the initiative of the data store

Client-Initiated Replicas

Created at the initiative of clients

Commonly known as client caches

68

Update Propagation Introduction:

Update Propagation generally initiated at a client

Subsequently forwarded to one of the copies

Three design issues:

State Vs. Operations

What is actually to be propagated

Pull Vs. Push Protocols

Whether updates are pulled or pushed

Unicasting Vs. Multicasting

Whether unicasting or multicasting should be used

69

Epidemic ProtocolsIntroduction:

Update propagation in eventual-consistent data stores is often implemented by a class of algorithms known as Epidemic Protocols.

Don’t solve any update conflicts

Only concern is propagating updates to all replicas in as few messages as possible.

Update Propagation Models (Example: anti-entropy)

1. P only pushes its own updates to Q

2. P only pulls in new updates from Q

3. P and Q send updates to each other

70

Consistency ProtocolsIntroduction:

Consistency protocol

Describe implementations of specific consistency models

Models: sequential consistency, weak consistency with synchronization variables, and atomic consistency

Three Protocols:

Primary-Based Protocols

Remote-Write Protocols & Local-Write Protocols

Replicated-Write Protocols

Active Replication & Quorum-Based Protocols

Cache-coherence Protocols

71


Primary-based remote-write protocol with a fixed server to which all read and write operations are forwarded.

Remote-Write Protocols (1)

72


The principle of primary-backup protocol.

Remote-Write Protocols (2)

73


Primary-based local-write protocol in which a single copy is migrated between processes.

Local-Write Protocols (1)

74


Primary-backup protocol in which the primary migrates to the process wanting to perform an update.

Local-Write Protocols (2)

75


The problem of replicated invocations.

Active Replication (1)

76


a) Forwarding an invocation request from a replicated object.b) Returning a reply to a replicated object.

Active Replication (2)

77

Replicated-Write ProtocolsQuorum-Based Protocols

Three examples of the voting algorithm:a) A correct choice of read and write setb) A choice that may lead to write-write conflictsc) A correct choice, known as ROWA (read one, write all)

78

End of Chapter 6Thank you!

synchronization chapter 5. - 1 - contents qclock synchronization qlogical clocks qglobal state...

Documents