Replication and Consistency - Uppsala
TRANSCRIPT
ReplicationAndConsistency September 2002
© DoCS 2002 1
Distributed Systems
Replication and Consistency
September 2002
September 2002 © DoCS 2
Introduction to replication (1)

Replication can provide the following:

Performance enhancement
o e.g. several web servers can have the same DNS name and the servers are selected in turn, to share the load
o replication of read-only data is simple, but replication of changing data has overheads

Fault-tolerant service
o guarantees correct behaviour in spite of certain faults (can include timeliness)
o if f of f + 1 servers crash, then 1 remains to supply the service
o if f of 2f + 1 servers have byzantine faults, then they can supply a correct service
September 2002 © DoCS 3
Introduction to replication (2)

Availability is hindered by
o server failures
  Replicate data at failure-independent servers; when one fails, a client may use another. Note that caches do not help with availability (they are incomplete).
o network partitions and disconnected operation
  Users of mobile computers deliberately disconnect, and then on re-connection resolve conflicts.
September 2002 © DoCS 4
Requirements for replicated data

Replication transparency
o clients see logical objects (not several physical copies)
  they access one logical item and receive a single result

Consistency
o specified to suit the application
  e.g. when a user of a diary disconnects, their local copy may be inconsistent with the others and will need to be reconciled when they connect again. But connected clients using different copies should get consistent results.

What is replication transparency?
September 2002 © DoCS 5
System model

Each logical object is implemented by a collection of physical copies called replicas
o the replicas are not necessarily consistent all the time (some may have received updates not yet conveyed to the others)

We assume an asynchronous system where processes fail only by crashing, and generally assume no network partitions

Replica managers (RMs)
o an RM contains replicas on a computer and accesses them directly
o RMs apply operations to replicas recoverably, i.e. they do not leave inconsistent results if they crash
o objects are copied at all RMs unless we state otherwise
o static systems are based on a fixed set of RMs
o in a dynamic system, RMs may join or leave (e.g. when they crash)
o an RM can be a state machine
September 2002 © DoCS 6
Replication

Improve availability
Improve performance
o Scale
o Location

Cost is consistency
"Informal replication" - caching
"Formal replication" - managed
September 2002 © DoCS 7
Managing Concurrency

Objects must manage concurrent invocations
Replica states must be consistent
Concurrent invocations on replicas should have the same behavior as on a single object

[Figure: clients invoking one object directly vs. clients invoking two replicas of it]
September 2002 © DoCS 8
Object Replication

Object managed: a remote object capable of handling concurrent invocations on its own.
"System" managed: a remote object for which an object adapter is required to handle concurrent invocations.
September 2002 © DoCS 9
Object Replication
a) A distributed system for replication-aware distributed objects.
b) A distributed system responsible for replica management.
September 2002 © DoCS 10
Consistency Models
Data-centric consistency models
Client-centric consistency models
September 2002 © DoCS 11
Consistency Models (Data Centric)

Strict consistency
Sequential consistency
Linearizability
Causal consistency
FIFO consistency
Weak consistency
September 2002 © DoCS 12
Strict Consistency

Condition: any read of x returns the value of the most recent write of x.
Depends on a global clock.

Notation used in the figures:
Wi(x)a - process i writes value a to variable x
Ri(x)b - process i gets value b by reading x
tsOP(x) - the time at which operation OP (read or write) was performed on x

[Figure: one event sequence that is strictly consistent and one that is not]
September 2002 © DoCS 13
Sequential Consistency

Condition: the result of any execution is the same as if the (read and write) operations by all processes on the data store were executed in some sequential order, and the operations of each individual process appear in this sequence in the order specified by its program.
September 2002 © DoCS 14
Sequential Consistency

[Figure: one event sequence that is sequentially consistent and one that is not]
September 2002 © DoCS 15
Sequential Consistency

Process P1:        Process P2:        Process P3:
x = 1;             y = 1;             z = 1;
print(y, z);       print(x, z);       print(x, y);

Four valid executions
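The "four valid executions" on this slide are four of the output signatures that some interleaving can produce. As a sketch (the helper names below are ours, not the slides'), one can enumerate every interleaving that respects program order and collect the possible outputs; the all-zeros output turns out to be impossible under sequential consistency:

```python
from itertools import permutations

# Program order per process: a write of 1, then a print of two variables.
programs = {
    "P1": [("x", None), (None, ("y", "z"))],
    "P2": [("y", None), (None, ("x", "z"))],
    "P3": [("z", None), (None, ("x", "y"))],
}

def run(order):
    """Execute one interleaving; return P1's, P2's and P3's output concatenated."""
    mem = {"x": 0, "y": 0, "z": 0}
    pc = {p: 0 for p in programs}
    out = {p: "" for p in programs}
    for p in order:
        target, reads = programs[p][pc[p]]
        pc[p] += 1
        if target:                       # assignment: target = 1
            mem[target] = 1
        else:                            # print(v1, v2)
            out[p] += f"{mem[reads[0]]}{mem[reads[1]]}"
    return out["P1"] + out["P2"] + out["P3"]

# 6!/(2!*2!*2!) = 90 distinct interleavings respect program order
orders = set(permutations(["P1", "P1", "P2", "P2", "P3", "P3"]))
signatures = {run(o) for o in orders}

print(len(orders))                  # 90
print("111111" in signatures)       # True: all writes before all prints
print("000000" in signatures)       # False: impossible under sequential consistency
```

Each process's own print follows its own write, so the all-zeros signature would require a cycle in the write order.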
September 2002 © DoCS 16
Linearizable Consistency

Condition: the result of any execution is the same as if the (read and write) operations by all processes on the data store were executed in some sequential order, and the operations of each individual process appear in the order specified by its program; in addition,
if tsOP1(x) < tsOP2(y), then operation OP1(x) should precede OP2(y) in this sequence.
September 2002 © DoCS 17
Causal Consistency

Condition: if event B is caused or influenced by event A, then all processes see A before B.
o Writes that are potentially causally related must be seen by all processes in the same order. Concurrent writes may be seen in a different order on different machines.
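One standard way to capture "potentially causally related" is with vector timestamps. The sketch below is ours, not part of the slides: causally related writes get comparable timestamps, concurrent writes get incomparable ones.

```python
# Minimal vector-clock sketch: component i counts events at process i.
def happened_before(a, b):
    """True iff vector timestamp a causally precedes b."""
    return all(x <= y for x, y in zip(a, b)) and a != b

# W1(x)a at P1, then P2 reads it and writes: causally related.
w1 = (1, 0)            # P1's clock after its write
w2 = (1, 1)            # P2's clock after reading w1 and writing
print(happened_before(w1, w2))   # True: all replicas must order w1 before w2

# Two writes issued independently: concurrent, so causal consistency
# allows different replicas to apply them in different orders.
wa, wb = (2, 0), (0, 1)
concurrent = not happened_before(wa, wb) and not happened_before(wb, wa)
print(concurrent)   # True
```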
September 2002 © DoCS 18
Causal Consistency
September 2002 © DoCS 19
Causal Consistency
(a) is not causally consistent, but (b) is.
September 2002 © DoCS 20
FIFO Consistency

Necessary condition: writes done by a single process are seen by all other processes in the order in which they were issued, but writes from different processes may be seen in a different order by different processes.

Messages propagated in order
No causal consistency
Also called Pipelined RAM (PRAM) consistency
September 2002 © DoCS 21
FIFO Consistency
September 2002 © DoCS 22
FIFO Consistency

Process P1:        Process P2:        Process P3:
x = 1;             y = 1;             z = 1;
print(y, z);       print(x, z);       print(x, y);

Statement execution as seen by the three processes:

As seen by P1: x = 1; print(y, z); y = 1; print(x, z); z = 1; print(x, y);   Prints: 00
As seen by P2: x = 1; y = 1; print(x, z); print(y, z); z = 1; print(x, y);   Prints: 10
As seen by P3: y = 1; print(x, z); z = 1; print(x, y); x = 1; print(y, z);   Prints: 01

The three views disagree on the order of the writes, which FIFO consistency permits but sequential consistency does not.
September 2002 © DoCS 23
FIFO Consistency

Two concurrent processes:

Process P1:                  Process P2:
x = 1;                       y = 1;
if (y == 0) kill(P2);        if (x == 0) kill(P1);

Under sequential consistency at most one process is killed; under FIFO consistency both can be killed.
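A small enumeration backs up the sequential-consistency half of this claim (a sketch with our own helper structures; under FIFO consistency each process may see the other's write late, so both can die):

```python
from itertools import permutations

# Each process: write its variable, then check the other and kill it if 0.
progs = {
    1: [("write", "x"), ("check", "y", 2)],   # x = 1; if (y == 0) kill(P2)
    2: [("write", "y"), ("check", "x", 1)],   # y = 1; if (x == 0) kill(P1)
}

def run(order):
    mem = {"x": 0, "y": 0}
    pc = {1: 0, 2: 0}
    killed = set()
    for p in order:
        if p in killed:
            continue                      # a killed process runs nothing more
        step = progs[p][pc[p]]
        pc[p] += 1
        if step[0] == "write":
            mem[step[1]] = 1
        else:
            _, var, victim = step
            if mem[var] == 0:
                killed.add(victim)
    return frozenset(killed)

# All interleavings of P1 P1 P2 P2 that respect program order.
outcomes = {run(o) for o in set(permutations([1, 1, 2, 2]))}
print(frozenset({1, 2}) in outcomes)   # False: both killed never happens
print(frozenset() in outcomes)         # True: neither killed is possible
```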
September 2002 © DoCS 24
Weak Consistency

Based on the critical-section model
Assumes locking is done separately
Sets of operations can be performed on a local copy within the critical section

[Figure: Transaction Manager / Scheduler / Data Manager layers]
September 2002 © DoCS 25
Weak Consistency

Uses synchronization variables:
o Accesses to synchronization variables are sequentially consistent
o No operation on a synchronization variable is allowed to be performed until all previous writes have completed everywhere
o No read or write operation on data items is allowed to be performed until all previous operations to synchronization variables have been performed
September 2002 © DoCS 26
Weak Consistency

[Figure: a valid and an invalid event sequence for weak consistency]
September 2002 © DoCS 27
Weak Consistency

Properties:
o Accesses to synchronization variables associated with a data store are sequentially consistent
o No operation on a synchronization variable is allowed to be performed until all previous writes have been completed everywhere
o No read or write operation on data items is allowed to be performed until all previous operations to synchronization variables have been performed
September 2002 © DoCS 28
Weak Consistency
A program fragment in which some variables may be kept in registers.
int a, b, c, d, e, x, y;      /* variables */
int *p, *q;                   /* pointers */
int f(int *p, int *q);        /* function prototype */

a = x * x;                    /* a stored in register */
b = y * y;                    /* b as well */
c = a*a*a + b*b + a * b;      /* used later */
d = a * a * c;                /* used later */
p = &a;                       /* p gets address of a */
q = &b;                       /* q gets address of b */
e = f(p, q);                  /* function call */
September 2002 © DoCS 29
Release Consistency (1)
A valid event sequence for release consistency.
September 2002 © DoCS 30
Release Consistency (2)
Rules:
o Before a read or write operation on shared data is performed, all previous acquires done by the process must have completed successfully
o Before a release is allowed to be performed, all previous reads and writes by the process must have completed
o Accesses to synchronization variables are FIFO consistent (sequential consistency is not required)
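These rules can be pictured with a toy model (class and method names are ours, not the slides'): writes are buffered in a local copy that is pushed to the shared store on release, and acquire pulls the shared store into the local copy.

```python
# A toy release-consistency model: acquire() pulls, release() pushes.
class ReleaseConsistentCopy:
    def __init__(self, shared):
        self.shared = shared                 # the "everywhere" store
        self.local = {}
    def acquire(self):
        self.local = dict(self.shared)       # pull all previously released writes
    def write(self, key, value):
        self.local[key] = value              # visible only locally for now
    def read(self, key):
        return self.local.get(key, 0)
    def release(self):
        self.shared.update(self.local)       # push everything written so far

shared = {}
p1, p2 = ReleaseConsistentCopy(shared), ReleaseConsistentCopy(shared)

p1.acquire()
p1.write("x", 1)
print(p2.read("x"))   # 0: p1 has not released yet
p1.release()
p2.acquire()
print(p2.read("x"))   # 1: visible after p1's release and p2's acquire
```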
September 2002 © DoCS 31
Entry Consistency
Conditions:
o An acquire access of a synchronization variable is not allowed to perform with respect to a process until all updates to the guarded shared data have been performed with respect to that process.
o Before an exclusive mode access to a synchronization variable by a process is allowed to perform with respect to that process, no other process may hold the synchronization variable, not even in nonexclusive mode.
o After an exclusive mode access to a synchronization variable has been performed, any other process's next nonexclusive mode access to that synchronization variable may not be performed until it has performed with respect to that variable's owner.
September 2002 © DoCS 32
Entry Consistency
A valid event sequence for entry consistency.
September 2002 © DoCS 33
Summary

a) Consistency models not using synchronization operations:

Consistency      Description
Strict           Absolute time ordering of all shared accesses matters
Linearizability  All processes must see all shared accesses in the same order; accesses are furthermore ordered according to a (nonunique) global timestamp
Sequential       All processes see all shared accesses in the same order; accesses are not ordered in time
Causal           All processes see causally-related shared accesses in the same order
FIFO             All processes see writes from each other in the order they were used; writes from different processes may not always be seen in that order

b) Models with synchronization operations:

Consistency      Description
Weak             Shared data can be counted on to be consistent only after a synchronization is done
Release          Shared data are made consistent when a critical region is exited
Entry            Shared data pertaining to a critical region are made consistent when a critical region is entered
September 2002 © DoCS 34
Client-Centric Consistency

Goal: show how we can perhaps avoid system-wide consistency by concentrating on what specific clients want, instead of what should be maintained by servers.

Background: most large-scale distributed systems (e.g., databases) apply replication for scalability, but can support only weak consistency:
o DNS: updates are propagated slowly, and inserts may not be immediately visible
o NEWS: articles and reactions are pushed and pulled throughout the Internet, such that reactions can be seen before postings
o WWW: caches all over the place, but there need be no guarantee that you are reading the most recent version of a page
September 2002 © DoCS 35
Eventual Consistency
September 2002 © DoCS 36
Client-Centric Models (1)

Monotonic reads
o If a process reads the value of a data item x, any successive read on x by that process will always return the same value of x or a more recent value.

Monotonic writes
o A write operation by a process on a data item x is completed before any successive write operation on x by the same process.
September 2002 © DoCS 37
Monotonic Reads

Example: automatically reading your personal calendar updates from different servers. Monotonic reads guarantee that the user sees all updates, no matter from which server the automatic reading takes place.

Example: reading (not modifying) incoming mail while you are on the move. Each time you connect to a different e-mail server, that server fetches (at least) all the updates from the server you previously visited.
September 2002 © DoCS 38
Monotonic Reads
The read operations performed by a single process P at two different local copies of the same data store.
a) A monotonic-read consistent data store.
b) A data store that does not provide monotonic reads.

Notation: WS(xi) is the set of write operations (at Li) that lead to version xi of x (at time t); WS(xi,xj) indicates that it is known that WS(xi) is part of WS(xj).
September 2002 © DoCS 39
Monotonic Writes

Example: updating a program at server S2, and ensuring that all components on which compilation and linking depend are also placed at S2.

Example: maintaining versions of replicated files in the correct order everywhere (propagate the previous version to the server where the newest version is installed).
September 2002 © DoCS 40
Monotonic Writes
The write operations performed by a single process P at two different local copies of the same data store
a) A monotonic-write consistent data store.
b) A data store that does not provide monotonic-write consistency.
September 2002 © DoCS 41
Client-Centric Models (2)

Read your writes
o The effect of a write operation by a process on data item x will always be seen by a successive read operation on x by the same process.

Writes follow reads
o A write operation by a process on a data item x following a previous read operation on x by the same process is guaranteed to take place on the same or a more recent value of x than was read.
September 2002 © DoCS 42
Read Your Writes

Example: updating your Web page and guaranteeing that your Web browser shows the newest version instead of its cached copy.
September 2002 © DoCS 43
Read Your Writes
a) A data store that provides read-your-writes consistency.
b) A data store that does not.
September 2002 © DoCS 44
Writes Follow Reads

Example: see reactions to posted articles only if you have the original posting (a read "pulls in" the corresponding write operation).
September 2002 © DoCS 45
Writes Follow Reads
a) A writes-follow-reads consistent data store.
b) A data store that does not provide writes-follow-reads consistency.
September 2002 © DoCS 46
Implementation

Unique IDs on writes
o Each write is uniquely identified
o Identification includes the replica
o Process keeps read and write sets of IDs

Timestamps
o V(i)[j] equals the timestamp of the latest write initiated on server j that has been propagated to server i
o Leads to efficient representation of read and write sets
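A sketch of how such vector timestamps support, e.g., monotonic reads (the function names are illustrative, not the slides' notation): a server may satisfy a read only if its local vector dominates the client's read vector, and the client's vector is updated by element-wise maximum.

```python
# Vector timestamps as tuples; component j counts writes from server j.
def dominates(server_v, client_v):
    """True iff the server has seen every write the client has seen."""
    return all(s >= c for s, c in zip(server_v, client_v))

def merge(a, b):
    """Element-wise max: combine what the client has observed so far."""
    return tuple(max(x, y) for x, y in zip(a, b))

client_reads = (2, 0)        # writes the client has already observed
s1 = (3, 1)                  # server L1: has all of those writes (and more)
s2 = (1, 4)                  # server L2: missing a write the client saw

print(dominates(s1, client_reads))   # True:  L1 may serve the read
print(dominates(s2, client_reads))   # False: L2 must first catch up
print(merge(client_reads, s1))       # (3, 1): client's new read vector
```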
September 2002 © DoCS 47
Distribution Protocols

Placement and nature of replicas
Distributing updates
Achieving consistency
September 2002 © DoCS 48
Replicas

Permanent
o Load sharing
o Mirroring

Server-initiated
o Push caches
o Temporary hosts - NOMAD

Client-initiated
o Caches
September 2002 © DoCS 49
Replica Placement
The logical organization of different kinds of copies of a data store into three concentric rings.
September 2002 © DoCS 50
Permanent Replicas

The files that constitute a site are replicated across a limited number of servers on a single LAN.
A Web site is copied to a limited number of servers, called mirrored sites, which are geographically spread across the Internet.
September 2002 © DoCS 51
Server-Initiated Replicas

Counting access requests from different clients:
1. Keep track of access counts per file, aggregated by considering the server closest to the requesting clients
2. Number of accesses drops below threshold D ⇒ drop file
3. Number of accesses exceeds threshold R ⇒ replicate file
4. Number of accesses between D and R ⇒ migrate file
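The three threshold rules above can be sketched as a single decision function (the function and parameter names are ours):

```python
# Decide what a server does with its copy of a file, given the access
# count and the two thresholds D < R from the slide.
def replica_action(count, drop_threshold, replication_threshold):
    assert drop_threshold < replication_threshold
    if count < drop_threshold:
        return "drop"                 # too few accesses: delete the copy
    if count > replication_threshold:
        return "replicate"            # popular: create more copies
    return "migrate"                  # between D and R: move toward clients

print(replica_action(3, 10, 100))     # drop
print(replica_action(50, 10, 100))    # migrate
print(replica_action(500, 10, 100))   # replicate
```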
September 2002 © DoCS 52
Client-Initiated Replicas

Client caches
o Improve access time
o Located at the client machine
o Consistency left to the client
o On a cache hit, the request is served locally
o Caches can be shared between clients
September 2002 © DoCS 53
Update Propagation

State or operation?
o Notification of update
o New copy of data
o Copy of operation
Trade bandwidth for processing

Push or pull?
Push by server
o Server must know replicas
o Client immediately updated
Pull by client
o Client must poll, or
o Delay response when item requested
Leases
September 2002 © DoCS 54
Pull versus Push Protocols

A comparison between push-based and pull-based protocols in the case of multiple-client, single-server systems.

Issue                    Push-based                                 Pull-based
State of server          List of client replicas and caches         None
Messages sent            Update (and possibly fetch update later)   Poll and update
Response time at client  Immediate (or fetch-update time)           Fetch-update time
September 2002 © DoCS 55
Epidemic Protocols

Replicas are:
o Infective
o Susceptible
o Removed

An update is pushed
o It may not infect all replicas
o Augment with a pull

If a push reaches an already-infected replica, stop pushing with probability 1/k.
September 2002 © DoCS 56
Principles

Basic idea: assume there are no write-write conflicts:
o Update operations are initially performed at one or only a few replicas
o A replica passes its updated state to a limited number of neighbors
o Update propagation is lazy, i.e., not immediate
o Eventually, each update should reach every replica
September 2002 © DoCS 57
Principles

Anti-entropy: each replica regularly chooses another replica at random, and exchanges state differences, leading to identical states at both afterwards.

Gossiping: a replica which has just been updated (i.e., has been contaminated) tells a number of other replicas about its update (contaminating them as well).
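A tiny simulation of anti-entropy (our own sketch, not the slides' protocol definition): in each round every replica exchanges state differences with one randomly chosen partner, and both end up with the union of what they knew.

```python
import random

random.seed(42)                       # make the run repeatable
N = 20
state = [set() for _ in range(N)]     # what each replica currently knows
state[0].add("update-1")              # one replica starts out "infective"

rounds = 0
while rounds < 100 and any("update-1" not in s for s in state):
    rounds += 1
    for i in range(N):
        j = random.randrange(N)       # choose a partner at random
        union = state[i] | state[j]   # exchange state differences
        state[i], state[j] = set(union), set(union)

print(all("update-1" in s for s in state))   # True: everyone has the update
```

With push-pull exchanges like this, the number of infected replicas grows roughly exponentially per round, which is why propagation is fast even though it is lazy.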
September 2002 © DoCS 58
Consistency Protocols

Primary-based
o Remote write
o Local write

Replicated-write
o Active replication
o Quorum-based voting

Cache-coherence
September 2002 © DoCS 59
Remote-Write Protocols (1)
Primary-based remote-write protocol with a fixed server to which all read and write operations are forwarded.
September 2002 © DoCS 60
Remote-Write Protocols (2)
The principle of primary-backup protocol.
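The primary-backup write path can be sketched as follows (class and method names are ours; a real protocol also needs failure detection and primary election):

```python
# Minimal primary-backup sketch: the primary applies a write, forwards
# it to every backup, waits for their acks, then acknowledges the client.
class Replica:
    def __init__(self):
        self.store = {}
    def apply(self, key, value):
        self.store[key] = value
        return "ack"

class Primary(Replica):
    def __init__(self, backups):
        super().__init__()
        self.backups = backups
    def write(self, key, value):
        self.apply(key, value)                              # 1. update primary
        acks = [b.apply(key, value) for b in self.backups]  # 2. forward to backups
        assert all(a == "ack" for a in acks)                # 3. wait for acks
        return "ack"                                        # 4. tell the client

backups = [Replica(), Replica()]
primary = Primary(backups)
primary.write("x", 1)
print(all(b.store["x"] == 1 for b in backups))   # True: backups are in sync
```

Blocking until all backups acknowledge is what makes reads at any replica safe; a non-blocking variant answers the client earlier at the cost of weaker guarantees.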
September 2002 © DoCS 61
Local-Write Protocols (1)
Primary-based local-write protocol in which a single copy is migrated between processes.
September 2002 © DoCS 62
Local-Write Protocol (2)
September 2002 © DoCS 63
Active Replication (1)

The problem of replicated invocations.

Operation forwarded to each replica
Operations must be sequenced
Use multicast or a sequencer
September 2002 © DoCS 64
Active Replication (2)
a) Forwarding an invocation request from a replicated object.
b) Returning a reply to a replicated object.
September 2002 © DoCS 65
Quorum-Based Protocols

Valid read/write quorums must satisfy:
o NR + NW > N
o NW > N/2

[Figure: a valid choice of (NR, NW), an invalid choice, and Read-One/Write-All (NR = 1, NW = N)]
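The two constraints can be checked directly (parameter names are ours): the first forces every read quorum to overlap every write quorum, the second forces any two write quorums to overlap.

```python
# Quorum validity check for N replicas with read quorum n_r and
# write quorum n_w.
def valid_quorum(n, n_r, n_w):
    return n_r + n_w > n and n_w > n / 2

print(valid_quorum(12, 3, 10))   # True:  reads always overlap writes
print(valid_quorum(12, 7, 6))    # False: two write quorums of 6 need not overlap
print(valid_quorum(12, 1, 12))   # True:  Read-One, Write-All
```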
September 2002 © DoCS 66
Orca
A simplified stack object in Orca, with internal data and two operations.
OBJECT IMPLEMENTATION stack;
    top: integer;                             # variable indicating the top
    stack: ARRAY[integer 0..N-1] OF integer;  # storage for the stack

    OPERATION push(item: integer)             # function returning nothing
    BEGIN
        GUARD top < N DO
            stack[top] := item;               # push item onto the stack
            top := top + 1;                   # increment the stack pointer
        OD;
    END;

    OPERATION pop(): integer;                 # function returning an integer
    BEGIN
        GUARD top > 0 DO                      # suspend if the stack is empty
            top := top - 1;                   # decrement the stack pointer
            RETURN stack[top];                # return the top item
        OD;
    END;
BEGIN
    top := 0;                                 # initialization
END;
September 2002 © DoCS 67
Management of Shared Objects in Orca
Four cases of a process P performing an operation on an object O in Orca.
September 2002 © DoCS 68
Causally-Consistent Lazy Replication
The general organization of a distributed data store. Clients are assumed to also handle consistency-related communication.
September 2002 © DoCS 69
Processing Read Operations

Performing a read operation at a local copy.
September 2002 © DoCS 70
Processing Write Operations
Performing a write operation at a local copy.