Replication and Consistency - Uppsala
TRANSCRIPT
ReplicationAndConsistency September 2002
© DoCS 2002 1
Distributed Systems
Replication and Consistency
September 2002
September 2002 © DoCS 2
Introduction to replication (1)

Replication can provide the following:

Performance enhancement
o e.g. several web servers can have the same DNS name and the servers are selected in turn, to share the load
o replication of read-only data is simple, but replication of changing data has overheads

Fault-tolerant service
o guarantees correct behaviour in spite of certain faults (can include timeliness)
o if f of f + 1 servers crash, then 1 remains to supply the service
o if f of 2f + 1 servers have byzantine faults, then they can supply a correct service
September 2002 © DoCS 3
Introduction to replication (2)

Availability is hindered by
o server failures
  Replicate data at failure-independent servers; when one fails, a client may use another. Note that caches do not help with availability (they are incomplete).
o network partitions and disconnected operation
  Users of mobile computers deliberately disconnect, and then on re-connection resolve conflicts.
September 2002 © DoCS 4
Requirements for replicated data

Replication transparency
o clients see logical objects (not several physical copies)
  they access one logical item and receive a single result

Consistency
o specified to suit the application
  e.g. when a user of a diary disconnects, their local copy may be inconsistent with the others and will need to be reconciled when they connect again. But connected clients using different copies should get consistent results.

What is replication transparency?
September 2002 © DoCS 5
System model

Each logical object is implemented by a collection of physical copies called replicas
o the replicas are not necessarily consistent all the time (some may have received updates not yet conveyed to the others)

We assume an asynchronous system where processes fail only by crashing, and generally assume no network partitions

Replica managers (RMs)
o an RM contains replicas on a computer and accesses them directly
o RMs apply operations to replicas recoverably, i.e. they do not leave inconsistent results if they crash
o objects are copied at all RMs unless we state otherwise
o static systems are based on a fixed set of RMs
o in a dynamic system, RMs may join or leave (e.g. when they crash)
o an RM can be a state machine
September 2002 © DoCS 6
Replication

Improve availability
Improve performance
o Scale
o Location

Cost is consistency
"Informal replication" - caching
"Formal replication" - managed
September 2002 © DoCS 7
Managing Concurrency

Objects must manage concurrent invocations
Replica states must be consistent
Concurrent invocations on replicas should have the same behavior as on a single object

[Figure: clients invoking one object directly vs. clients invoking two replicas of it]
September 2002 © DoCS 8
Object Replication

Object managed: a remote object capable of handling concurrent invocations on its own.
"System" managed: a remote object for which an object adapter is required to handle concurrent invocations.
September 2002 © DoCS 9
Object Replication
a) A distributed system for replication-aware distributed objects.
b) A distributed system responsible for replica management.
September 2002 © DoCS 10
Consistency Models
Data-centric consistency models
Client-centric consistency models
September 2002 © DoCS 11
Consistency Models (Data Centric)

Strict consistency
Sequential consistency
Linearizability
Causal consistency
FIFO consistency
Weak consistency
September 2002 © DoCS 12
Strict Consistency

Condition: any read of x returns the value of the most recent write of x.
Depends on a global clock.

Notation used in the figures:
Wi(x)a - process i writes value a to variable x
Ri(x)b - process i gets value b by reading x
tsOP(x) - the time at which operation OP (read or write) was performed on x

[Figure: one event sequence that is strictly consistent and one that is not]
September 2002 © DoCS 13
Sequential Consistency

Condition: the result of any execution is the same as if the (read and write) operations by all processes on the data store were executed in some sequential order, and the operations of each individual process appear in this sequence in the order specified by its program.
September 2002 © DoCS 14
Sequential Consistency

[Figure: one event sequence that is sequentially consistent and one that is not]
September 2002 © DoCS 15
Sequential Consistency

Process P1:        Process P2:        Process P3:
x = 1;             y = 1;             z = 1;
print(y, z);       print(x, z);       print(x, y);

Four valid executions
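The "four valid executions" on this slide are four of the output signatures that some interleaving can produce. As a sketch (the helper names below are ours, not the slides'), one can enumerate every interleaving that respects program order and collect the possible outputs; the all-zeros output turns out to be impossible under sequential consistency:

```python
from itertools import permutations

# Program order per process: a write of 1, then a print of two variables.
programs = {
    "P1": [("x", None), (None, ("y", "z"))],
    "P2": [("y", None), (None, ("x", "z"))],
    "P3": [("z", None), (None, ("x", "y"))],
}

def run(order):
    """Execute one interleaving; return P1's, P2's and P3's output concatenated."""
    mem = {"x": 0, "y": 0, "z": 0}
    pc = {p: 0 for p in programs}
    out = {p: "" for p in programs}
    for p in order:
        target, reads = programs[p][pc[p]]
        pc[p] += 1
        if target:                       # assignment: target = 1
            mem[target] = 1
        else:                            # print(v1, v2)
            out[p] += f"{mem[reads[0]]}{mem[reads[1]]}"
    return out["P1"] + out["P2"] + out["P3"]

# 6!/(2!*2!*2!) = 90 distinct interleavings respect program order
orders = set(permutations(["P1", "P1", "P2", "P2", "P3", "P3"]))
signatures = {run(o) for o in orders}

print(len(orders))                  # 90
print("111111" in signatures)       # True: all writes before all prints
print("000000" in signatures)       # False: impossible under sequential consistency
```

Each process's own print follows its own write, so the all-zeros signature would require a cycle in the write order.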
September 2002 © DoCS 16
Linearizable Consistency

Condition: the result of any execution is the same as if the (read and write) operations by all processes on the data store were executed in some sequential order, and the operations of each individual process appear in the order specified by its program; in addition,
if tsOP1(x) < tsOP2(y), then operation OP1(x) should precede OP2(y) in this sequence.
September 2002 © DoCS 17
Causal Consistency

Condition: if event B is caused or influenced by event A, then all processes see A before B.
o Writes that are potentially causally related must be seen by all processes in the same order. Concurrent writes may be seen in a different order on different machines.
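One standard way to capture "potentially causally related" is with vector timestamps. The sketch below is ours, not part of the slides: causally related writes get comparable timestamps, concurrent writes get incomparable ones.

```python
# Minimal vector-clock sketch: component i counts events at process i.
def happened_before(a, b):
    """True iff vector timestamp a causally precedes b."""
    return all(x <= y for x, y in zip(a, b)) and a != b

# W1(x)a at P1, then P2 reads it and writes: causally related.
w1 = (1, 0)            # P1's clock after its write
w2 = (1, 1)            # P2's clock after reading w1 and writing
print(happened_before(w1, w2))   # True: all replicas must order w1 before w2

# Two writes issued independently: concurrent, so causal consistency
# allows different replicas to apply them in different orders.
wa, wb = (2, 0), (0, 1)
concurrent = not happened_before(wa, wb) and not happened_before(wb, wa)
print(concurrent)   # True
```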
September 2002 © DoCS 18
Causal Consistency
September 2002 © DoCS 19
Causal Consistency
(a) is not causally consistent, but (b) is.
September 2002 © DoCS 20
FIFO Consistency

Necessary condition: writes done by a single process are seen by all other processes in the order in which they were issued, but writes from different processes may be seen in a different order by different processes.

Messages propagated in order
No causal consistency
Also called Pipelined RAM (PRAM) consistency
September 2002 © DoCS 21
FIFO Consistency
September 2002 © DoCS 22
FIFO Consistency

Process P1:        Process P2:        Process P3:
x = 1;             y = 1;             z = 1;
print(y, z);       print(x, z);       print(x, y);

Statement execution as seen by the three processes:

As seen by P1: x = 1; print(y, z); y = 1; print(x, z); z = 1; print(x, y);   Prints: 00
As seen by P2: x = 1; y = 1; print(x, z); print(y, z); z = 1; print(x, y);   Prints: 10
As seen by P3: y = 1; print(x, z); z = 1; print(x, y); x = 1; print(y, z);   Prints: 01

The three views disagree on the order of the writes, which FIFO consistency permits but sequential consistency does not.
September 2002 © DoCS 23
FIFO Consistency

Two concurrent processes:

Process P1:                  Process P2:
x = 1;                       y = 1;
if (y == 0) kill(P2);        if (x == 0) kill(P1);

Under sequential consistency at most one process is killed; under FIFO consistency both can be killed.
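A small enumeration backs up the sequential-consistency half of this claim (a sketch with our own helper structures; under FIFO consistency each process may see the other's write late, so both can die):

```python
from itertools import permutations

# Each process: write its variable, then check the other and kill it if 0.
progs = {
    1: [("write", "x"), ("check", "y", 2)],   # x = 1; if (y == 0) kill(P2)
    2: [("write", "y"), ("check", "x", 1)],   # y = 1; if (x == 0) kill(P1)
}

def run(order):
    mem = {"x": 0, "y": 0}
    pc = {1: 0, 2: 0}
    killed = set()
    for p in order:
        if p in killed:
            continue                      # a killed process runs nothing more
        step = progs[p][pc[p]]
        pc[p] += 1
        if step[0] == "write":
            mem[step[1]] = 1
        else:
            _, var, victim = step
            if mem[var] == 0:
                killed.add(victim)
    return frozenset(killed)

# All interleavings of P1 P1 P2 P2 that respect program order.
outcomes = {run(o) for o in set(permutations([1, 1, 2, 2]))}
print(frozenset({1, 2}) in outcomes)   # False: both killed never happens
print(frozenset() in outcomes)         # True: neither killed is possible
```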
September 2002 © DoCS 24
Weak Consistency

Based on the critical-section model
Assumes locking is done separately
Sets of operations can be performed on a local copy within the critical section

[Figure: Transaction Manager / Scheduler / Data Manager layers]
September 2002 © DoCS 25
Weak Consistency

Uses synchronization variables:
o Accesses to synchronization variables are sequentially consistent
o No operation on a synchronization variable is allowed to be performed until all previous writes have completed everywhere
o No read or write operation on data items is allowed to be performed until all previous operations to synchronization variables have been performed
September 2002 © DoCS 26
Weak Consistency

[Figure: a valid and an invalid event sequence for weak consistency]
September 2002 © DoCS 27
Weak Consistency

Properties:
o Accesses to synchronization variables associated with a data store are sequentially consistent
o No operation on a synchronization variable is allowed to be performed until all previous writes have been completed everywhere
o No read or write operation on data items is allowed to be performed until all previous operations to synchronization variables have been performed
September 2002 © DoCS 28
Weak Consistency
A program fragment in which some variables may be kept in registers.
int a, b, c, d, e, x, y;      /* variables */
int *p, *q;                   /* pointers */
int f(int *p, int *q);        /* function prototype */

a = x * x;                    /* a stored in register */
b = y * y;                    /* b as well */
c = a*a*a + b*b + a * b;      /* used later */
d = a * a * c;                /* used later */
p = &a;                       /* p gets address of a */
q = &b;                       /* q gets address of b */
e = f(p, q);                  /* function call */
September 2002 © DoCS 29
Release Consistency (1)
A valid event sequence for release consistency.
September 2002 © DoCS 30
Release Consistency (2)
Rules:
o Before a read or write operation on shared data is performed, all previous acquires done by the process must have completed successfully
o Before a release is allowed to be performed, all previous reads and writes by the process must have completed
o Accesses to synchronization variables are FIFO consistent (sequential consistency is not required)
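These rules can be pictured with a toy model (class and method names are ours, not the slides'): writes are buffered in a local copy that is pushed to the shared store on release, and acquire pulls the shared store into the local copy.

```python
# A toy release-consistency model: acquire() pulls, release() pushes.
class ReleaseConsistentCopy:
    def __init__(self, shared):
        self.shared = shared                 # the "everywhere" store
        self.local = {}
    def acquire(self):
        self.local = dict(self.shared)       # pull all previously released writes
    def write(self, key, value):
        self.local[key] = value              # visible only locally for now
    def read(self, key):
        return self.local.get(key, 0)
    def release(self):
        self.shared.update(self.local)       # push everything written so far

shared = {}
p1, p2 = ReleaseConsistentCopy(shared), ReleaseConsistentCopy(shared)

p1.acquire()
p1.write("x", 1)
print(p2.read("x"))   # 0: p1 has not released yet
p1.release()
p2.acquire()
print(p2.read("x"))   # 1: visible after p1's release and p2's acquire
```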
September 2002 © DoCS 31
Entry Consistency
Conditions:
o An acquire access of a synchronization variable is not allowed to perform with respect to a process until all updates to the guarded shared data have been performed with respect to that process.
o Before an exclusive mode access to a synchronization variable by a process is allowed to perform with respect to that process, no other process may hold the synchronization variable, not even in nonexclusive mode.
o After an exclusive mode access to a synchronization variable has been performed, any other process's next nonexclusive mode access to that synchronization variable may not be performed until it has performed with respect to that variable's owner.
September 2002 © DoCS 32
Entry Consistency
A valid event sequence for entry consistency.
September 2002 © DoCS 33
Summary

a) Consistency models not using synchronization operations:

Consistency      Description
Strict           Absolute time ordering of all shared accesses matters
Linearizability  All processes must see all shared accesses in the same order; accesses are furthermore ordered according to a (nonunique) global timestamp
Sequential       All processes see all shared accesses in the same order; accesses are not ordered in time
Causal           All processes see causally-related shared accesses in the same order
FIFO             All processes see writes from each other in the order they were used; writes from different processes may not always be seen in that order

b) Models with synchronization operations:

Consistency      Description
Weak             Shared data can be counted on to be consistent only after a synchronization is done
Release          Shared data are made consistent when a critical region is exited
Entry            Shared data pertaining to a critical region are made consistent when a critical region is entered
September 2002 © DoCS 34
Client-Centric Consistency

Goal: show how we can perhaps avoid system-wide consistency by concentrating on what specific clients want, instead of what should be maintained by servers.

Background: most large-scale distributed systems (e.g., databases) apply replication for scalability, but can support only weak consistency:
o DNS: updates are propagated slowly, and inserts may not be immediately visible
o NEWS: articles and reactions are pushed and pulled throughout the Internet, such that reactions can be seen before postings
o WWW: caches all over the place, but there need be no guarantee that you are reading the most recent version of a page
September 2002 © DoCS 35
Eventual Consistency
September 2002 © DoCS 36
Client-Centric Models (1)

Monotonic reads
o If a process reads the value of a data item x, any successive read on x by that process will always return the same value of x or a more recent value.

Monotonic writes
o A write operation by a process on a data item x is completed before any successive write operation on x by the same process.
September 2002 © DoCS 37
Monotonic Reads

Example: automatically reading your personal calendar updates from different servers. Monotonic reads guarantee that the user sees all updates, no matter from which server the automatic reading takes place.

Example: reading (not modifying) incoming mail while you are on the move. Each time you connect to a different e-mail server, that server fetches (at least) all the updates from the server you previously visited.
September 2002 © DoCS 38
Monotonic Reads
The read operations performed by a single process P at two different local copies of the same data store.
a) A monotonic-read consistent data store.
b) A data store that does not provide monotonic reads.

Notation: WS(xi) is the set of write operations (at Li) that lead to version xi of x (at time t); WS(xi,xj) indicates that it is known that WS(xi) is part of WS(xj).
September 2002 © DoCS 39
Monotonic Writes

Example: updating a program at server S2, and ensuring that all components on which compilation and linking depend are also placed at S2.

Example: maintaining versions of replicated files in the correct order everywhere (propagate the previous version to the server where the newest version is installed).
September 2002 © DoCS 40
Monotonic Writes
The write operations performed by a single process P at two different local copies of the same data store
a) A monotonic-write consistent data store.
b) A data store that does not provide monotonic-write consistency.
September 2002 © DoCS 41
Client-Centric Models (2)

Read your writes
o The effect of a write operation by a process on data item x will always be seen by a successive read operation on x by the same process.

Writes follow reads
o A write operation by a process on a data item x following a previous read operation on x by the same process is guaranteed to take place on the same or a more recent value of x than was read.
September 2002 © DoCS 42
Read Your Writes

Example: updating your Web page and guaranteeing that your Web browser shows the newest version instead of its cached copy.
September 2002 © DoCS 43
Read Your Writes
a) A data store that provides read-your-writes consistency.
b) A data store that does not.
September 2002 © DoCS 44
Writes Follow Reads

Example: see reactions to posted articles only if you have the original posting (a read "pulls in" the corresponding write operation).
September 2002 © DoCS 45
Writes Follow Reads
a) A writes-follow-reads consistent data store.
b) A data store that does not provide writes-follow-reads consistency.
September 2002 © DoCS 46
Implementation

Unique IDs on writes
o Each write is uniquely identified
o Identification includes the replica
o Process keeps read and write sets of IDs

Timestamps
o V(i)[j] equals the timestamp of the latest write initiated on server j that has been propagated to server i
o Leads to efficient representation of read and write sets
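A sketch of how such vector timestamps support, e.g., monotonic reads (the function names are illustrative, not the slides' notation): a server may satisfy a read only if its local vector dominates the client's read vector, and the client's vector is updated by element-wise maximum.

```python
# Vector timestamps as tuples; component j counts writes from server j.
def dominates(server_v, client_v):
    """True iff the server has seen every write the client has seen."""
    return all(s >= c for s, c in zip(server_v, client_v))

def merge(a, b):
    """Element-wise max: combine what the client has observed so far."""
    return tuple(max(x, y) for x, y in zip(a, b))

client_reads = (2, 0)        # writes the client has already observed
s1 = (3, 1)                  # server L1: has all of those writes (and more)
s2 = (1, 4)                  # server L2: missing a write the client saw

print(dominates(s1, client_reads))   # True:  L1 may serve the read
print(dominates(s2, client_reads))   # False: L2 must first catch up
print(merge(client_reads, s1))       # (3, 1): client's new read vector
```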
September 2002 © DoCS 47
Distribution Protocols

Placement and nature of replicas
Distributing updates
Achieving consistency
September 2002 © DoCS 48
Replicas

Permanent
o Load sharing
o Mirroring

Server-initiated
o Push caches
o Temporary hosts - NOMAD

Client-initiated
o Caches
September 2002 © DoCS 49
Replica Placement
The logical organization of different kinds of copies of a data store into three concentric rings.
September 2002 © DoCS 50
Permanent Replicas

The files that constitute a site are replicated across a limited number of servers on a single LAN.
A Web site is copied to a limited number of servers, called mirrored sites, which are geographically spread across the Internet.
September 2002 © DoCS 51
Server-Initiated Replicas

Counting access requests from different clients:
1. Keep track of access counts per file, aggregated by considering the server closest to the requesting clients
2. Number of accesses drops below threshold D ⇒ drop file
3. Number of accesses exceeds threshold R ⇒ replicate file
4. Number of accesses between D and R ⇒ migrate file
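The three threshold rules above can be sketched as a single decision function (the function and parameter names are ours):

```python
# Decide what a server does with its copy of a file, given the access
# count and the two thresholds D < R from the slide.
def replica_action(count, drop_threshold, replication_threshold):
    assert drop_threshold < replication_threshold
    if count < drop_threshold:
        return "drop"                 # too few accesses: delete the copy
    if count > replication_threshold:
        return "replicate"            # popular: create more copies
    return "migrate"                  # between D and R: move toward clients

print(replica_action(3, 10, 100))     # drop
print(replica_action(50, 10, 100))    # migrate
print(replica_action(500, 10, 100))   # replicate
```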
September 2002 © DoCS 52
Client-Initiated Replicas

Client caches
o Improve access time
o Located at the client machine
o Consistency left to the client
o On a cache hit, the request is served locally
o Caches can be shared between clients
September 2002 © DoCS 53
Update Propagation

State or operation?
o Notification of update
o New copy of data
o Copy of operation
Trade bandwidth for processing

Push or pull?
Push by server
o Server must know replicas
o Client immediately updated
Pull by client
o Client must poll, or
o Delay response when item requested
Leases
September 2002 © DoCS 54
Pull versus Push Protocols

A comparison between push-based and pull-based protocols in the case of multiple-client, single-server systems.

Issue                    Push-based                                 Pull-based
State of server          List of client replicas and caches         None
Messages sent            Update (and possibly fetch update later)   Poll and update
Response time at client  Immediate (or fetch-update time)           Fetch-update time
September 2002 © DoCS 55
Epidemic Protocols

Replicas are:
o Infective
o Susceptible
o Removed

An update is pushed
o It may not infect all replicas
o Augment with a pull

If a push reaches an already-infected replica, stop pushing with probability 1/k.
September 2002 © DoCS 56
Principles

Basic idea: assume there are no write-write conflicts:
o Update operations are initially performed at one or only a few replicas
o A replica passes its updated state to a limited number of neighbors
o Update propagation is lazy, i.e., not immediate
o Eventually, each update should reach every replica
September 2002 © DoCS 57
Principles

Anti-entropy: each replica regularly chooses another replica at random, and exchanges state differences, leading to identical states at both afterwards.

Gossiping: a replica which has just been updated (i.e., has been contaminated) tells a number of other replicas about its update (contaminating them as well).
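A tiny simulation of anti-entropy (our own sketch, not the slides' protocol definition): in each round every replica exchanges state differences with one randomly chosen partner, and both end up with the union of what they knew.

```python
import random

random.seed(42)                       # make the run repeatable
N = 20
state = [set() for _ in range(N)]     # what each replica currently knows
state[0].add("update-1")              # one replica starts out "infective"

rounds = 0
while rounds < 100 and any("update-1" not in s for s in state):
    rounds += 1
    for i in range(N):
        j = random.randrange(N)       # choose a partner at random
        union = state[i] | state[j]   # exchange state differences
        state[i], state[j] = set(union), set(union)

print(all("update-1" in s for s in state))   # True: everyone has the update
```

With push-pull exchanges like this, the number of infected replicas grows roughly exponentially per round, which is why propagation is fast even though it is lazy.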
September 2002 © DoCS 58
Consistency Protocols

Primary-based
o Remote write
o Local write

Replicated-write
o Active replication
o Quorum-based voting

Cache-coherence
September 2002 © DoCS 59
Remote-Write Protocols (1)
Primary-based remote-write protocol with a fixed server to which all read and write operations are forwarded.
September 2002 © DoCS 60
Remote-Write Protocols (2)
The principle of primary-backup protocol.
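The primary-backup write path can be sketched as follows (class and method names are ours; a real protocol also needs failure detection and primary election):

```python
# Minimal primary-backup sketch: the primary applies a write, forwards
# it to every backup, waits for their acks, then acknowledges the client.
class Replica:
    def __init__(self):
        self.store = {}
    def apply(self, key, value):
        self.store[key] = value
        return "ack"

class Primary(Replica):
    def __init__(self, backups):
        super().__init__()
        self.backups = backups
    def write(self, key, value):
        self.apply(key, value)                              # 1. update primary
        acks = [b.apply(key, value) for b in self.backups]  # 2. forward to backups
        assert all(a == "ack" for a in acks)                # 3. wait for acks
        return "ack"                                        # 4. tell the client

backups = [Replica(), Replica()]
primary = Primary(backups)
primary.write("x", 1)
print(all(b.store["x"] == 1 for b in backups))   # True: backups are in sync
```

Blocking until all backups acknowledge is what makes reads at any replica safe; a non-blocking variant answers the client earlier at the cost of weaker guarantees.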
September 2002 © DoCS 61
Local-Write Protocols (1)
Primary-based local-write protocol in which a single copy is migrated between processes.
September 2002 © DoCS 62
Local-Write Protocol (2)
September 2002 © DoCS 63
Active Replication (1)

The problem of replicated invocations.

Operation forwarded to each replica
Operations must be sequenced
Use multicast or a sequencer
September 2002 © DoCS 64
Active Replication (2)
a) Forwarding an invocation request from a replicated object.
b) Returning a reply to a replicated object.
September 2002 © DoCS 65
Quorum-Based Protocols

Valid read/write quorums must satisfy:
o NR + NW > N
o NW > N/2

[Figure: a valid choice of (NR, NW), an invalid choice, and Read-One/Write-All (NR = 1, NW = N)]
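The two constraints can be checked directly (parameter names are ours): the first forces every read quorum to overlap every write quorum, the second forces any two write quorums to overlap.

```python
# Quorum validity check for N replicas with read quorum n_r and
# write quorum n_w.
def valid_quorum(n, n_r, n_w):
    return n_r + n_w > n and n_w > n / 2

print(valid_quorum(12, 3, 10))   # True:  reads always overlap writes
print(valid_quorum(12, 7, 6))    # False: two write quorums of 6 need not overlap
print(valid_quorum(12, 1, 12))   # True:  Read-One, Write-All
```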
September 2002 © DoCS 66
Orca
A simplified stack object in Orca, with internal data and two operations.
OBJECT IMPLEMENTATION stack;
    top: integer;                             # variable indicating the top
    stack: ARRAY[integer 0..N-1] OF integer;  # storage for the stack

    OPERATION push(item: integer)             # function returning nothing
    BEGIN
        GUARD top < N DO
            stack[top] := item;               # push item onto the stack
            top := top + 1;                   # increment the stack pointer
        OD;
    END;

    OPERATION pop(): integer;                 # function returning an integer
    BEGIN
        GUARD top > 0 DO                      # suspend if the stack is empty
            top := top - 1;                   # decrement the stack pointer
            RETURN stack[top];                # return the top item
        OD;
    END;
BEGIN
    top := 0;                                 # initialization
END;
September 2002 © DoCS 67
Management of Shared Objects in Orca
Four cases of a process P performing an operation on an object O in Orca.
September 2002 © DoCS 68
Causally-Consistent Lazy Replication
The general organization of a distributed data store. Clients are assumed to also handle consistency-related communication.
September 2002 © DoCS 69
Processing Read Operations

Performing a read operation at a local copy.
September 2002 © DoCS 70
Processing Write Operations
Performing a write operation at a local copy.