
Chapter 7: Consistency and Replication (Consistency Protocols)

Anuja P Parameshwaran

Instructor: Dr Yanqing Zhang

Advanced Operating System

Overview:

Continuous Consistency
◦ Bounding Numerical Deviations
◦ Bounding Staleness Deviations
◦ Bounding Ordering Deviations

Sequential Consistency

Primary-Based Protocols
◦ Remote-Write Protocols
◦ Local-Write Protocols

Replicated-Write Protocols
◦ Active Replication
◦ Voting (Quorum-Based Protocols)

Cache-Coherence Protocols

Implementing Client-Centric Consistency

Summary

Recap

What is a consistency model?
- Used in distributed systems (DS) such as distributed memory systems or distributed data stores.
- A system supports a given model if operations on memory follow specific rules.
- It usually defines rules for the apparent order and visibility of updates.

What is a consistency protocol?
- It describes the implementation of a specific consistency model.

Continuous Consistency: a framework developed by Yu and Vahdat for bounding three forms of inconsistency: numerical deviation, staleness deviation, and ordering deviation.

Bounding Numerical Deviation:

A solution for keeping the numerical deviation within bounds.

Concentrate on writes to a single data item x.

Each write W(x) has an associated weight, weight(W), that represents the numerical value by which x is updated. Assume weight(W) > 0.

Continuous consistency: Numerical deviation

W is initially forwarded to one of the N replicas, denoted origin(W). Let $L_i$ be the log of writes executed at server $S_i$; $TW[i,j]$ is the total weight of the writes executed by $S_i$ that originated from $S_j$:

$$TW[i,j] = \sum \{\, \mathrm{weight}(W) \mid \mathrm{origin}(W) = S_j \wedge W \in L_i \,\}$$

The goal is that, at any time t, the current value $v_i$ of x at server $S_i$ deviates only within bounds from the actual value $v(t)$ of x. This actual value is completely determined by all submitted writes:

$$v(t) = v(0) + \sum_{k=1}^{N} TW[k,k] \qquad\qquad v_i = v(0) + \sum_{k=1}^{N} TW[i,k]$$

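Continuous consistency: Numerical deviation

For every server $S_i$, associate an upper bound $\delta_i$ such that we need to enforce:

$$v(t) - v_i \le \delta_i \quad \text{for every server } S_i$$

Every server $S_k$ maintains a view $TW_k[i,j]$ of what it believes $S_i$ has as the value of $TW[i,j]$.

Solution: $S_k$ sends operations from its log to $S_i$ when it sees that $TW_k[i,k]$ is getting too far from $TW[k,k]$, in particular when

$$TW[k,k] - TW_k[i,k] > \frac{\delta_i}{N-1}$$

Since $TW_k[i,k] \le TW[i,k]$, this keeps each of the $N-1$ terms $TW[k,k] - TW[i,k]$ with $k \ne i$ within $\delta_i/(N-1)$, so $v(t) - v_i = \sum_{k} (TW[k,k] - TW[i,k]) \le \delta_i$.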
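The push rule above can be sketched as a single-process simulation. Everything beyond the slide's definitions is an assumption: all N servers live in one object, delivery is reliable and instantaneous, weights are positive, N ≥ 2, and the class and method names are hypothetical. Because $S_k$ only learns about writes it forwarded itself, the sketch tracks $TW_k[i,k]$ as the weight $S_k$ has already pushed to $S_i$.

```python
class NumericalDeviationBound:
    """Push-based bounding of numerical deviation for one data item x (sketch).

    Assumptions: reliable, instantaneous delivery; every weight > 0; N >= 2;
    every server starts from the same v(0).
    """

    def __init__(self, deltas, v0=0.0):
        self.n = len(deltas)
        self.deltas = deltas  # deltas[i] is the bound delta_i for server S_i
        self.v0 = v0
        # tw[i][j] = TW[i,j]: weight of writes from S_j executed at S_i
        self.tw = [[0.0] * self.n for _ in range(self.n)]
        # view[k][i] = TW_k[i,k]: weight S_k knows it has pushed to S_i
        self.view = [[0.0] * self.n for _ in range(self.n)]
        # unsent[k][i]: weights in S_k's log not yet forwarded to S_i
        self.unsent = [[[] for _ in range(self.n)] for _ in range(self.n)]

    def submit(self, k, weight):
        """A write W with weight(W) = weight arrives at origin(W) = S_k."""
        if weight <= 0:
            raise ValueError("weights are assumed to be positive")
        self.tw[k][k] += weight
        for i in range(self.n):
            if i != k:
                self.unsent[k][i].append(weight)
        self._push_if_needed(k)

    def _push_if_needed(self, k):
        for i in range(self.n):
            if i == k:
                continue
            if self.tw[k][k] - self.view[k][i] > self.deltas[i] / (self.n - 1):
                # S_k forwards the pending operations from its log to S_i
                for w in self.unsent[k][i]:
                    self.tw[i][k] += w
                    self.view[k][i] += w
                self.unsent[k][i].clear()

    def actual_value(self):
        """v(t) = v(0) + sum over k of TW[k,k]"""
        return self.v0 + sum(self.tw[k][k] for k in range(self.n))

    def local_value(self, i):
        """v_i = v(0) + sum over k of TW[i,k]"""
        return self.v0 + sum(self.tw[i][k] for k in range(self.n))


sim = NumericalDeviationBound([2.0, 2.0, 2.0])
for _ in range(5):
    sim.submit(0, 0.5)
print(sim.actual_value(), sim.local_value(1))  # 2.5 1.5
```

With $\delta_i = 2$ and N = 3, an origin may run ahead of a replica by at most 1 unit of weight before it pushes. The last push happened at a total weight of 1.5, so $S_1$ sees 1.5 while the actual value is 2.5, which is within the bound.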

Bounding Staleness Deviations:

There are many ways to keep the staleness of replicas within specified bounds. One approach is to timestamp each submitted write at its origin server. Let server Sk keep a real-time vector clock RVCk where RVCk[i] = T(i) means that Sk has seen all writes that have been submitted to Si up to time T(i), where T(i) denotes the time local to Si.

Bounding Staleness Deviations:

If the clocks of the replica servers are loosely synchronized, then whenever server Sk notes that T(k) - RVCk[i] is about to exceed a specified limit, it simply starts pulling in writes that originated from Si with a timestamp later than RVCk[i].

· A replica server is thus responsible for keeping its copy of x up to date with respect to writes that have been issued elsewhere.
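A minimal sketch of this pull approach for one replica Sk. It assumes loosely synchronized clocks (so Sk's clock can be compared with timestamps assigned by other servers), a hypothetical `pull` call on each origin's log, and no writes still in flight when a pull happens. It pulls once T(k) - RVCk[i] reaches the limit, which is one reading of "about to exceed".

```python
class OriginLog:
    """Writes submitted to origin server Si, timestamped with Si's local time."""

    def __init__(self):
        self.entries = []  # (timestamp, write) pairs, appended in time order

    def submit(self, ts, write):
        self.entries.append((ts, write))

    def pull(self, after, upto):
        """Hypothetical pull RPC: writes with after < timestamp <= upto."""
        return [(t, w) for (t, w) in self.entries if after < t <= upto]


class StalenessBoundedReplica:
    """Replica Sk that keeps T(k) - RVCk[i] below `limit` for every origin Si."""

    def __init__(self, origins, limit):
        self.origins = origins              # origin id i -> OriginLog of Si
        self.limit = limit
        self.rvc = {i: 0 for i in origins}  # RVCk[i]: seen all of Si's writes up to this time
        self.applied = []                   # writes pulled in so far

    def tick(self, now):
        """Called periodically with Sk's local time T(k)."""
        for i, log in self.origins.items():
            if now - self.rvc[i] >= self.limit:  # staleness about to exceed the bound
                for _, write in log.pull(self.rvc[i], now):
                    self.applied.append(write)
                # Assumes synchronized clocks and nothing in flight up to `now`.
                self.rvc[i] = now
```

Usage: with `limit=4`, a replica ticking at times 3, 4, 6 and 8 pulls only at 4 and 8, so it never lags any origin by more than 4 time units at a tick.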

Numerical vs. Staleness Bounds:

• Numerical bounds follow a push approach: an origin server keeps replicas up to date by forwarding writes.

• Staleness bounds follow a pull approach: replica servers pull in updates from origin servers.

Bounding Ordering Deviations:

A replica server tentatively applies the updates that have been submitted to it, so each server has a local queue of tentative writes whose final order is yet to be determined.

The ordering deviation is bounded by specifying the maximal length of the queue of tentative writes. When the queue is full, the server stops accepting newly submitted writes and first commits its tentative writes by agreeing on their order with the other servers.

A globally consistent ordering of tentative writes is enforced using primary-based or quorum-based protocols.
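A minimal sketch of a replica with a bounded queue of tentative writes. The `agree_on_order` callback is a hypothetical stand-in for the primary-based or quorum-based protocol that fixes the global order; in the usage below it is simply `sorted`.

```python
class OrderingBoundedReplica:
    """Replica whose queue of tentative writes never exceeds max_tentative."""

    def __init__(self, max_tentative, agree_on_order):
        self.max_tentative = max_tentative
        self.agree_on_order = agree_on_order  # hypothetical ordering protocol
        self.committed = []  # writes whose global order is final
        self.tentative = []  # writes applied locally, order not yet determined

    def submit(self, write):
        if len(self.tentative) >= self.max_tentative:
            # Queue is full: commit tentative writes before accepting a new one.
            self.commit()
        self.tentative.append(write)  # applied tentatively

    def commit(self):
        self.committed.extend(self.agree_on_order(self.tentative))
        self.tentative.clear()

    def current_view(self):
        """What a local read sees: committed writes followed by tentative ones."""
        return self.committed + self.tentative


r = OrderingBoundedReplica(2, sorted)
for w in [3, 1, 2]:
    r.submit(w)
```

After the third write the queue limit of 2 forces a commit, so at most two writes are ever applied in a possibly wrong order.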

Primary-Based Protocols

Primary-based protocols provide a straightforward implementation of sequential consistency: all processes see all write operations in the same order, no matter which backup server they use to perform read operations. A distinction can be made as to whether:

- The primary is fixed at a remote server, or

- Write operations can be carried out locally after moving the primary to the process where the write operation is initiated.

Remote-Write Protocols:

All write operations need to be forwarded to a fixed single server, while read operations can be carried out locally. This is also called a primary-backup protocol.

Disadvantage: it may take a relatively long time before the process that initiated the update is allowed to continue, because an update is implemented as a blocking operation.

Alternative: a non-blocking approach.

[Figure: Remote-Write Protocols]

Remote-Write Protocols: non-blocking approach

As soon as the primary has updated its local copy of x, it returns an acknowledgment. After that, it tells the backup servers to perform the update as well.

Advantage: write operations may speed up considerably.

Disadvantage: fault tolerance suffers, because an acknowledged update may not yet be backed up by other servers.
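A minimal in-process sketch contrasting the blocking and non-blocking variants of the remote-write (primary-backup) protocol. Network calls are modeled as direct method calls, and the class names and `flush` method are hypothetical.

```python
class Backup:
    """Backup server; reads are served from its local copy."""

    def __init__(self):
        self.store = {}

    def update(self, key, value):
        self.store[key] = value

    def read(self, key):
        return self.store.get(key)


class Primary:
    """Fixed primary through which every write must go."""

    def __init__(self, backups, blocking=True):
        self.store = {}
        self.backups = backups
        self.blocking = blocking
        self.pending = []  # non-blocking mode: acknowledged but not yet at backups

    def write(self, key, value):
        self.store[key] = value
        if self.blocking:
            # Blocking: every backup is updated before the client gets its ack.
            for b in self.backups:
                b.update(key, value)
        else:
            # Non-blocking: acknowledge right after the local update.
            self.pending.append((key, value))
        return "ack"

    def flush(self):
        """Tell the backups to perform the queued updates (non-blocking mode)."""
        for key, value in self.pending:
            for b in self.backups:
                b.update(key, value)
        self.pending.clear()
```

In non-blocking mode, a crash of the primary between `write` and `flush` loses an update the client already saw acknowledged, which is exactly the fault-tolerance disadvantage noted above.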

Local-Write Protocols:

When a process wants to update data item x, it locates the primary copy of x and subsequently moves it to its own location.

Advantage (in the non-blocking protocol only): multiple successive write operations can be carried out locally, while reading processes can still access their local copy. Updates are propagated to the replicas after the primary has finished locally performing the updates.
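A minimal sketch of a local-write protocol for a single data item, in its non-blocking form. Locating the primary is modeled as a direct object reference (a real system would need a lookup mechanism), and the class names are hypothetical.

```python
class Node:
    """A process holding a local copy of the data item, used for reads."""

    def __init__(self, name):
        self.name = name
        self.copy = {}

    def read(self, key):
        return self.copy.get(key)


class MigratingPrimary:
    """Primary copy of `key` that moves to whichever node writes it."""

    def __init__(self, key, nodes, primary):
        self.key = key
        self.nodes = nodes
        self.primary = primary
        self.dirty = False  # local updates not yet propagated to the replicas

    def write(self, node, value):
        if self.primary is not node:
            self.propagate()      # finish the old primary's updates first
            self.primary = node   # move the primary copy to the writer
        node.copy[self.key] = value  # performed locally, no remote round-trip
        self.dirty = True

    def propagate(self):
        """Push the primary's latest value to every replica."""
        if self.dirty:
            value = self.primary.copy[self.key]
            for n in self.nodes:
                n.copy[self.key] = value
            self.dirty = False
```

Successive writes by the same node never leave that node; other nodes keep reading their old local copy until `propagate` runs.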

[Figure: Local-Write Protocols]

Local-Write Protocols: Example

A primary-backup protocol with local writes, used in distributed file systems that require a high degree of fault tolerance.

[Figure: a fixed central server and three replica servers. 1) Writes are propagated to a replica server to speed up performance. 2) Updates are propagated. 3) Updates are distributed to the other replica servers.]

Replicated-Write Protocols

Write operations can be carried out at multiple replicas instead of only one. A distinction can be made between:

- Active replication, in which an operation is forwarded to all replicas
- Majority voting (quorum-based protocols)

Active Replication

Each replica has an associated process that carries out update operations; each operation is sent to every replica.

Problem: operations need to be carried out in the same order everywhere, which requires a totally-ordered multicast mechanism. Such a multicast can be implemented using Lamport's logical clocks.

Disadvantage: this implementation of multicasting does not scale well in large distributed systems.

Alternative: total ordering can be achieved using a central coordinator, also called a sequencer.

Active Replication:

1. Forward each operation to the sequencer, which assigns it a unique sequence number and subsequently forwards the operation to all replicas.

2. Operations are carried out in the order of their sequence number.

[Figure: a sequencer (central coordinator) forwarding numbered operations to the replicas]
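The two sequencer steps can be sketched as follows. Operations are modeled as hypothetical functions from old state to new state, and delivery as a direct method call; each replica holds back operations that arrive early so that it executes them strictly in sequence-number order.

```python
from itertools import count


class Sequencer:
    """Central coordinator that assigns each operation a unique sequence number."""

    def __init__(self, replicas):
        self.replicas = replicas
        self._seq = count(1)

    def submit(self, op):
        seqno = next(self._seq)
        for r in self.replicas:  # forward the operation to all replicas
            r.deliver(seqno, op)
        return seqno


class Replica:
    """Executes operations strictly in sequence-number order."""

    def __init__(self, state=0):
        self.state = state
        self.next_expected = 1
        self.holdback = {}  # seqno -> op for operations that arrived early

    def deliver(self, seqno, op):
        self.holdback[seqno] = op
        while self.next_expected in self.holdback:
            op_to_run = self.holdback.pop(self.next_expected)
            self.state = op_to_run(self.state)
            self.next_expected += 1
```

Because `s + 5` and `s * 2` do not commute, replicas end in the same state only if they agree on the order; the hold-back queue makes an out-of-order delivery of operation 2 before operation 1 harmless.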

Quorum-Based Protocols

A different approach to supporting replicated writes is to use voting: clients request and acquire the permission of multiple servers before either reading or writing a replicated data item.

How does the algorithm work?

1. A file is replicated on N servers.

2. To update the file, a client must first contact at least half the servers plus one (a majority) and get them to agree to do the update.

3. Once they have agreed, the file is changed and a new version number is associated with the new file.

Quorum-Based Protocols

To read a replicated file:

1. A client must also contact at least half the servers plus one and ask them to send the version numbers associated with the file.

2. If all the version numbers are the same, this must be the most recent version, because any majority contacted for the read overlaps the majority that performed the last update.

[Figure: example with 5 servers: 3 servers hold version 8 (V-8) and 2 servers hold version 9 (V-9)]
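A minimal sketch of majority voting on one replicated file. Quorums are passed in as lists of server indices; how a client reaches servers and gets them to agree is left out (an assumption of this sketch), and the class names are hypothetical. A read returns the highest version in its quorum, which coincides with the slide's rule when all versions match.

```python
class Replica:
    def __init__(self):
        self.version = 0
        self.data = None


class MajorityVotingFile:
    """A file replicated on N servers, read and updated through majorities."""

    def __init__(self, n):
        self.servers = [Replica() for _ in range(n)]
        self.majority = n // 2 + 1  # at least half the servers plus one

    def _quorum(self, ids):
        ids = set(ids)
        if len(ids) < self.majority:
            raise ValueError("not a majority")
        return [self.servers[i] for i in ids]

    def write(self, data, ids):
        quorum = self._quorum(ids)
        # Any two majorities overlap, so this quorum contains the latest version.
        new_version = max(s.version for s in quorum) + 1
        for s in quorum:
            s.version, s.data = new_version, data
        return new_version

    def read(self, ids):
        quorum = self._quorum(ids)
        # The read majority overlaps the last write majority: take the newest copy.
        newest = max(quorum, key=lambda s: s.version)
        return newest.version, newest.data
```

With N = 5, a write to servers {2, 3, 4} after a write to {0, 1, 2} still finds version 1 on server 2, so version numbers never go backwards.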

Quorum-Based Protocols

To read a file of which N replicas exist, a client needs to assemble a read quorum: an arbitrary collection of any $N_R$ servers, or more. To modify a file, a write quorum of at least $N_W$ servers is required. The values of $N_R$ and $N_W$ are subject to the following two constraints:

$$N_R + N_W > N \quad \text{(to prevent read-write conflicts)}$$

$$N_W > N/2 \quad \text{(to prevent write-write conflicts)}$$

The first constraint guarantees that every read quorum overlaps every write quorum; the second guarantees that any two write quorums overlap.

Quorum-Based Protocols

Figure: Three examples of the voting algorithm. (a) A correct choice of read and write set. (b) A choice that may lead to write-write conflicts. (c) A correct choice, known as ROWA (read one, write all).
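A small checker for the two quorum constraints, plus a brute-force overlap test for small N. The values N = 12 and the $(N_R, N_W)$ pairs in the usage are illustrative assumptions, chosen so the three results line up with cases (a), (b) and (c) in the caption.

```python
from itertools import combinations


def quorum_constraints(n, n_r, n_w):
    """Return (no read-write conflicts, no write-write conflicts)."""
    return n_r + n_w > n, n_w > n / 2


def always_overlap(n, a, b):
    """Brute force: does every a-subset of n servers meet every b-subset?"""
    servers = range(n)
    return all(set(x) & set(y)
               for x in combinations(servers, a)
               for y in combinations(servers, b))


for n_r, n_w in [(3, 10), (7, 6), (1, 12)]:
    print(n_r, n_w, quorum_constraints(12, n_r, n_w))
```

The pair (7, 6) satisfies the read-write constraint but not $N_W > N/2$: two disjoint write quorums of 6 servers could each accept a conflicting update. `always_overlap` confirms on small N that quorums of sizes a and b always intersect exactly when a + b > N.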

Cache-Coherence Protocols

Caching solutions may differ in their:
- coherence detection strategy (when inconsistencies are detected)
- coherence enforcement strategy (how caches are kept consistent)

Dynamic solutions: inconsistencies are detected at runtime.

For example, a check is made with the server to see whether the cached data have been modified since they were cached.
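A minimal sketch of the dynamic check described above, assuming the server keeps a per-item version counter (a hypothetical design choice; a modification timestamp would work the same way). Each read validates the cached entry against the server before using it.

```python
class Server:
    def __init__(self):
        self.data = {}
        self.versions = {}  # key -> version counter, bumped on every write

    def write(self, key, value):
        self.data[key] = value
        self.versions[key] = self.versions.get(key, 0) + 1

    def fetch(self, key):
        return self.data[key], self.versions[key]

    def version_of(self, key):
        return self.versions[key]


class ValidatingCache:
    """Client cache with dynamic (runtime) coherence detection."""

    def __init__(self, server):
        self.server = server
        self.entries = {}  # key -> (value, version when cached)
        self.fetches = 0   # full transfers from the server

    def read(self, key):
        if key in self.entries:
            value, version = self.entries[key]
            # Check with the server whether the data changed since it was cached.
            if self.server.version_of(key) == version:
                return value
        value, version = self.server.fetch(key)
        self.fetches += 1
        self.entries[key] = (value, version)
        return value
```

The validation still costs a round-trip per read, but only a version number crosses the network unless the data actually changed.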


Implementing Client-Centric Consistency: Naive Implementation

Each write operation W is assigned a globally unique identifier. Such an identifier is assigned by the server to which the write was submitted (the origin of W).

For each client, two sets of writes are tracked:

• The read set consists of the writes relevant for the read operations performed by the client.

• The write set consists of the identifiers of the writes performed by the client.
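A minimal sketch of this bookkeeping. One common way to make identifiers globally unique, assumed here, is to pair the origin server's name with a per-server counter; all names are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass(frozen=True)
class WriteId:
    """Globally unique write identifier assigned by origin(W)."""
    origin: str  # name of the server the write was submitted to
    seq: int     # that server's local counter


class Server:
    def __init__(self, name):
        self.name = name
        self._seq = count(1)

    def accept_write(self):
        """Assign a fresh identifier to a write submitted to this server."""
        return WriteId(self.name, next(self._seq))


@dataclass
class Client:
    read_set: set = field(default_factory=set)   # writes relevant to this client's reads
    write_set: set = field(default_factory=set)  # ids of writes this client performed
```

Two servers may both issue counter value 1, but the origin name keeps the pair unique; `frozen=True` makes `WriteId` hashable so it can live in the sets.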


Monotonic-read consistency implementation:

When a client performs a read operation at a server, that server is handed the client's read set to check whether all the identified writes have taken place locally. (The size of this set may introduce a performance problem.)

If not, the server contacts the other servers to ensure that it is brought up to date before carrying out the read operation. Alternatively, the read operation is forwarded to a server where the write operations have already taken place.

After the read operation is performed, the write operations that have taken place at the selected server and that are relevant for the read operation are added to the client's read set.
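A minimal sketch of the first option (bringing the server up to date). It assumes a key-value store where write ids are (timestamp, origin) pairs and the write with the largest id wins; that last-writer-wins rule and all names are assumptions of this sketch.

```python
class Replica:
    """Key-value replica; the write with the largest (timestamp, origin) id wins."""

    def __init__(self, name):
        self.name = name
        self.writes = {}  # write id -> (key, value)

    def relevant(self, key):
        return {wid for wid, (k, _) in self.writes.items() if k == key}

    def read_local(self, key):
        ids = self.relevant(key)
        return self.writes[max(ids)][1] if ids else None


def monotonic_read(read_set, server, others, key):
    """Read `key` at `server` without going back in time relative to `read_set`."""
    missing = read_set - set(server.writes)
    for other in others:  # bring `server` up to date from the other replicas
        for wid in missing & set(other.writes):
            server.writes[wid] = other.writes[wid]
    if not read_set <= set(server.writes):
        raise RuntimeError("some writes in the read set are unavailable")
    value = server.read_local(key)
    read_set.update(server.relevant(key))  # writes this read depended on
    return value
```

A client that read "b" at S1 and then reads at a lagging S2 still gets "b", because S2 first fetches every write in the client's read set.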


Monotonic-write consistency implementation:

When a client initiates a new write operation at a server, the server is handed the client's write set. It then ensures that the identified write operations are performed first, and in the correct order.

After performing the new operation, that operation's write identifier is added to the write set.
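A minimal sketch of this procedure. To make "the correct order" well defined, it keeps the write set as a list in the order the client issued its writes (an assumption), and `fetch(wid)` is a hypothetical call that retrieves a write's operation from wherever it was performed.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.log = []         # (write id, operation) in the order performed here
        self.applied = set()  # ids of writes performed here
        self._next = 1

    def new_id(self):
        """Write ids are (origin, counter) pairs assigned by the origin server."""
        wid = (self.name, self._next)
        self._next += 1
        return wid

    def perform(self, wid, op):
        self.log.append((wid, op))
        self.applied.add(wid)


def monotonic_write(write_set, server, fetch, op):
    """Perform `op` at `server` only after the client's earlier writes."""
    for wid in write_set:  # client's earlier writes, in issue order
        if wid not in server.applied:
            server.perform(wid, fetch(wid))  # catch up first
    wid = server.new_id()
    server.perform(wid, op)
    write_set.append(wid)
    return wid
```

A write issued at S2 after an earlier write at S1 is preceded in S2's log by that earlier write, so no server ever sees the client's writes out of order.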


Read-your-writes consistency:

Requires that the server where the read operation is performed has seen all the write operations in the client's write set. The missing writes can be fetched from other servers before the read operation is performed.

Problem: this could lead to poor response time.
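A minimal sketch of read-your-writes, again assuming (timestamp, origin) write ids with the largest id winning; names are hypothetical. The fetch loop on the read path is where the response-time cost comes from.

```python
class Replica:
    """Key-value replica; the write with the largest (timestamp, origin) id wins."""

    def __init__(self, name):
        self.name = name
        self.writes = {}  # write id -> (key, value)

    def write(self, ts, key, value):
        wid = (ts, self.name)
        self.writes[wid] = (key, value)
        return wid

    def read_local(self, key):
        ids = [wid for wid, (k, _) in self.writes.items() if k == key]
        return self.writes[max(ids)][1] if ids else None


def read_your_writes(write_set, server, others, key):
    """Fetch every write in the client's write set before serving the read."""
    for wid in write_set - set(server.writes):
        source = next((o for o in others if wid in o.writes), None)
        if source is None:
            raise RuntimeError(f"write {wid} not found on any replica")
        server.writes[wid] = source.writes[wid]  # extra round-trip on the read path
    return server.read_local(key)
```

A client that wrote x at S1 and then reads at S2 sees its own value, even though S2 had not received the write beforehand.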


Writes-follow-reads consistency:

Implemented by first bringing the selected server up to date with the write operations in the client's read set.

Then, after the write is performed, its identifier is added to the write set, along with the identifiers in the read set (which have now become relevant for the write just performed).
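A minimal sketch of writes-follow-reads. Write ids are assumed to be (timestamp, origin) pairs, and the caller supplies the new write's timestamp `ts` (where it comes from, for example a logical clock, is left open); names are hypothetical.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.writes = {}  # write id -> (key, value); ids are (timestamp, origin)


def writes_follow_reads(read_set, write_set, server, others, key, value, ts):
    """Perform a write at `server` only after it has seen the client's read set."""
    # 1. Bring the selected server up to date with the writes in the read set.
    for wid in read_set - set(server.writes):
        source = next((o for o in others if wid in o.writes), None)
        if source is None:
            raise RuntimeError(f"write {wid} not found on any replica")
        server.writes[wid] = source.writes[wid]
    # 2. Perform the new write.
    wid = (ts, server.name)
    server.writes[wid] = (key, value)
    # 3. Record it in the write set together with the read-set ids it depended on.
    write_set.add(wid)
    write_set.update(read_set)
    return wid
```

A reaction written at S2 to an article read at S1 lands on a server that already holds the article, so no replica stores the reaction without what it responds to.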

Improving Efficiency

The read set and write set associated with each client can become very large.

Solution: a client's read and write operations are grouped into sessions. A session is typically associated with an application: it is opened when the application starts and closed when it exits. Whenever a client closes a session, the sets are cleared.

Summary

Consistency protocols describe specific implementations of consistency models.

A distinction can be made between primary-based protocols and replicated-write protocols.

In primary-based protocols, all update operations are forwarded to a primary copy that subsequently ensures the update is properly ordered and forwarded.

In replicated-write protocols, an update is forwarded to several replicas at the same time.

Present-day cache-coherence protocols:

Hardware-based coherence mechanisms:
1. Snoopy protocol
2. Directory-based protocol
3. Hybrid cache-coherence protocol
4. Lock-based protocol

Cache-line state protocols (also implemented in hardware):
1. MSI protocol
2. MESI protocol
3. MOSI protocol
4. MOESI protocol
5. Dragon protocol

Cache Coherence for GPU Architectures:

GPU cache coherence is difficult and challenging:
- Conventional cache-coherence algorithms are not well suited for GPUs.
- Managing transient state for thousands of in-flight memory accesses adds hardware cost and complexity.
- Coherence adds unnecessary traffic overhead to existing GPU applications.

Temporal Coherence is a timestamp-based coherence framework that reduces the overheads of GPU coherence.

References:

1. Stenström, Per. "A survey of cache coherence schemes for multiprocessors." Computer 23, no. 6 (1990): 12-24.

2. Singh, Inderjit, Arrvindh Shriraman, Wilson W. L. Fung, Mike O'Connor, and Tor M. Aamodt. "Cache coherence for GPU architectures." In 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA 2013), pp. 578-590. IEEE, 2013.

3. Tanenbaum, Andrew S., and Maarten van Steen. Distributed Systems. Prentice-Hall, 2007.
