

6.4 Distribution Protocols

• Different ways of propagating/distributing updates to replicas, independent of the consistency model.

• First design issue for distributed data stores: deciding where, when, and by whom copies of the data store are to be placed.


6.4.1 Replica Placement

The logical organization of different kinds of copies of a data store into three concentric rings.


Permanent Replicas

• Web sites as an example:
  – Files replicated across a limited number of servers on a single local-area network
  – Mirroring to mirror sites geographically spread across the Internet

• Distributed database:
  – The database could be distributed and replicated across a cluster of workstations, where neither disks nor main memory are shared by processors
  – The database could be distributed, possibly replicated, across a number of geographically dispersed sites


Server-Initiated Replicas

• Definition: copies of a data store that exist to enhance performance and are created at the initiative of (the owner of) the data store.

• Example: web hosting

• Problem: deciding when and where replicas should be created or deleted.

• Web hosting algorithm (Rabinovich), which addresses two issues:
  – replication can take place to reduce the load on a server
  – specific files on a server can be migrated or replicated to servers in the proximity of requesting clients


Server-Initiated Replicas

Counting access requests from different clients:

• When the number of requests for a specific file F at server S drops below a deletion threshold, F can be removed from S, provided that at least one copy of each file continues to exist.

• When the number of requests for F at S exceeds a replication threshold, F can be replicated on another server that accounts for many of those requests.

• If the number of requests lies between the two thresholds, F can only be migrated. The chosen server is the one that accounts for more than half of the total requests.

• Server-initiated replicas are used mostly for read-only copies close to clients, whereas permanent replicas are used for backup or as the only updateable replica to guarantee consistency.
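The threshold rules above can be sketched as a small decision function. This is a minimal illustration, not the published algorithm: the function name, the `requests_by_server` mapping (requests for F at S attributed to each nearby candidate server), and the choice to replicate to the candidate with the most requests are all assumptions made for the example.

```python
def placement_action(total_requests, requests_by_server,
                     deletion_threshold, replication_threshold, is_last_copy):
    """Decide what server S does with file F (hypothetical helper).

    requests_by_server maps a candidate server to the number of requests
    for F at S that are attributed to that candidate.
    Returns a pair (action, target_server_or_None).
    """
    if total_requests < deletion_threshold:
        # Never delete the last remaining copy of a file.
        return ("keep", None) if is_last_copy else ("delete", None)

    if total_requests > replication_threshold:
        if not requests_by_server:
            return ("keep", None)
        # Assumption: replicate to the candidate with the most requests.
        target = max(requests_by_server, key=requests_by_server.get)
        return ("replicate", target)

    # Between the thresholds: only migration is allowed, and only to a
    # server that accounts for more than half of the total requests.
    for server, count in requests_by_server.items():
        if count > total_requests / 2:
            return ("migrate", server)
    return ("keep", None)
```

For instance, with a deletion threshold of 5 and a replication threshold of 20, a file with 10 requests of which 6 are attributed to server P would be migrated to P.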


Client-Initiated Replicas

• Also known as caches.

• In principle, managing the cache is left entirely to the client. However, the client may rely on the data store to inform it when cached data has become stale.

• Caches are generally kept for a limited amount of time to prevent the use of stale data, or to make room for other data.

• To improve the number of cache hits, caches can be shared between clients.

• Placement of client caches is simple:
  – on the same machine as the client
  – on a machine shared by clients on the same local-area network
  – extra levels of caching may be introduced


6.4.2 Update Propagation

• State versus Operations
  – What is actually to be propagated:
    • Propagate only a notification of an update: this is what invalidation protocols do. When an operation on an invalidated copy is requested, that copy needs to be updated first, depending on the specific supported consistency model.
      – Uses little bandwidth.
      – Best when the read-to-write ratio is low.
    • Transfer data from one copy to another.
      – Used when the read-to-write ratio is high.
      – It is also possible to log the changes and transfer only those logs.
    • Propagate the update operation to other copies: also called active replication. When the parameters are small, this saves bandwidth. However, more processing power may be required by each replica.
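The three options for what to propagate can be contrasted in a small sketch. The `Replica` class and its method names are hypothetical, and `fetch` stands in for whatever mechanism a replica uses to obtain the current value from the origin after an invalidation.

```python
class Replica:
    """Toy replica that can receive each of the three kinds of message."""

    def __init__(self, fetch):
        self.data = {}
        self.valid = {}
        self.fetch = fetch  # callable(key) -> current value at the origin

    def on_invalidate(self, key):
        # Notification only: the message carries no data.
        self.valid[key] = False

    def on_state(self, key, value):
        # State transfer: the message carries the new value.
        self.data[key] = value
        self.valid[key] = True

    def on_operation(self, key, operation, *args):
        # Operation shipping: the replica recomputes the value itself.
        # Assumes the local copy exists and is up to date.
        self.data[key] = operation(self.data[key], *args)

    def read(self, key):
        # An invalidated copy must be refreshed before it is used.
        if not self.valid.get(key, False):
            self.data[key] = self.fetch(key)
            self.valid[key] = True
        return self.data[key]
```

The invalidation message is the smallest, but it shifts the cost to the next read; operation shipping sends only the parameters, but every replica pays the processing cost of redoing the operation.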


Pull versus Push Protocols

A comparison between push-based and pull-based protocols in the case of multiple-client, single-server systems.

Issue | Push-based | Pull-based
State of server | List of client replicas and caches | None
Messages sent | Update (and possibly fetch update later) | Poll and update
Response time at client | Immediate (or fetch-update time) | Fetch-update time

Push-based protocols maintain a high degree of consistency and are useful when the read-to-write ratio is high.


Unicasting versus Multicasting

• In unicasting, when a server sends its updates to N other servers, it does so by sending N separate messages. With multicasting, the underlying network takes care of sending a message efficiently to multiple receivers.

• Multicasting can be efficiently combined with a push-based approach to propagate updates. With a pull-based approach, unicasting may be more efficient.


6.4.3 Epidemic Protocols

• Epidemic algorithms do not solve any update conflicts. Instead, their only concern is propagating updates to all replicas in as few messages as possible.

• They assume that all updates for a specific data item are initiated at a single server, which avoids write-write conflicts.


Epidemic Protocols

• A popular propagation model is that of anti-entropy: a server P picks another server Q at random and subsequently exchanges updates with Q in one of three ways:
  – P only pushes its own updates to Q: a bad choice if many servers are infective (already hold the update), because P is then unlikely to pick a server that still needs it.
  – P only pulls in new updates from Q: useful when many servers are infective.
  – P and Q send updates to each other.

• Rumor spreading (gossiping): if server P has just been updated for data item x, it contacts an arbitrary server Q and tries to push the update to Q. If Q has already been updated, then with probability 1/k, P loses interest in spreading the update any further.
  – The fraction s of servers that will remain ignorant of the update satisfies $s = e^{-(k+1)(1-s)}$
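Both models can be explored numerically. The sketch below is an illustration under stated assumptions, not a reference implementation: `anti_entropy_rounds` assumes synchronous rounds in which every server contacts one random peer and a single update starts at server 0, and `gossip_residue` solves the rumor-spreading relation s = e^{-(k+1)(1-s)} by fixed-point iteration starting from s = 0 (s = 1 is a trivial solution that this starting point avoids). All names are hypothetical.

```python
import math
import random


def anti_entropy_rounds(n, mode, seed=0, max_rounds=10_000):
    """Rounds until all n servers hold an update that starts at server 0.

    mode is "push", "pull", or "push-pull". In each round every server P
    picks one other server Q uniformly at random.
    """
    rng = random.Random(seed)
    updated = [False] * n
    updated[0] = True
    rounds = 0
    while not all(updated) and rounds < max_rounds:
        snapshot = list(updated)  # exchanges use the state at round start
        for p in range(n):
            q = rng.choice([i for i in range(n) if i != p])
            if mode in ("push", "push-pull") and snapshot[p]:
                updated[q] = True  # P pushes its update to Q
            if mode in ("pull", "push-pull") and snapshot[q]:
                updated[p] = True  # P pulls the update from Q
        rounds += 1
    return rounds


def gossip_residue(k, iterations=100):
    """Fraction s of servers left ignorant, from s = exp(-(k+1)(1-s))."""
    s = 0.0
    for _ in range(iterations):
        s = math.exp(-(k + 1) * (1 - s))
    return s
```

With this relation, `gossip_residue(3)` comes out at roughly 0.02 and `gossip_residue(4)` just below 0.007, which shows that rumor spreading alone does not guarantee that every server is reached.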


Removing Data

• Epidemic algorithms are good for spreading updates in eventually consistent data stores. However, spreading the deletion of a data item is hard.

• Trick: record the deletion as another update, and keep a record of that deletion. The recording of a deletion is done by spreading death certificates.

• Death certificates should eventually be cleaned up. One way is to use timestamps: if it can be assumed that updates propagate to all servers within a known finite time, the death certificates can be removed after the maximum propagation time has elapsed.

• To provide hard guarantees, a very few servers maintain dormant death certificates that are never thrown away.
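A death certificate can be modeled as a timestamped tombstone that blocks older updates from resurrecting a deleted item. The sketch below is an assumption-laden illustration: the `Store` class, its method names, and the use of plain numeric timestamps are all hypothetical.

```python
class Store:
    """Toy replica that keeps death certificates for deleted items."""

    def __init__(self):
        self.items = {}         # key -> (timestamp, value)
        self.certificates = {}  # key -> timestamp of the deletion

    def apply_update(self, key, value, ts):
        # An update older than the death certificate must not revive the item.
        if ts <= self.certificates.get(key, float("-inf")):
            return False
        if key in self.items and self.items[key][0] >= ts:
            return False  # already have this version or a newer one
        self.items[key] = (ts, value)
        return True

    def apply_delete(self, key, ts):
        if key in self.items and self.items[key][0] > ts:
            return False  # a newer update supersedes the deletion
        self.items.pop(key, None)
        self.certificates[key] = max(ts, self.certificates.get(key, ts))
        return True

    def purge(self, now, max_propagation_time):
        # Drop certificates older than the assumed maximum propagation time.
        self.certificates = {
            k: ts for k, ts in self.certificates.items()
            if now - ts <= max_propagation_time
        }
```

Once a certificate has been purged, a late-arriving old update would be accepted again, which is why purging is only safe under the bounded propagation-time assumption.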


Consistency Protocols: Primary-based Remote-Write Protocols (1)

Primary-based remote-write protocol with a fixed server to which all read and write operations are forwarded. After the write finishes, each backup server performs the update as well. It may therefore take a long time before the process that initiated the update is allowed to continue (see the next slide).


Primary-backup Protocol

If the write is made non-blocking, fault tolerance becomes a problem: the client can no longer be sure that an acknowledged update is already stored on the backup servers.
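The trade-off between blocking and non-blocking writes can be made concrete with a toy model. This is a sketch under simplifying assumptions (in-memory dictionaries, no real network or failures); the `Primary` and `Backup` classes and the `flush` method are hypothetical names.

```python
class Backup:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Primary:
    def __init__(self, backups):
        self.data = {}
        self.backups = backups
        self.pending = []  # acknowledged updates not yet on the backups

    def write(self, key, value, blocking=True):
        self.data[key] = value
        if blocking:
            # Acknowledge only after every backup has applied the update.
            for backup in self.backups:
                backup.apply(key, value)
        else:
            # Acknowledge immediately; backups are updated later.
            self.pending.append((key, value))
        return "ack"

    def flush(self):
        # Propagate the updates that were acknowledged early.
        for key, value in self.pending:
            for backup in self.backups:
                backup.apply(key, value)
        self.pending.clear()
```

If the primary crashes while `pending` is non-empty, the acknowledged updates in it are lost, which is the fault-tolerance problem of the non-blocking variant.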


Local-Write Protocols (1)

Primary-based local-write protocol in which a single copy is migrated between processes. Need to keep track of where each data item currently is.


Local-Write Protocols (2)

Primary-backup protocol in which the primary migrates to the process wanting to perform an update. This could be applied to mobile computers operated in disconnected mode.


Replicated-Write Protocols
Active Replication (1)

The problem of replicated invocations:
1. Operations need to be carried out in the same order everywhere
2. Replicated invocations
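Why the order of operations matters can be seen with two operations that do not commute. The example below is a minimal illustration with made-up operations (`add_100`, `double`); it does not show how a real system enforces the order.

```python
def apply_in_order(initial, operations):
    """Apply each operation to the running value, in the given order."""
    value = initial
    for operation in operations:
        value = operation(value)
    return value


def add_100(value):
    return value + 100


def double(value):
    return value * 2


# Two replicas receive the same operations in different orders.
replica_a = apply_in_order(100, [add_100, double])  # (100 + 100) * 2
replica_b = apply_in_order(100, [double, add_100])  # 100 * 2 + 100
```

The two replicas end up with different values even though they executed the same set of operations, so active replication needs a mechanism that delivers operations in the same order everywhere.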


Active Replication (2)

a) Forwarding an invocation request from a replicated object.
b) Returning a reply to a replicated object.


Quorum-Based Protocols

Three examples of the voting algorithm:
a) A correct choice of read and write set
b) A choice that may lead to write-write conflicts
c) A correct choice, known as ROWA (read one, write all)

Constraints on $N_R$ and $N_W$:
1. $N_R + N_W > N$
2. $N_W > N/2$
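The two quorum constraints, N_R + N_W > N and N_W > N/2, are easy to check mechanically. In the sketch below the function name is hypothetical, and the sample values (N = 12 with three different quorum sizes) are illustrative choices picked to match the three cases a), b), and c).

```python
def check_quorum(n, n_r, n_w):
    """Return (no_read_write_conflict, no_write_write_conflict)."""
    no_read_write_conflict = n_r + n_w > n   # constraint 1
    no_write_write_conflict = n_w > n / 2    # constraint 2
    return no_read_write_conflict, no_write_write_conflict


# Illustrative values for N = 12 replicas (assumed, not from the figure text).
correct_choice = check_quorum(12, 3, 10)
conflicting_choice = check_quorum(12, 7, 6)   # two writes of 6 can be disjoint
rowa_choice = check_quorum(12, 1, 12)         # read one, write all
```

The first constraint forces every read quorum to overlap every write quorum, so a read always sees the latest write; the second forces any two write quorums to overlap, so two writes cannot proceed independently.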


Cache Coherence Protocols

• Two criteria to classify caching protocols:
  1. Coherence detection strategy: when inconsistencies are actually detected
    • static: compiler analysis
    • dynamic: when during a transaction the detection is done
      1. The transaction cannot proceed to use the cached version until its consistency has been validated
      2. Let the transaction proceed while verification is taking place
      3. Verify only when the transaction commits
  2. Coherence enforcement strategy: how caches are kept consistent
    1. Disallow shared data to be cached
    2. Shared data can be cached:
      1. Let the server send an invalidation to all caches whenever a data item is modified.
      2. Simply propagate the update.


Cache Coherence Protocols

• What happens when a process modifies cached data:
  – When read-only caches are used, update operations can be performed only by the servers, which subsequently follow some distribution protocol to ensure that updates are propagated to caches.
  – Write-through cache: clients may directly modify the cached data and forward the update to the servers.
  – Write-back cache: delay the propagation of updates by allowing multiple writes to take place before informing the servers.
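The difference between write-through and write-back caches can be sketched with a dictionary standing in for the server. The `Cache` class, its `write_back` flag, and the explicit `flush` call are assumptions made for the example; a real cache would decide on its own when to inform the server.

```python
class Cache:
    """Toy client cache in front of a server (modeled as a dict)."""

    def __init__(self, server, write_back=False):
        self.server = server
        self.write_back = write_back
        self.data = {}
        self.dirty = set()  # keys modified locally but not yet on the server

    def write(self, key, value):
        self.data[key] = value
        if self.write_back:
            # Delay propagation; several writes may be combined.
            self.dirty.add(key)
        else:
            # Write-through: forward every update to the server at once.
            self.server[key] = value

    def flush(self):
        # Inform the server of the latest value of each modified item.
        for key in self.dirty:
            self.server[key] = self.data[key]
        self.dirty.clear()
```

With write-back, two consecutive writes to the same item cost one server update instead of two, at the price of the server being temporarily out of date.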
