Bayou

References

The Case for Non-transparent Replication: Examples from Bayou. Douglas B. Terry, Karin Petersen, Mike J. Spreitzer, and Marvin M. Theimer. IEEE Data Engineering, December 1998.

Managing Update Conflicts in Bayou, a Weakly Connected Replicated Storage System. Douglas B. Terry, Marvin M. Theimer, Karin Petersen, Alan J. Demers, Mike J. Spreitzer, and Carl H. Hauser. In ACM Symposium on Operating Systems Principles (SOSP ’95).

Introduction

The scenario presented here is a little different than in previous work:

Server replicas can be disconnected (temporarily) from the network

Some conflicts between write updates are best handled by the user
• Can’t view data as just bits

A read operation may result in stale data

Calendar Application

Calendar updates are made by several people, e.g., meeting room scheduling, or a calendar shared by an executive and an admin

Want to allow updates offline

But conflicts can’t be prevented

Two possibilities:
• Disallow offline updates
• Conflict resolution

Calendar Application

Suppose we have two conference rooms of the same capacity. I want to schedule my meeting in one of the conference rooms. I don’t care which exact room it is.

If two people reserve the same room at the same time, there is a conflict, but if they reserve the same room at different times or reserve different rooms at the same time, there is no conflict.
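The conflict rule above can be sketched in a few lines of Python. This is only an illustration: the `Reservation` type, its field names, and the use of minutes since midnight are assumptions made for the example, not part of Bayou.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    # Hypothetical record shape: room name plus a half-open interval
    # [start, end) measured in minutes since midnight.
    room: str
    start: int
    end: int


def conflicts(a: Reservation, b: Reservation) -> bool:
    """Two reservations conflict only if they name the same room
    and their time intervals overlap."""
    return a.room == b.room and a.start < b.end and b.start < a.end


base = Reservation("Rm1", 600, 660)                  # Rm1, 10:00-11:00
same_room_later = Reservation("Rm1", 660, 720)       # same room, different time
other_room_same_time = Reservation("Rm2", 600, 660)  # different room, same time
same_room_overlap = Reservation("Rm1", 630, 690)     # same room, overlapping time

print(conflicts(base, same_room_later))       # no conflict
print(conflicts(base, other_room_same_time))  # no conflict
print(conflicts(base, same_room_overlap))     # conflict
```

Using half-open intervals means back-to-back meetings in the same room do not count as overlapping.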

[Figure: three room-versus-time diagrams for Rm1 and Rm2. Two panels are labeled "No conflict"; one panel is labeled "conflict".]


We could lock the entire database
• Not needed when there is no conflict

In the case of a conflict, there is an application-specific way to deal with it: we move one reservation to the other room

If the other room is also reserved, we ask the user whether they can move the reservation to another acceptable time

Other resolution strategies:
• Classes over meetings
• Admin meetings bump faculty meetings

Shared mailbox

Shared mailbox folders: shared between me and my 4 TAs.

We all replicate the mailbox.

OP1: I see a mail from the class, respond to it, and delete it.

OP2: The TA sees the same mail and files it in CS1026.

OP3: I see an email from a friend and file it as important.

Mailbox folders: CS1026, Important, Recruiting, Chocolate

Shared Mailbox

All of us operate on the same mailbox

You could lock the entire mailbox before someone operates on it
• Can’t work when disconnected
• Clearly not necessary when only one operation is performed

For operations OP1 and OP2, it is not clear who should win: should the mail be deleted, or should it be filed in CS1026?

Two Approaches to Building Replicated Services

Transparent replication system:
• Allows systems that were developed assuming a central file system or database to run unchanged on top of a strongly consistent replicated storage system (as seen in OceanStore)

Non-transparent replication system:
• Relaxed consistency model: access-update-anywhere
• Applications are involved in conflict detection and resolution, so applications need to be modified (e.g., Bayou, the Coda file system)

Hypothesis

Conflicts between application users are not easily handled transparently.

Applications know best how to resolve conflicts.

The challenge is providing the right interface to support cooperation between applications and their data managers.

Conflict Detection: Dependency Checks

Each Write operation includes a dependency check consisting of an application-supplied query and its expected result.

If the check fails, then the requested update is not performed and the server invokes a procedure to resolve the detected conflict.

Example of a Bayou Write

3-tuple: <update, dependency check, mergeproc>

For example:

Update: <insert, HL, 02/02/2009, 3:00pm, 1 hr, “Curriculum Meeting”>

Dependency check: <Is there an entry in the database on 02/02/2009 at 3:00pm for 1 hr? Expected answer is empty>

Mergeproc: <Try 02/22/2009 at 4:15pm for 45min; if successful, write that tuple>

A different merge procedure could search for the next available time slot to schedule the meeting, which is an option a user might choose if any time would be satisfactory.

Another option is to allow the conflict to stand; sometimes users prefer to keep conflicting entries
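A minimal Python sketch of how a server might process such a 3-tuple is given below. The names `BayouWrite`, `apply_write`, `make_write`, and `overlapping` are hypothetical, the database is a plain list, and dates are dropped so that times are minutes within a single day; none of this is Bayou's actual interface.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical calendar entry: (room, start_minute, end_minute, title).
Entry = Tuple[str, int, int, str]


@dataclass
class BayouWrite:
    update: Entry                                        # tuple to insert
    dep_check: Callable[[List[Entry]], List[Entry]]      # application-supplied query
    expected: List[Entry]                                # expected query result
    mergeproc: Callable[[List[Entry]], Optional[Entry]]  # run if the check fails


def overlapping(db: List[Entry], room: str, start: int, end: int) -> List[Entry]:
    """Entries in `db` that occupy `room` during any part of [start, end)."""
    return [e for e in db if e[0] == room and e[1] < end and start < e[2]]


def apply_write(db: List[Entry], w: BayouWrite) -> Optional[Entry]:
    """Perform the update if the dependency check passes; otherwise run the
    merge procedure and perform whatever revised update it returns."""
    if w.dep_check(db) == w.expected:
        db.append(w.update)
        return w.update
    revised = w.mergeproc(db)
    if revised is not None:
        db.append(revised)
    return revised


def make_write(room, start, end, title, alt_start, alt_end) -> BayouWrite:
    """Build a write whose mergeproc tries one alternative slot."""
    def dep_check(db):
        return overlapping(db, room, start, end)

    def mergeproc(db):
        if not overlapping(db, room, alt_start, alt_end):
            return (room, alt_start, alt_end, title)
        return None

    return BayouWrite((room, start, end, title), dep_check, [], mergeproc)


db: List[Entry] = []
# 3:00pm for 1 hr is [900, 960); the alternative 4:15pm for 45 min is [975, 1020).
first = apply_write(db, make_write("HL", 900, 960, "Curriculum Meeting", 975, 1020))
second = apply_write(db, make_write("HL", 900, 960, "Budget Review", 975, 1020))
print(first)
print(second)
```

The second write ("Budget Review" is an invented title) fails its dependency check because the slot is taken, so its merge procedure places it in the alternative slot.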

Conflict Resolution: Merge Procedure

A merge procedure is run to resolve a detected conflict

Merge procedures are written by application programmers

Merge procedures are in the form of templates that are instantiated with the appropriate details

In the case where automatic resolution is not possible, the merge procedure will still run to completion, but is expected to produce a revised update that logs the detected conflict in some fashion that will enable a person to resolve the conflict later.
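As a sketch of that last point, the merge procedure below tries a list of candidate slots in a fixed order and, if none is free, returns a revised update that merely records the conflict for a person to look at later. The function names and the dictionary shape of the result are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

Slot = Tuple[int, int]  # hypothetical (start_minute, end_minute), half-open


def is_free(booked: List[Slot], slot: Slot) -> bool:
    """True if `slot` overlaps none of the already booked slots."""
    return all(not (slot[0] < b[1] and b[0] < slot[1]) for b in booked)


def merge_next_free(booked: List[Slot], wanted: Slot, candidates: List[Slot]) -> Dict:
    """Always runs to completion: either reserve the first free candidate,
    or produce a revised update that logs the unresolved conflict."""
    for slot in candidates:  # fixed order keeps the procedure deterministic
        if is_free(booked, slot):
            return {"kind": "reserve", "slot": slot}
    return {"kind": "conflict_note", "wanted": wanted}


booked = [(900, 960), (975, 1020)]
resolved = merge_next_free(booked, (900, 960), [(975, 1020), (1020, 1080)])
unresolved = merge_next_free(booked, (900, 960), [(930, 990)])
print(resolved)
print(unresolved)
```

Because the procedure depends only on its inputs and scans candidates in a fixed order, every server that runs it against the same data reaches the same outcome.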

Replica Consistency

We may have multiple replicas, e.g.,
• Calendar on a PDA with intermittent connectivity
• Calendar on a server in the wired network

Need conflict resolution as described earlier but this in itself is not sufficient.

Need a mechanism to deal with the differences among the multiple servers

Want users to be able to continue after a write even if not all replicas have carried out a write operation.

Replica Consistency

Bayou is designed for eventual consistency:
• All servers receive all Writes via the pair-wise anti-entropy process (described later)
• Two servers holding the same set of Writes will have the same data contents
• Strict bounds on Write propagation delays are not enforced

Bayou has these features:
• Writes are performed in the same, well-defined order at all servers
• The conflict detection and merge procedures are deterministic, so that servers resolve the same conflicts in the same manner

Replica Consistency

When a Write is accepted by a Bayou server from a client, it is deemed tentative

Timestamps for tentative Writes must monotonically increase at each server

At some point each tentative Write should be marked as committed

Must be able to undo Writes. Why?
• Servers may receive Writes from clients and from other servers in an order that differs from the required execution order
• Servers immediately apply all known Writes to their replicas

Anti-Entropy

Anti-entropy refers to a model for propagating information between servers

A server P picks another server Q at random to exchange updates

Three approaches:
• P only pushes its own updates to Q
• P only pulls in new updates from Q
• P and Q send updates to each other

This model is used in Bayou for propagating Writes (updates)

All replicas receive all updates
• Chain of pair-wise interactions
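The three exchange styles can be simulated with sets of write names. This is a toy model under stated assumptions: replicas are in-memory sets, every server can always reach the partner it picks, and the function names and the seeded random generator are choices made for this example.

```python
import random
from typing import Dict, Set


def anti_entropy(p: Set[str], q: Set[str], mode: str = "push-pull") -> None:
    """One pair-wise session between replicas P and Q."""
    if mode in ("push", "push-pull"):
        q |= p  # P sends its updates to Q
    if mode in ("pull", "push-pull"):
        p |= q  # P fetches updates from Q


def run_until_converged(replicas: Dict[str, Set[str]], rng: random.Random,
                        max_rounds: int = 1000) -> int:
    """Each round, every server picks a random partner and exchanges updates.
    Returns the number of rounds until all replicas hold all writes."""
    names = sorted(replicas)
    everything = set().union(*replicas.values())
    for round_no in range(1, max_rounds + 1):
        for p in names:
            q = rng.choice([n for n in names if n != p])
            anti_entropy(replicas[p], replicas[q])
        if all(writes == everything for writes in replicas.values()):
            return round_no
    raise RuntimeError("replicas did not converge within max_rounds")


replicas = {"A": {"w1"}, "B": {"w2"}, "C": {"w3"}, "D": set()}
rounds = run_until_converged(replicas, random.Random(0))
print(rounds, replicas)
```

No server ever talks to all the others at once; each write still reaches every replica through a chain of pair-wise sessions.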

Example

Suppose a user keeps the primary copy of his calendar with him on his laptop and allows others, such as a spouse or secretary, to keep secondary (mostly read) copies.

The user updates his own calendar; this is committed immediately.

Updates by the spouse/secretary are tentative until anti-entropy takes place with the user. At this point, the user can commit and propagate the order to the spouse/secretary during anti-entropy.

Ordering of Updates

Maintain an ordered list of writes at each node
• This will be referred to as the write log

Each write has a unique Write ID: <local-time-stamp, originating-node-ID>

The local-time-stamp is assigned by the server that accepted the write from the front end (referred to as the accepting server)

Write Log Example

<701, A>: Node A asks for meeting M1 to occur at 10 AM, else 11 AM in MC 316

<770, B>: Node B asks for meeting M2 to occur at 10 AM, else 11 AM in MC 316

Let’s agree to sort by write ID (e.g., <701, A> comes before <770, B>)

As write operations spread from node to node, nodes may initially apply updates in different orders

The dependency check and merge procedures do not necessarily take care of this

Want writes for a particular data item to be performed in the same order at all replicas

Write Log Example

Each newly seen write is merged into the write log

The log is replayed
• May cause the calendar displayed to the user to change!
• i.e., all entries are tentative; nothing is stable unless an entry is committed

How do we get commits?
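The replay behaviour can be sketched with the two writes from the example. The tuple layout and the `replay` function are assumptions for illustration; "10 AM, else 11 AM" is modelled as a list of preferred hours tried in order.

```python
from typing import Dict, List, Tuple

WriteId = Tuple[int, str]               # (local timestamp, originating node)
Write = Tuple[WriteId, str, List[int]]  # (write ID, meeting, preferred hours)


def replay(log: List[Write]) -> Dict[int, str]:
    """Rebuild the calendar from scratch by applying writes in write-ID order.
    Each meeting takes the first of its preferred hours that is still free."""
    calendar: Dict[int, str] = {}
    for _wid, meeting, hours in sorted(log, key=lambda w: w[0]):
        for hour in hours:
            if hour not in calendar:
                calendar[hour] = meeting
                break
    return calendar


w_a: Write = ((701, "A"), "M1", [10, 11])
w_b: Write = ((770, "B"), "M2", [10, 11])

# A node that has only seen B's write shows M2 at 10 AM.
before = replay([w_b])
# Once A's earlier write arrives, the log is re-sorted and replayed.
after = replay([w_b, w_a])
print(before)
print(after)
```

After the replay, M2 moves from 10 AM to 11 AM on that node's display, which is exactly why tentative entries cannot be shown to users as final.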

Criteria for Committing Writes

For log entry X to be committed, everyone must agree on:
• The total order of all previous committed entries
• The fact that X is next in the total order
• The fact that all uncommitted entries are “after” X

How Bayou Agrees on Total Order of Committed Writes

One node is designated the primary replica

The primary marks each write it receives with a permanent CSN (commit sequence number)
• That write is committed
• Complete timestamp is <CSN, local-TS, node-id>

Nodes exchange CSNs

How Bayou Agrees on Total Order of Committed Writes

CSNs define a total order for committed writes
• All nodes eventually agree on the total order
• Uncommitted writes (these do not have CSNs) come after all committed writes

Are there constraints on what the total order may look like?
• Yes. Assume that a user has issued two writes for a data collection
• The order they were issued in should be reflected in the total order
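One way to picture the ordering is as a sort key over <CSN, local-TS, node-id> in which a missing CSN counts as infinity. The `LogEntry` and `Primary` classes below are hypothetical and exist only for this sketch; in particular, this toy primary commits writes in whatever order it is handed them.

```python
import math
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class LogEntry:
    local_ts: int
    node_id: str
    csn: Optional[int] = None  # None means the write is still tentative


def order_key(entry: LogEntry):
    """Committed writes sort by CSN; tentative writes sort after all
    committed ones, ordered by (local timestamp, node ID)."""
    csn = math.inf if entry.csn is None else entry.csn
    return (csn, entry.local_ts, entry.node_id)


class Primary:
    """Toy primary replica that hands out consecutive CSNs."""

    def __init__(self) -> None:
        self.next_csn = 0

    def commit(self, entry: LogEntry) -> LogEntry:
        committed = replace(entry, csn=self.next_csn)
        self.next_csn += 1
        return committed


primary = Primary()
b = primary.commit(LogEntry(770, "B"))  # reaches the primary first
a = LogEntry(701, "A")                  # earlier timestamp, still tentative
c = primary.commit(LogEntry(800, "C"))

ordered = sorted([a, c, b], key=order_key)
print([(e.csn, e.local_ts, e.node_id) for e in ordered])
```

Here the write from A has the smallest timestamp but still sorts last, because it has no CSN yet.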

Showing Users that Writes Have Committed

Still not safe to show users that an appointment request has committed

The entire log up to the newly committed entry must be committed
• Otherwise there might be an earlier committed write that the node doesn’t know about!
• …and upon learning about it, the node would have to re-run conflict resolution

Result: a committed write is not stable unless the node has seen all prior committed writes

Commits

A server R should keep track of the following for all other servers:
• R.V[X] is the latest timestamp from server X that server R has seen

When two servers connect, exchanging the version vectors allows them to identify the missing updates
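A small sketch of that exchange follows. It assumes each write is identified by a (timestamp, origin) pair and that writes from a given origin are always propagated in timestamp order, so one number per origin summarizes everything seen so far; the function names are invented for the example.

```python
from typing import Dict, List, Tuple

WriteId = Tuple[int, str]  # (local timestamp, originating server)


def version_vector(log: List[WriteId]) -> Dict[str, int]:
    """V[X] = latest timestamp from server X found in this log."""
    vector: Dict[str, int] = {}
    for ts, origin in log:
        vector[origin] = max(vector.get(origin, 0), ts)
    return vector


def missing_for(receiver_vector: Dict[str, int],
                sender_log: List[WriteId]) -> List[WriteId]:
    """Writes in the sender's log that the receiver has not yet seen."""
    return sorted(w for w in sender_log if w[0] > receiver_vector.get(w[1], 0))


sender_log = [(701, "A"), (705, "A"), (770, "B")]
receiver_log = [(701, "A")]

to_send = missing_for(version_vector(receiver_log), sender_log)
print(to_send)
```

Only the vector crosses the network before the transfer starts, so the servers never have to compare their full logs.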

Committed vs. Tentative Writes

Can now show the user whether a write has committed
• When the node has seen every CSN up to that point, as guaranteed by the propagation protocol

A slow or disconnected node cannot prevent commits!
• The primary replica allocates CSNs; the global order of writes may not reflect real-time write times

What about tentative writes, though: how do they behave, as seen by users?

Trimming the Log

When nodes receive new CSNs, they can discard all committed log entries seen up to that point
• The update protocol guarantees that CSNs are received in order

Instead, keep a copy of the whole database as of the highest CSN
• By definition, the official committed database
• Everyone does (or will) agree on its contents
• Entries never need to go through conflict resolution again
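The idea can be sketched as a replica that keeps a committed snapshot plus a short tentative log. The `Replica` class, its method names, and the toy rule that a write simply books an hour if it is free are all assumptions made for this illustration.

```python
from typing import Dict, List, Tuple

WriteId = Tuple[int, str]         # (local timestamp, originating node)
Write = Tuple[WriteId, int, str]  # (write ID, hour, meeting): hypothetical shape


def apply_write(db: Dict[int, str], write: Write) -> None:
    """Toy conflict rule: book the hour only if it is still free."""
    _wid, hour, meeting = write
    db.setdefault(hour, meeting)


class Replica:
    def __init__(self) -> None:
        self.committed_db: Dict[int, str] = {}  # state as of highest_csn
        self.highest_csn = -1
        self.tentative: List[Write] = []        # only uncommitted writes are logged

    def accept(self, write: Write) -> None:
        self.tentative.append(write)

    def learn_commit(self, csn: int, write: Write) -> None:
        """Fold a newly committed write into the snapshot and drop its log entry."""
        if csn != self.highest_csn + 1:
            raise ValueError("CSNs must be received in order")
        apply_write(self.committed_db, write)
        self.highest_csn = csn
        self.tentative = [t for t in self.tentative if t[0] != write[0]]

    def view(self) -> Dict[int, str]:
        """What the user sees: committed state plus replayed tentative writes."""
        db = dict(self.committed_db)
        for write in sorted(self.tentative, key=lambda t: t[0]):
            apply_write(db, write)
        return db


r = Replica()
r.accept(((701, "A"), 10, "M1"))
r.accept(((770, "B"), 10, "M2"))
tentative_view = r.view()

# The primary happened to commit B's write first.
r.learn_commit(0, ((770, "B"), 10, "M2"))
committed_view = r.view()
print(tentative_view, committed_view, len(r.tentative))
```

The committed snapshot is never replayed; only the remaining tentative writes are re-applied on top of it each time the view is rebuilt.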

Technology Impact

TrueSync: end-to-end synchronization software and infrastructure solutions for the wireless Internet. http://www.starfish.com/

SyncML: the common language for synchronizing all devices and applications over any network. Ericsson, IBM, Lotus, Motorola, Nokia, Palm Inc., Psion, Starfish Software, etc. (614 companies). http://www.syncml.org/




Conclusions

Differences from other replicated systems:
• Non-transparency
• Application-specific conflict detection
• Per-write conflict resolvers
• Partial and multi-object updates
• Tentative and stable resolutions
• Security

Future goals:
• Partial replication, policies for choosing servers for anti-entropy, building servers with conventional database managers, alternate data models, and finer-grain access control
