bayou. references r the case for non-transparent replication: examples from bayou douglas b. terry,...
TRANSCRIPT
Bayou
References
The Case for Non-transparent Replication: Examples from Bayou Douglas B. Terry, Karin Petersen, Mike J. Spreitzer, and Marvin M. Theimer. IEEE Data Engineering, December 1998
Managing Update Conflicts in Bayou, a Weakly Connected Replicated Storage System Douglas B. Terry, Marvin M. Theimer, Karin Petersen, Alan J. Demers, Mike J. Spreitzer and Carl H. Hauser. In ACM Symposium on Operating Systems Principles (SOSP ’95)
Introduction
The scenario presented here is a little different then in previous work
Server replicas can be disconnected (temporarily) from the networks
There are conflicts in write updates that are best handled by the user Can’t view data as just bits
A read operation may result in stale data
Calendar Application
Calendar updates made by several people e.g., meeting room scheduling, or
exec+admin
Want to allow updates offline
But conflicts can’t be prevented
Two possibilities: Disallow offline updates Conflict resolution
Calendar Application Suppose we have two conference rooms of the
same capacity. I want to schedule my meeting in one of the conference rooms. I don’t care which exact room it is.
If two people reserve the same room at the same time, there is a conflict, but if they reserve the same room at different times or reserve different rooms at the same time, there is no conflict.
Rm2
Rm1
time
No conflict
Calendar Application
Rm2
Rm1
time
No conflict
Rm2
Rm1
time
conflict
Calendar Application
We can lock the entire database Not needed when there is no conflict In the case of conflicts, there is an
application specific way to deal with the conflict – we move one reservation to the other room
If the other room is reserved, we ask the user, if they can easily move the reservation to another acceptable time
Other resolution strategies:• Classes over meetings• Admin meetings bump faculty meetings
Shared mailbox
Shared mailbox folders- shared between, me and my 4 TAs.
We all replicate the mailbox. OP1: I see a mail from the class, respond to it and
delete it. OP2: The TA sees the same mail and files it in CS1026. OP3: I see an email from a friend and file it as
important.
mailbox
CS1026 Important Recruiting Chocolate
Shared Mailbox
All of us operate on the same mailbox You can lock the entire mailbox before
someone operates on it. Can’t work when disconnected Clearly not necessary for doing only one
operation For operation OP1 and OP2, it is not clear
who should win, should the mail be deleted or should it be filed in Assign1?
Two Approaches to Building Replicated Services
Transparent replication system: Allow systems that were developed assuming a
central file system or database to run unchanged on top of a strongly-consistent replicated storage system (as seen in Oceanstore)
Non-transparent replication system: Relaxed consistency model – access-update-
anywhere Applications involved in conflict detection and
resolution. Hence applications need to be modified (e.g. Bayou, Coda file system etc)
Hypothesis Conflicts between application users are
not easily handled transparently. Applications know best on how to
resolve conflicts The challenge is providing the right
interface to support cooperation between applications and their data managers
Conflict Detection :Dependency Checks
Each Write operation includes a dependency check consisting of an application-supplied query and its expected result.
If the check fails, then the requested update is not performed and the server invokes a procedure to resolve the detected conflict.
Example of Bayou Write3-tuple: <update><dependency check><mergeproc>
For example, Update: <insert, HL, 02/02/2009, 3:00pm, 1 hr, “Curriculum Meeting”>Dependency check: <Is there an entry in the database on 02/02/2009 at 3:00pm for 1 hr? expected answer is empty>Mergeproc: <Try 02/22/2009 at 4:15pm for 45min; if successful write that tuple>
A different merge procedure could search for the next available time slot to schedule the meeting, which is an option a user might choose if any time would be satisfactory.
Another is to allow the conflicts; sometimes users like conflicts
Conflict Resolution :Merge Procedure
A merge procedure is run to resolve a detected conflict
Merge procedures are written by application programs
Merge procedures are in the form of templates that are instantiated with the appropriate details
In the case where automatic resolution is not possible, the merge procedure will still run to completion, but is expected to produce a revised update that logs the detected conflict in some fashion that will enable a person to resolve the conflict later.
Replica Consistency
We may have multiple replicas e.g., Calendar on a PDA; intermittent connectivity Calendar on a server in the wired network
Need conflict resolution as described earlier but this in itself is not sufficient.
Need mechanism to deal with the variation at the multiple servers
Want users to be able to continue after a write even if not all replicas have carried out a write operation.
Replica Consistency Bayou is designed for eventual consistency:
All servers receive all Writes via the pair-wise anti-entropy process (described later)
Two servers holding the same set of writes will have the same data contents
Strict bounds on Write propagation delays not enforced.
Bayou has these features: Writes are performed in the same, well-defined order
at all servers The conflict detection and merge procedures are
deterministic so that servers resolve the same conflicts in the same manner
Replica Consistency When a Write is accepted by a Bayou server
from a client, it is deemed tentative Timestamps for tentative Writes must
monotonically increase at each server At some point each tentative write should be
marked as committed Must be able to undo writes; Why?
Servers may receive Writes from clients and from other servers in an order that differs from the required execution order;
Servers immediately apply all known Writes to their replicas.
Anti-Entropy Anti-entropy refers to a propagation
model of information between servers A server P picks another server Q at
random to exchange updates Three approaches:
P only pushes its own updates to Q P only pulls in new updates from Q P and Q send updates to each other
This model is used in Bayou for propagating writes (updates) All replicas receive all updates
• chain of pair-wise interactions
Example
Suppose a user keeps the primary copy of his calendar with him on his laptop and allows others, such as a spouse or secretary, to keep secondary (mostly read copies).
The user updates his own calendar; This is committed immediately.
Updates by the spouse/secretary are tentative until anti-entropy takes place with the user. At this point, the user can commit and propagate the order to the spouse/secretary during anti-entropy.
Ordering of Updates Maintain ordered list of writes at each
node This will be referred to as the write log Each write has a unique Write ID: <local-time-
stamp, originating-node-ID> The local-time-stamp is assigned by the server
that accepted the write from front end (referred to as the accepting server)
Write Log Example
<701, A>: Node A asks for meeting M1 to occur at 10 AM, else 11 AM in MC 316
<770, B>: Node B asks for meeting M2 to occur at 10 AM, else 11 AM in MC 316
Let’s agree to sort by write ID (e.g., <701, A> As writes operations spread from node to node,
nodes may initially apply updates in different orders
The dependency check and merge procedures do not necessarily take care of this
Want writes for a particular data to be performed in the same order at all replicas
Write Log Example
Each newly seen write merged into write log
Log replayed May cause calendar displayed to user to
change! i.e., all entries tentative, nothing stable
unless an entry is committed. How do we get commits?
Criteria for Committing Writes
For log entry X to be committed, everyone must agree on: Total order of all previous committed entries Fact that X is next in total order Fact that all uncommitted entries are “after”
X
How Bayou Agrees on Total Order
of Committed Writes One node designated primary replica Primary marks each write it receives with
permanent CSN (commit sequence number) That write is committed Complete timestamp is <CSN, local-TS, node-id>
Nodes exchange CSNs
How Bayou Agrees on Total Order
of Committed Writes CSNs define total order for committed writes
All nodes eventually agree on total order Uncommitted writes (these do not have CSNs) come
after all committed writes
Are there constraints on what the total order may look like? Yes. Assume that a user has issued two writes for a data
collection The order they were issued in should be reflected in the
total order
Showing Users that WritesHave Committed
Still not safe to show users that an appointment request has committed
Entire log up to newly committed entry must be committed else there might be earlier committed write a
node doesn’t know about! …and upon learning about it, would have to re-
run conflict resolution Result: committed write not stable unless
node has seen all prior committed writes
Commits
A server,R, should keep track for all other servers the following: R.V[X] is the latest timestamp from server X
that server R has seen When two servers connect, exchanging
the version vectors allows them to identify the missing updates
Committed vs. Tentative Writes
Can now show user if a write has committed When node has seen every CSN up to that
point, as guaranteed by propagation protocol Slow or disconnected node cannot prevent
commits! Primary replica allocates CSNs; global order of
writes may not reflect real-time write times What about tentative writes, though—how
do they behave, as seen by users?
Trimming the Log
When nodes receive new CSNs, can discard all committed log entries seen up to that point Update protocol guarantees CSNs received in
order Instead, keep copy of whole database as
of highest CSN By definition, official committed database Everyone does (or will) agree on contents Entries never need go through conflict
resolution
Technology Impact
TrueSync - end-to-end synchronization software and infrastructure solutions for the wireless Internet http://www.starfish.com/
SyncML - SyncML is the common language for synchronizing all devices and applications over any network. Ericsson, IBM, Lotus, Motorola, Nokia, Palm Inc.,
Psion, Starfish Software etc. (614 companies) http://www.syncml.org/
Conclusions
Difference from other replicated systems Non-transparency Application-specific conflict detection Per-write conflict resolvers Partial and multi-object updates Tentative and stable resolutions Security
Future goal Partial replication, policies for choosing servers for
anti-entropy, building servers with conventional database managers, alternate data models, and finer grain access control.