Database Replication in WAN
Yi Lin
Supervised by: Prof. Kemme
April 8, 2005
Contents
1. Introduction
2. Centralized Snapshot Isolation Replication (SIR) protocol
3. Decentralized SIR protocol for WAN
4. Experiments
5. Further optimizations
6. Related work
7. Conclusions and milestones
1. Introduction: What, Why, How?
[Figure: sites in Montreal, Toronto, and Ottawa, shown without replication and with replication]
Benefits: Performance, Fault Tolerance
1. Introduction, challenge
[Figure: replica control over a WAN; copies of data item x at several sites, with writes w(x) issued against them]
General Correctness Criteria: 1-copy-serializability
1. Introduction, 1-copy-serializability
• 1-copy-serializability
  – The replicated system behaves as one database providing serializability
• Serializability
  – Highest txn isolation level
    • Isolation level: to what extent txns interfere with each other
  – The result is the same as executing them serially.
  – Conflict: read/write and write/write
[Figure: two timelines of transactions T0 to T3 issuing reads and writes on x, y, and z]
1. Introduction, 1-copy-SI
• Snapshot Isolation (SI):
  – Conflict: only write/write
  – Read from a snapshot of the committed data as of the time the txn starts.
  – If 2 concurrent txns write the same data item and one commits, the other aborts
  – Very popular (Oracle, PostgreSQL)
• 1-copy-SI
  – The replicated system behaves as one database providing SI
[Figure: timeline of T0, T1, and T2 reading and writing x and y; one txn commits and a concurrent conflicting txn aborts]
• Challenge:
  – How to detect concurrent conflicting txns? Validation
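A minimal sketch of the SI rules listed above, assuming a toy in-memory store. The class `SnapshotStore` and its methods are hypothetical names for illustration and are not part of the slides or of any database API. Reads see the latest version committed at or before the txn's start, and a committing txn aborts if another txn committed a write to the same key after it started.

```python
class SnapshotStore:
    """Toy multi-version store: snapshot reads, first committer wins."""

    def __init__(self):
        self.commit_counter = 0   # number of committed update txns
        self.versions = {}        # key -> list of (commit_number, value)

    def begin(self):
        # The snapshot is identified by the counter value at start time.
        return {"start": self.commit_counter, "writes": {}}

    def read(self, txn, key):
        # A txn sees its own writes first.
        if key in txn["writes"]:
            return txn["writes"][key]
        # Otherwise: latest version committed at or before the txn started.
        for commit_number, value in reversed(self.versions.get(key, [])):
            if commit_number <= txn["start"]:
                return value
        return None

    def write(self, txn, key, value):
        # Writes stay private until commit.
        txn["writes"][key] = value

    def commit(self, txn):
        # Abort if a concurrent txn already committed a write to the same key.
        for key in txn["writes"]:
            history = self.versions.get(key, [])
            if history and history[-1][0] > txn["start"]:
                return False
        if txn["writes"]:
            self.commit_counter += 1
            for key, value in txn["writes"].items():
                self.versions.setdefault(key, []).append(
                    (self.commit_counter, value)
                )
        return True
```

In this sketch, two txns that start from the same snapshot and both write x cannot both commit; a read-only txn never aborts because it has no writes to validate.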
2. Centralized Snapshot Isolation Replication (SIR) Protocol
[Figure: centralized SIR flow with labels: w(x), commit request, extract writeset, validation (succeed/fail), apply ws, commit]
• How to detect two txns are conflicting?
  – Writeset contains modified tuples and their corresponding primary keys.
  – If two writesets share some primary keys, they conflict.
  – Note: Snapshot Isolation only cares about write/write conflicts.
[Figure: writesets of T1 and T2 both contain Key=1]
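The conflict test above can be sketched as a set intersection over primary keys. This is an illustration only: the writeset layout (a dict from `(table, primary_key)` to the new tuple) and the function name `writesets_conflict` are assumptions, not the format used by the actual protocol.

```python
def writesets_conflict(ws_a, ws_b):
    """Return True if the two writesets share at least one primary key.

    Each writeset is assumed to be a dict mapping (table, primary_key)
    to the modified tuple. Only the keys matter for conflict detection.
    """
    return bool(ws_a.keys() & ws_b.keys())


# Hypothetical writesets: T1 and T2 both modify the row with key 1.
ws_t1 = {("item", 1): {"stock": 9}}
ws_t2 = {("item", 1): {"stock": 8}, ("item", 2): {"stock": 4}}
ws_t3 = {("item", 3): {"stock": 7}}
```

Reads do not appear in a writeset at all, which matches the note that SI only considers write/write conflicts.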
2. Centralized SIR Protocol
• How to detect two txns are concurrent?
[Figure: T1 and T2 on a timeline; label start=1]
• A counter for each database, increased upon committing a txn
• Record start time and end time of txns
• T0.end <= T1.start || T1.end <= T0.start => T0 and T1 not concurrent.
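A small sketch of the counter-based concurrency check, assuming that a txn whose start value equals another txn's end value began after that txn committed and so is not concurrent with it. The names `Txn`, `CommitCounter`, and `concurrent` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Txn:
    name: str
    start: int                 # counter value when the txn started
    end: Optional[int] = None  # counter value right after its commit


class CommitCounter:
    """Per-database counter, increased each time a txn commits."""

    def __init__(self):
        self.value = 0

    def begin(self, name):
        return Txn(name, start=self.value)

    def commit(self, txn):
        self.value += 1
        txn.end = self.value


def concurrent(a, b):
    """Two committed txns are concurrent unless one ended before the other started."""
    if a.end is None or b.end is None:
        raise ValueError("both txns must have committed")
    return not (a.end <= b.start or b.end <= a.start)


counter = CommitCounter()
t0 = counter.begin("T0")   # start=0
counter.commit(t0)         # end=1
t1 = counter.begin("T1")   # start=1
t2 = counter.begin("T2")   # start=1
counter.commit(t1)         # end=2
counter.commit(t2)         # end=3
```

Here T0 and T1 are not concurrent because T0.end equals T1.start, while T1 and T2 overlap and would need a writeset comparison.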
2. Centralized SIR Protocol
[Figure: database counter on a timeline; T0 has start=0 and end=1; further labels: start=1, end=2]
• Centralized approach not good for WANs
3. Decentralized SIR Protocol for WANs
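[Figure: Centralized Architecture: one middleware in front of the DB replicas, reached over the WAN. Decentralized Architecture: one middleware replica per DB, connected to it by a LAN; the middleware replicas communicate over the WAN using group communication with total order]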
3. Decentralized SIR Protocol for WANs
[Figure: decentralized SIR flow for T1 and T2 at different replicas: r(x), w(x), commit request, extract writeset, validation of T1 and T2 at every middleware replica (succeed/fail), then apply ws and commit, or abort]
Challenge:
1. Validation: same as centralized approach
2. Total order: all middleware components make the same decision
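The two points above can be illustrated with a toy simulation: if every middleware replica receives the same writesets in the same total order and runs the same deterministic validation, all replicas reach the same commit/abort decisions without further coordination. This is a sketch under stated assumptions, not the protocol's implementation: the class `MiddlewareReplica`, the tuple layout of a delivered message, and the idea that a txn's start value is the number of writesets committed before it began are all hypothetical simplifications.

```python
class MiddlewareReplica:
    """Toy middleware replica that validates writesets in delivery order."""

    def __init__(self):
        self.committed = 0      # writesets committed so far at this replica
        self.last_writer = {}   # primary key -> commit number of last writer
        self.log = []           # (txn_id, "commit" | "abort") decisions

    def deliver(self, txn_id, start, keys):
        # Validation fails if any key was written by a txn that committed
        # after this txn started (a concurrent, conflicting txn).
        ok = all(self.last_writer.get(k, 0) <= start for k in keys)
        if ok:
            self.committed += 1
            for k in keys:
                self.last_writer[k] = self.committed
        self.log.append((txn_id, "commit" if ok else "abort"))
        return ok


# Messages as delivered by a (simulated) total order multicast:
# (txn id, start value, primary keys in the writeset).
total_order = [
    ("T1", 0, {1}),
    ("T2", 0, {1}),   # concurrent with T1 and writes the same key
    ("T3", 1, {2}),
]

replicas = [MiddlewareReplica() for _ in range(3)]
for replica in replicas:
    for txn_id, start, keys in total_order:
        replica.deliver(txn_id, start, keys)
```

Each replica processes the stream independently, yet all three logs are identical, which is the property the total order is meant to provide.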
4. Experiments
[Figure: response time (ms, axis 0 to 350) vs. load (25 to 125 txn/sec); series: Read-only, Update, Read-only (centr.), Update (centr.)]
Fig. TPC-W benchmark, 5 sites, 50% update txns,
5. Some optimizations
• With GCS
  – Disadvantage:
    • Total order expensive
    • Large response time
  – Advantage:
    • Uniform reliable delivery helps failover
• Without GCS, but with a sequencer
  – Advantage:
    • Less communication overhead
  – Disadvantage:
    • Complicated in failover
[Figure: T1 and T2 execute r(x), w(x), commit; writesets are extracted and go through the sequencer for validation; the txn that succeeds commits, the one that fails aborts]
6. Related work
• Kernel-based replica control
• Middleware-based replica control
  – Advantages
    • Heterogeneous DB
    • Easy to implement
  – Disadvantages
    • No access to concurrency control in the kernel
[Figure labels: Oracle, PostgreSQL]
6. Related work
• Many have a centralized component. [Ganymed, Conflict-Aware]
  – Does not work well in WANs
• Some are primary/secondary approaches. [Ganymed]
  – Updates must always be performed on primary copy
  – Need to mark read-only txns in advance
• Some need to know all operations in advance [Conflict-Aware]
• Some use table-based locking [Middle-R, Conflict-Aware]
• Nearly all only look at 1-copy-serializability [Conflict-Aware, GlobData, Middle-R, State Machine]
7. Conclusions
• Works well in WANs
  – Only 1 multicast msg
• No restrictions such as
  – Marking read-only txn in advance
  – Knowing all operations in advance
• Tuple based locking
• 1-copy-SI
7. Milestones
• Currently
  – 1-copy-SI
  – Centralized and decentralized protocol formalized, implemented
• Sep, 2005:
  – Failover (coordinated with a Master's project)
• Dec, 2005:
  – Further optimizations proposed in report
• May, 2006:
  – Recovery
References
• [SIR] Y. Lin, B. Kemme, R. Jimenez-Peris, and M. Patiño-Martínez. Middleware based data replication providing snapshot isolation. In SIGMOD, June 2005.
• [Ganymed] C. Plattner and G. Alonso. Ganymed: Scalable replication for transactional web applications. In Middleware, 2004.
• [GlobData] L. Rodrigues, H. Miranda, R. Almeida, J. Martins, and P. Vicente. Strong Replication in the GlobData Middleware. In Workshop on Dependable Middleware-Based Systems, 2002.
• [Middle-R] R. Jimenez-Peris, M. Patiño-Martínez, B. Kemme, and G. Alonso. Improving Scalability of Fault Tolerant Database Clusters. In ICDCS'02.
• [Conflict-Aware] C. Amza, A. L. Cox, and W. Zwaenepoel. Conflict-Aware Scheduling for Dynamic Content Applications. In USENIX Symp. on Internet Tech. and Sys., 2003.
• [Postgres-R] S. Wu and B. Kemme. Postgres-R(SI): Combining replica control with concurrency control based on snapshot isolation. In ICDE, Tokyo, Japan, 2005.
• [State Machine] F. Pedone, R. Guerraoui, and A. Schiper. The Database State Machine Approach. Distributed and Parallel Databases, 14:71-98, 2003.