
Database Replication in WAN

Yi Lin

Supervised by: Prof. Kemme

April 8, 2005

Contents

1. Introduction
2. Centralized Snapshot Isolation Replication (SIR) protocol
3. Decentralized SIR protocol for WAN
4. Experiments
5. Further optimizations
6. Related work
7. Conclusions and milestones

1. Introduction: What, Why, How?

[Figure: sites in Montreal, Toronto and Ottawa, without replication vs. with replication; with replication, replica control keeps the copies consistent across the WAN.]

Benefits: performance, fault tolerance

1. Introduction, challenge

[Figure: copies of data item x at several sites, with writes w(x) issued at two of them.]

General correctness criterion: 1-copy-serializability

1. Introduction, 1-copy-serializability

• 1-copy-serializability
  – The replicated system behaves as one database providing serializability
• Serializability
  – Highest txn isolation level (the isolation level defines to what extent txns interfere with each other)
  – The result is the same as executing the txns serially
  – Conflicts: read/write and write/write

[Figure: timeline of transactions T0, T1, T2 and T3 with operations r(x), w(x), w(y), r(z) and w(z), shown interleaved and as an equivalent serial execution.]
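Whether an interleaving is equivalent to some serial execution can be checked with the textbook precedence-graph (conflict-graph) test, which decides conflict-serializability, the usual practical form of serializability. The Python sketch below is an illustration only, not taken from the slides; the `(txn, op, item)` schedule encoding and the function names are hypothetical. It adds an edge Ti -> Tj whenever an operation of Ti precedes a conflicting (read/write or write/write) operation of Tj, and reports the schedule serializable exactly when that graph has no cycle.

```python
# A schedule lists operations in execution order as (txn, op, item),
# where op is "r" (read) or "w" (write), e.g. ("T1", "w", "x").
Schedule = list[tuple[str, str, str]]


def precedence_graph(schedule: Schedule) -> dict[str, set[str]]:
    """Edge Ti -> Tj if an operation of Ti precedes a conflicting one of Tj.

    Two operations conflict if they belong to different txns, access the
    same item, and at least one is a write (read/write or write/write).
    """
    graph: dict[str, set[str]] = {}
    for i, (ti, op_i, item_i) in enumerate(schedule):
        graph.setdefault(ti, set())
        for tj, op_j, item_j in schedule[i + 1:]:
            if ti != tj and item_i == item_j and "w" in (op_i, op_j):
                graph[ti].add(tj)
    return graph


def is_conflict_serializable(schedule: Schedule) -> bool:
    """True iff the precedence graph is acyclic (DFS cycle detection)."""
    graph = precedence_graph(schedule)
    state = {t: "new" for t in graph}  # new -> active (on DFS stack) -> done

    def reaches_cycle(t: str) -> bool:
        state[t] = "active"
        for u in graph[t]:
            if state[u] == "active":
                return True
            if state[u] == "new" and reaches_cycle(u):
                return True
        state[t] = "done"
        return False

    return not any(state[t] == "new" and reaches_cycle(t) for t in graph)


serial = [("T1", "r", "x"), ("T1", "w", "x"), ("T2", "r", "x"), ("T2", "w", "x")]
lost_update = [("T1", "r", "x"), ("T2", "r", "x"), ("T1", "w", "x"), ("T2", "w", "x")]
print(is_conflict_serializable(serial))       # True
print(is_conflict_serializable(lost_update))  # False: T1 -> T2 and T2 -> T1
```

The second schedule is the classic lost update: each txn reads x before the other writes it, which yields the cycle T1 -> T2 -> T1.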

1. Introduction, 1-copy-SI

• Snapshot Isolation (SI):
  – Conflicts: only write/write
  – A txn reads from a snapshot of the data committed as of the time the txn starts
  – If two concurrent txns write the same data item and one commits, the other aborts
  – Very popular (Oracle, PostgreSQL)
• 1-copy-SI
  – The replicated system behaves as one database providing SI

[Figure: timeline of T0, T1 and T2 with operations on x and y; of two concurrent txns that both write x, one commits and the other aborts.]

• Challenge:
  – How to detect concurrent conflicting txns? Validation
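To make "reads from a snapshot" concrete, here is a toy multi-version store in Python. It is an illustrative sketch, not how Oracle or PostgreSQL implement SI; the class and method names are hypothetical, and it uses a commit counter as the snapshot timestamp (the same idea the SIR slides use for start and end times). It deliberately omits the write/write check, which is the validation challenge named above.

```python
from dataclasses import dataclass, field


@dataclass
class VersionStore:
    """Toy multi-version store; every key keeps (commit_ts, value) pairs in commit order."""

    versions: dict[str, list[tuple[int, object]]] = field(default_factory=dict)
    counter: int = 0  # logical clock, incremented on every commit

    def begin(self) -> int:
        """A txn's snapshot is the counter value at the time it starts."""
        return self.counter

    def read(self, key: str, snapshot: int) -> object:
        """Newest value committed at or before the snapshot (None if none exists)."""
        visible = [value for ts, value in self.versions.get(key, []) if ts <= snapshot]
        return visible[-1] if visible else None

    def commit(self, writes: dict[str, object]) -> int:
        """Install a txn's writes as new versions (no write/write validation here)."""
        self.counter += 1
        for key, value in writes.items():
            self.versions.setdefault(key, []).append((self.counter, value))
        return self.counter


store = VersionStore()
store.commit({"x": 1})                 # committed before T1 starts
t1_snapshot = store.begin()            # T1 starts and takes its snapshot
store.commit({"x": 2})                 # a concurrent txn commits a new x
print(store.read("x", t1_snapshot))    # 1: T1 still sees its snapshot
print(store.read("x", store.begin()))  # 2: a txn starting now sees the new value
```

Readers never block writers here, which is why SI only needs to police write/write conflicts.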

2. Centralized Snapshot Isolation Replication (SIR) Protocol

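[Figure: a txn executes w(x) at its local replica; at commit, its writeset is extracted and validated by the central middleware; if validation succeeds, the writeset is applied and committed at the other replicas and the txn commits; if it fails, the txn does not commit.]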

• How to detect that two txns conflict?
  – The writeset contains the modified tuples and their corresponding primary keys
  – If two writesets share some primary key, they conflict
  – Note: Snapshot Isolation only cares about write/write conflicts

[Figure: T1 and T2 both modify the tuple with primary key 1.]
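The writeset test can be sketched in a few lines of Python. The encoding is an assumption for illustration: a writeset is a dict keyed by `(table, primary key)` (the table name is added so that keys from different tables cannot collide) mapping to the new column values. Only the keys are compared, because SI ignores read/write conflicts.

```python
# Hypothetical encoding: (table, primary key) -> new column values of the tuple.
WriteSet = dict[tuple[str, int], dict[str, object]]


def writesets_conflict(ws1: WriteSet, ws2: WriteSet) -> bool:
    """Conflict iff both txns modified at least one common tuple (same primary key).

    Read sets are not consulted: SI only cares about write/write conflicts.
    """
    return not ws1.keys().isdisjoint(ws2.keys())


t1 = {("item", 1): {"stock": 9}}
t2 = {("item", 1): {"stock": 8}, ("item", 2): {"stock": 4}}
t3 = {("item", 3): {"stock": 0}}
print(writesets_conflict(t1, t2))  # True: both modified key 1
print(writesets_conflict(t1, t3))  # False
```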

2. Centralized SIR Protocol

• How to detect that two txns are concurrent?
  – A counter for each database, increased upon committing a txn
  – Record the start time and end time of each txn as counter values
  – If $T_0.end \le T_1.start$ or $T_1.end \le T_0.start$, then T0 and T1 are not concurrent

2. Centralized SIR Protocol

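[Figure: the counter advancing as txns commit, with start and end values per txn (e.g., T0: start=0, end=1; later txns start at 1, and the next commit ends at 2).]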

• The centralized approach does not work well in WANs: all sites must communicate with a single middleware component across the WAN
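Combining the counter with the writeset test gives a sketch of centralized validation. This is a hypothetical Python illustration, not the SIR implementation: `Txn`, `CentralValidator`, and keeping writesets as plain sets of primary keys are assumptions, and pruning of old committed txns is omitted. Because the txn being validated has not ended yet, a committed txn is concurrent with it exactly when it committed after the validating txn started (`other.end > txn.start`); the usage lines loosely follow the counter values on the slide.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Txn:
    name: str
    start: int                    # counter value when the txn started
    end: Optional[int] = None     # counter value after the txn committed
    writeset: frozenset = field(default_factory=frozenset)  # primary keys it modified


def concurrent(t0: Txn, t1: Txn) -> bool:
    """For two committed txns: not concurrent iff one ended before the other started."""
    assert t0.end is not None and t1.end is not None
    return not (t0.end <= t1.start or t1.end <= t0.start)


class CentralValidator:
    """Hypothetical central middleware holding the single commit counter."""

    def __init__(self) -> None:
        self.counter = 0
        self.committed: list[Txn] = []  # every committed txn; pruning is omitted

    def begin(self, name: str) -> Txn:
        return Txn(name, start=self.counter)

    def validate_and_commit(self, txn: Txn, writeset: set) -> bool:
        """True: the txn commits and the counter advances. False: the txn aborts."""
        txn.writeset = frozenset(writeset)
        for other in self.committed:
            # `other` committed after `txn` started, so the two are concurrent;
            # under SI they conflict only if their writesets share a primary key.
            if other.end > txn.start and other.writeset & txn.writeset:
                return False
        self.counter += 1
        txn.end = self.counter
        self.committed.append(txn)
        return True


v = CentralValidator()
t0 = v.begin("T0")                       # start=0
print(v.validate_and_commit(t0, {1}))    # True, T0.end=1
t1, t2 = v.begin("T1"), v.begin("T2")    # both start=1
print(v.validate_and_commit(t1, {1}))    # True: T0.end <= T1.start, not concurrent
print(v.validate_and_commit(t2, {1}))    # False: T1 (end=2) is concurrent and wrote key 1
print(concurrent(t0, t1))                # False
```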

3. Decentralized SIR Protocol for WANs

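[Figure: centralized architecture, where a single middleware replica coordinates DBs located across the WAN, vs. decentralized architecture, where each site has its own middleware replica connected to its DB over a LAN and the middleware replicas communicate over the WAN using group communication with total order.]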

3. Decentralized SIR Protocol for WANs

[Figure: T1 and T2 execute r(x) and w(x) at different sites; at commit, each site extracts the writeset and multicasts it; every middleware replica validates T1 and T2 in the same total order, so one txn succeeds everywhere (its writeset is applied and committed) and the other fails everywhere (it aborts).]

Challenge:
1. Validation: same as in the centralized approach
2. Total order: all middleware components make the same decision
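Why total order is enough for all middleware replicas to agree can be shown with a small Python simulation. It is a sketch under stated assumptions, not the SIR code: the group communication system is replaced by a plain loop that hands every replica the same message sequence, the class and function names are hypothetical, the city names are only borrowed from the introduction figure, and start counters are assumed to be comparable across sites because every replica commits writesets in the same order. Each replica runs the same deterministic validation as the centralized case, so identical input order yields identical decisions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WriteSetMsg:
    txn: str
    start: int          # counter value when the txn started at its local replica
    keys: frozenset     # primary keys in the extracted writeset


@dataclass
class MiddlewareReplica:
    """One middleware replica per site, running the same validation as the centralized case."""

    site: str
    counter: int = 0
    committed: list = field(default_factory=list)  # (end counter, keys) per committed writeset

    def deliver(self, msg: WriteSetMsg) -> bool:
        """Validate a writeset in delivery order; True = apply and commit, False = abort."""
        for end, keys in self.committed:
            if end > msg.start and keys & msg.keys:
                return False
        self.counter += 1
        self.committed.append((self.counter, msg.keys))
        return True


def total_order_multicast(replicas, msgs):
    """Stand-in for the group communication system: every replica sees msgs in the same order."""
    return {r.site: [r.deliver(m) for m in msgs] for r in replicas}


sites = [MiddlewareReplica(s) for s in ("Montreal", "Toronto", "Ottawa")]
# T1 (started in Montreal) and T2 (started in Toronto) both wrote key 1 from snapshot 0.
order = [WriteSetMsg("T1", 0, frozenset({1})), WriteSetMsg("T2", 0, frozenset({1}))]
print(total_order_multicast(sites, order))
# {'Montreal': [True, False], 'Toronto': [True, False], 'Ottawa': [True, False]}
```

If two replicas received the messages in different orders, one could commit T1 and the other T2; the total order is what rules that out.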

4. Experiments

[Figure: response time (ms, axis 0 to 350) vs. load (txn/sec, 25 to 125); series: Read-only, Update, Read-only (centr.), Update (centr.).]

Fig. TPC-W benchmark, 5 sites, 50% update txns,


5. Some optimizations

• With GCS
  – Disadvantages:
    • Total order is expensive
    • Large response time
  – Advantage:
    • Uniform reliable delivery, useful for failover
• Without GCS, but with a sequencer
  – Advantage:
    • Less communication overhead
  – Disadvantage:
    • Failover is more complicated

[Figure: T1 and T2 execute r(x), w(x) at different sites; their extracted writesets are ordered by a sequencer and validated; one validation succeeds and the txn commits at all sites, the other fails and the txn aborts.]
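The slides do not describe how the sequencer works; one standard way to get a total order without a GCS is a fixed sequencer that stamps each writeset with a global sequence number, with every site validating strictly in that order. The Python sketch below assumes that technique; `Sequencer` and `OrderedReceiver` are hypothetical names, and, as the slide warns, it does nothing about the sequencer failing.

```python
import heapq
from itertools import count


class Sequencer:
    """Hypothetical fixed sequencer: stamps each writeset with the next global sequence number."""

    def __init__(self) -> None:
        self._numbers = count(1)

    def stamp(self, writeset):
        return (next(self._numbers), writeset)


class OrderedReceiver:
    """Per-site receiver: releases stamped writesets for validation strictly in
    sequence-number order, buffering any that arrive early over the WAN."""

    def __init__(self) -> None:
        self._expected = 1
        self._pending: list = []  # min-heap of (seq, writeset)

    def receive(self, stamped) -> list:
        heapq.heappush(self._pending, stamped)
        ready = []
        while self._pending and self._pending[0][0] == self._expected:
            ready.append(heapq.heappop(self._pending)[1])
            self._expected += 1
        return ready


seq = Sequencer()
ws_t1, ws_t2 = seq.stamp("ws(T1)"), seq.stamp("ws(T2)")
site = OrderedReceiver()
print(site.receive(ws_t2))  # []: seq 2 arrived first, so it waits for seq 1
print(site.receive(ws_t1))  # ['ws(T1)', 'ws(T2)']
```

Each writeset costs one extra hop to the sequencer instead of a full total-order protocol, which matches the "less communication overhead" point; the price is that the sequencer is a single point whose failure must be handled separately.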

• Kernel-based replica control
• Middleware-based replica control
  – Advantages
    • Heterogeneous DBs
    • Easy to implement
  – Disadvantages
    • No access to concurrency control in the kernel

6. Related work

[Figure: Oracle, PostgreSQL]

6. Related work

• Many have a centralized component [Ganymed, Conflict-Aware]
  – Does not work well in WANs
• Some are primary/secondary approaches [Ganymed]
  – Updates must always be performed on the primary copy
  – Read-only txns must be marked in advance
• Some need to know all operations in advance [Conflict-Aware]
• Some use table-based locking [Middle-R, Conflict-Aware]
• Nearly all consider only 1-copy-serializability [Conflict-Aware, GlobData, Middle-R, State Machine]
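The contrast between table-based locking and the tuple-based approach listed in the conclusions is one of conflict granularity. The Python sketch below illustrates only that granularity difference; it is not the locking code of Middle-R, Conflict-Aware, or SIR, and modeling a writeset as a set of `(table, primary key)` pairs is an assumption.

```python
# Each writeset is modeled as a set of (table, primary key) pairs (hypothetical encoding).
def table_level_conflict(ws1: set, ws2: set) -> bool:
    """Coarse granularity: conflict whenever both txns write the same table."""
    return not {table for table, _ in ws1}.isdisjoint({table for table, _ in ws2})


def tuple_level_conflict(ws1: set, ws2: set) -> bool:
    """Fine granularity: conflict only when both txns write the same tuple."""
    return not ws1.isdisjoint(ws2)


t1 = {("orders", 7)}
t2 = {("orders", 8)}
print(table_level_conflict(t1, t2))  # True: treated as a conflict although the rows differ
print(tuple_level_conflict(t1, t2))  # False: different tuples, both can proceed
```

With table granularity, any two txns updating the same table are serialized against each other, even when they touch different rows.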

7. Conclusions

• Works well in WANs
  – Only 1 multicast message (the writeset) per update txn
• No restrictions such as
  – Marking read-only txns in advance
  – Knowing all operations in advance
• Tuple-based locking
• 1-copy-SI

7. Milestones

• Currently:
  – 1-copy-SI
  – Centralized and decentralized protocols formalized and implemented
• Sep. 2005:
  – Failover (coordinated with a Master's project)
• Dec. 2005:
  – Further optimizations proposed in the report
• May 2006:
  – Recovery


References

• [SIR] Y. Lin, B. Kemme, R. Jiménez-Peris, and M. Patiño-Martínez. Middleware based data replication providing snapshot isolation. In SIGMOD, June 2005.

• [Ganymed] C. Plattner and G. Alonso. Ganymed: Scalable replication for transactional web applications. In Middleware, 2004.

• [GlobData] L. Rodrigues, H. Miranda, R. Almeida, J. Martins, and P. Vicente. Strong Replication in the GlobData Middleware. In Workshop on Dependable Middleware-Based Systems, 2002.

• [Middle-R] R. Jiménez-Peris, M. Patiño-Martínez, B. Kemme, and G. Alonso. Improving Scalability of Fault Tolerant Database Clusters. In ICDCS, 2002.

• [Conflict-Aware] C. Amza, A. L. Cox, and W. Zwaenepoel. Conflict-Aware Scheduling for Dynamic Content Applications. In USENIX Symp. on Internet Tech. and Sys., 2003.

• [Postgres-R] S. Wu and B. Kemme. Postgres-R(SI): Combining replica control with concurrency control based on snapshot isolation. In ICDE, Tokyo, Japan, 2005.

• [State Machine] F. Pedone, R. Guerraoui, and A. Schiper. The Database State Machine Approach. Distributed and Parallel Databases, 14:71-98, 2003.
