distributed transaction
1
DISTRIBUTED TRANSACTION
FASILKOM
UNIVERSITAS INDONESIA
2
What is a Transaction?
An atomic unit of database access, which is either completely executed or not executed at all.
It consists of an application-specified sequence of operations, beginning with a begin_transaction primitive and ending with either commit or abort.
3
E.g.
Transfer $200 from account A in London to account B in Depok:
begin_transaction
amntA = lookup amount in account A
amntB = lookup amount in account B
if (amntA < $200) abort
set account A = amntA - $200
set account B = amntB + $200
commit
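The transfer above can be sketched in Python. This is a minimal single-process illustration, not a real database API: the `Abort` exception, the `transfer` function and the dictionary of balances are all assumptions made for the example. Updates are made to a private copy and installed together, so an abort leaves the accounts untouched.

```python
class Abort(Exception):
    """Raised to abort the transaction; no change is installed."""


def transfer(accounts, src, dst, amount):
    # begin_transaction: work on a private copy so that an abort
    # leaves the shared state untouched (all or nothing).
    working = dict(accounts)
    if working[src] < amount:
        raise Abort("insufficient funds")
    working[src] -= amount
    working[dst] += amount
    # commit: install both updates together
    accounts.update(working)


accounts = {"A": 500, "B": 100}
transfer(accounts, "A", "B", 200)
print(accounts)  # {'A': 300, 'B': 300}
```

A second call such as `transfer(accounts, "A", "B", 1000)` raises `Abort` and leaves both balances as they were.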
4
Transaction Properties
Four main properties, the ACID properties:
– Atomicity: A transaction must be all or nothing.
– Consistency: A transaction takes the system from one consistent state to another consistent state.
– Isolation: The results of an incomplete transaction are not allowed to be revealed to other transactions.
– Durability: The results of a committed transaction will never be lost, independent of subsequent failures.
Atomicity & durability -> failure tolerance
5
Failure Tolerance
Atomicity & durability -> failure tolerance
Types of failures:
• Transaction-local failures detected by the application (e.g. insufficient funds)
• Transaction-local failures not detected by the application (e.g. divide by zero)
• System failures affecting volatile storage (e.g. CPU failure)
• Media failures (e.g. HD crash)
What is volatile storage? What is stable storage?
6
Recovery
Based on redundancy. For example:
1. Periodically archive the database.
2. Every time a change is made, record old and new values to a log.
3. If a failure occurs:
• If there is no damage to the physical database, undo all ‘unreliable’ changes.
• If the database is physically damaged, restore from the archive and redo changes.
7
Logging (1)
Database vs transaction log.
For each change (begin transaction, commit, and abort), write a log record with:
• Transaction ID (TID)
• Record ID
• Type of action
• Old value of record
• New value of record
• Other info, e.g. pointer to previous log record of this transaction.
8
Logging (2)
After a failure we need to undo or redo changes.
Undo and redo must be idempotent as there may be a failure whilst they are executing.
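A minimal sketch of why logging old and new values makes undo and redo idempotent. The `LogRecord` fields follow the list on the previous slide; the class itself, the `undo`/`redo` helpers and the in-memory `db` dictionary are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    tid: str         # transaction ID
    record_id: str   # which record was changed
    old_value: int   # value before the change (undo portion)
    new_value: int   # value after the change (redo portion)


def undo(db, rec):
    # Writes an absolute value, so repeating it has no further effect.
    db[rec.record_id] = rec.old_value


def redo(db, rec):
    db[rec.record_id] = rec.new_value


db = {"A": 300}
rec = LogRecord("T1", "A", old_value=500, new_value=300)

undo(db, rec)
undo(db, rec)      # a second undo after a crash during recovery is harmless
print(db["A"])     # 500

redo(db, rec)
redo(db, rec)
print(db["A"])     # 300
```

Had the log stored a delta such as "subtract 200" instead of the old and new values, repeating an undo or redo would corrupt the record.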
9
Log Write-ahead Protocol (1)
Before performing any update, at least the undo portion of the log record must be written to stable storage.
Before committing a transaction, all log records must have been fully recorded on stable storage. The commit record is written after these.
10
Log Write-ahead Protocol (2)
Reason for first rule:
– If we change log before database:
• log -- change -- crash
• log -- crash
In both cases the old value is already in the log, so undo is always possible.
– If we change log after database:
• change -- log -- crash
• change -- crash: can’t undo
11
Checkpointing (1)
How does the recovery manager know which transactions to undo and which to redo after a failure?
Naive approach:
– Examine entire log from the start. Look for begin transaction records:
• if a corresponding commit record exists, redo;
• if there’s an abort, do nothing; and
• if neither, undo.
12
Checkpointing (2)
Alternative:
– Every so often:
1) Force all log buffers to disk.
2) Write a checkpoint record to disk containing:
a) A list of all active transactions
b) The most recent log records for each transaction in a)
3) Force all database buffers to disk - disk is now totally up-to-date.
4) Write address of checkpoint record to fixed ‘restart location’ (had better be atomic).
13
Checkpointing (3)
There are 5 categories of transaction:
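[Figure: timeline of five transactions T1 to T5 relative to the checkpoint and the crash. Each transaction is labelled with its recovery action: Leave, Redo, or Undo.]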
14
Recovery (1)
Look for the most recent checkpoint record.
For all transactions active at the checkpoint, we must:
– undo all those still active at failure
– redo all others
15
Recovery (2)
Have 2 lists: undo and redo.
Initially, undo contains all TIDs in the checkpoint record & redo is empty.
3 passes through log:
– Forwards from checkpoint to end:
• If we find ‘begin_transaction’, add to undo list.
• If we find ‘commit’, transfer from undo to redo list.
• If we find ‘abort’, remove from undo list.
– Backwards from end to checkpoint: undo.
– Forwards from checkpoint to end: redo.
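The first forward pass can be sketched as follows. The function name `build_lists`, the `(action, tid)` tuple format and the transaction names P, Q, R, S are hypothetical; only the three rules come from the slide.

```python
def build_lists(active_at_checkpoint, log_after_checkpoint):
    """Forward pass from the checkpoint to the end of the log."""
    undo = set(active_at_checkpoint)   # initially: all TIDs in the checkpoint record
    redo = set()
    for action, tid in log_after_checkpoint:
        if action == "begin_transaction":
            undo.add(tid)
        elif action == "commit":
            undo.discard(tid)          # transfer from undo to redo
            redo.add(tid)
        elif action == "abort":
            undo.discard(tid)
    return undo, redo


# P and Q were active at the checkpoint; R and S start afterwards.
log = [
    ("commit", "P"),
    ("begin_transaction", "R"),
    ("begin_transaction", "S"),
    ("commit", "R"),
]
undo, redo = build_lists({"P", "Q"}, log)
print(sorted(undo), sorted(redo))  # ['Q', 'S'] ['P', 'R']
```

The backward undo pass and the second forward redo pass would then apply the logged old and new values for the transactions in each list.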
16
Commit Protocols
Assume a set of cooperating managers which deal with parts of a transaction.
For atomicity we must ensure that:
– At each site, either all actions or none are performed.
– All sites take the same decision on whether to commit or abort.
17
Two Phase Commit (2PC) Protocol - 1
One node, the coordinator, has a special role; the others are participants.
The coordinator initiates the 2PC protocol.
If any participant cannot commit, then all sites must abort.
18
2PC – 2
Phase I:
– Reach a common decision on whether to abort or commit.
Phase II:
– Implement the decision at all sites.
19
2PC - 3
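2PC state diagrams (transition labels read: event or message received / message sent)
Coordinator:
• I -> U: -/PM
• U -> C: RM/CCM
• U -> A: AAM/ACM
• U -> A: tm/ACM
Participant:
• I -> R: PM/RM
• I -> A: PM/AAM
• I -> A: ua/-
• R -> C: CCM/-
• R -> A: ACM/-
States:
• I = Initial state
• U = Undecided
• R = Ready to Commit
• A = Abort
• C = Commit
Messages:
• PM = Prepare Message
• RM = Ready Message
• AAM = Abort Answer Message
• ACM = Abort Command Message
• CCM = Commit Command Message
Other Transitions:
• ua = Unilateral Abort
• tm = timeout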
20
2PC – Phase 1
Coordinator:
– Write prepare record to log
– Multicast prepare message and set timeout
Participant:
– Wait for prepare message
– If we are willing to commit then
• force log records to stable storage
• write ready record in log
• send ready message to coordinator
– else
• write ABORT in log
• send abort answer message to coordinator
21
2PC – Phase 2 (1)
Coordinator:
– Wait for reply messages (ready or abort) or timeout
– If timeout expires or any message is abort
• write global abort record in the log
• send abort command message to all participants
– else (all answers were ready)
• write global commit record to log
• send commit command message to all participants
22
2PC – Phase 2 (2)
Participants:
– Wait for command message (abort or commit)
– Write abort or commit in the log
– Send ack message to coordinator
– Execute command (may be null)
Coordinator:
– Wait for ack messages from all participants
– Write complete in the log
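The two phases can be put together in a small single-process simulation. This is only a sketch under simplifying assumptions: there is no real network or stable storage, logs are Python lists, and a vote of `None` stands for a reply that never arrives before the timeout. The function name `two_phase_commit` and the log strings are made up for the example.

```python
def two_phase_commit(votes):
    """Simulate one 2PC round.

    votes maps a participant name to True (willing to commit),
    False (abort answer) or None (no reply before the timeout).
    Returns (decision, coordinator_log, participant_logs).
    """
    # Phase 1: coordinator logs prepare, then multicasts PM.
    coordinator_log = ["prepare"]
    participant_logs = {name: [] for name in votes}
    replies = {}
    for name, vote in votes.items():
        if vote is True:
            participant_logs[name].append("ready")
            replies[name] = "RM"
        elif vote is False:
            participant_logs[name].append("abort")
            replies[name] = "AAM"
        # vote is None: nothing arrives and the coordinator's timeout expires

    # Phase 2: commit only if every participant answered ready.
    all_ready = all(replies.get(name) == "RM" for name in votes)
    decision = "commit" if all_ready else "abort"
    coordinator_log.append("global " + decision)

    acks = 0
    for name, reply in replies.items():
        if reply == "RM":
            # A ready participant logs the command it receives.
            participant_logs[name].append(decision)
        acks += 1
    if acks == len(votes):
        coordinator_log.append("complete")
    return decision, coordinator_log, participant_logs


decision, clog, plogs = two_phase_commit({"London": True, "Depok": True})
print(decision, clog)   # commit ['prepare', 'global commit', 'complete']

decision2, clog2, plogs2 = two_phase_commit({"London": True, "Depok": False})
print(decision2, plogs2)  # abort {'London': ['ready', 'abort'], 'Depok': ['abort']}
```

When a reply is missing (`None`), the sketch decides abort and does not write the complete record, since not every ack has been received.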
23
2PC – Site Failures
Resilient to all failures in which no log information is lost.
Site failures:
– A participant fails before having written ready to log:
• timeout expires ---> ABORT
– A participant fails after having written ready to log:
• Msg sent -- others take decision. This node gets outcome from the coordinator or other participants after restart
• Msg unsent -- timeout expires ---> ABORT
24
2PC – Coordinator Failures
Coordinator fails after writing prepare but before global commit/global abort (global X).
– All participants must wait for recovery of coordinator -> BLOCKING
– Recovery of coordinator involves restarting protocol from identities in prepare log record
– Participants must identify duplicate prepare messages
Coordinator fails after having written global X but before writing complete.
– On restart, coordinator must resend decision, to ensure blocked processes get it. Others must discard duplicate.
Coordinator fails after having written complete.
– No action needed
25
2PC – Lost Messages
A reply message (ready or abort) from a participant is lost.
– Timeout expires -- coordinator ABORTs
A prepare message is lost.
– Timeout expires -- coordinator ABORTs
A commit/abort command message is lost.
– Timeout in participant -- request repetition of command from the coordinator.
An ack message is lost.
– Timeout in coordinator -- coordinator resends command
26
2PC - Partitions
Everything aborts as coordinator can’t contact all participants. Those participants in partition without coordinator may remain blocked & the resources are still retained until the blocked participants are unblocked.
27
2PC - Comments
Blocking is a problem if the coordinator or network fails, which reduces availability -> use 3PC.
Unilateral abort.
– Any node can abort until it sends ready (site autonomy before the ready state).
Efficiency can be increased:
– Elimination of prepare messages. The participants that can commit will automatically send RM.
– Presumed commit/abort, if there’s no information found in the log. See [CER84] 13.5.1, 2, & 3.
28
Impossible Termination in 2PC
No operational participant has received the command. The operational participants are in the R state, but they haven’t received the ACM or CCM,
AND
At least one participant failed. Unfortunately the failed participant acted as the coordinator.
29
Impossible Termination in 2PC
The failed participant might have already performed an action that cannot be undone (commit or abort), i.e. it may be in the C or A state.
The operational participants can’t know what the failed participant had done, and can’t take an independent decision.
The problem is solved by the 3PC.
30
3PC (1)
[Figure: 3PC state diagrams. Coordinator states: I, U, BC, C, A. Participant states: I, R, PC, C, A. Transition labels shown in the figure: -/PM, RM/PCM, OK/CCM, AAM/ACM, tm/ACM (appears twice), PM/RM, PM/AAM, ua/-, PCM/OK, CCM/-, ACM/-. Two possible restart transitions are marked Restart 1 and Restart 2.]
New States:
• PC = Prepared to Commit
• BC = Before Commit
New Messages:
• PCM = Prepare to Commit
• OK = Entered PC state
31
3PC (2)
Case study:– See slide no 3.
– London: Coordinator & Participant1
– Depok: Participant2
32
3PC (3)
3PC avoids problems with 2PC:
1. If any operational participant has received an abort then all can abort. The failed participant will abort at restart if it hasn’t already. [As 2PC] E.g. Depok fails, London is operational and has received an ACM.
2. If any participant has received the PCM, then all can commit. The failed participant cannot have aborted unilaterally, because it had answered READY (RM). The failed participant will commit at restart (see “restart 1”). E.g. London fails, Depok is operational and has received the PCM.
33
3PC (4)
3. If none of the operational participants has received the PCM, i.e. all of the operational participants are in the R state, then 2PC would block. With 3PC we can abort safely since the failed participant cannot have committed. At most it has received the PCM -> it can abort at restart (see “restart 2”). E.g. London fails, Depok is operational and has NOT received the PCM (in the R state).
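The three cases can be summarized as a decision function over the states of the operational participants. This is a hedged sketch that covers only the three cases listed on these slides; the function name `terminate` and the use of state strings "A", "PC" and "R" are assumptions for the example.

```python
def terminate(operational_states):
    """Decide the outcome from the states of the operational participants."""
    states = set(operational_states)
    if "A" in states:
        return "abort"     # case 1: someone has received an abort
    if "PC" in states:
        return "commit"    # case 2: someone has received the PCM
    if states == {"R"}:
        return "abort"     # case 3: nobody has received the PCM
    raise ValueError("state combination not covered by this sketch")


print(terminate(["R", "A"]))    # abort
print(terminate(["R", "PC"]))   # commit
print(terminate(["R", "R"]))    # abort
```

Case 3 is the situation in which 2PC would block: with 3PC the failed participant cannot have committed, so aborting is safe.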
34
3PC (5)
3PC guarantees that no possible failure during the 2nd phase leads to a blocking condition.
Failures during the 3rd phase -> blocking???
– If coordinator fails in 3rd phase, then elect another and continue the commit process (since all must be in the PC state).
35
Consistency & Isolation
Consistency & isolation -> concurrency control.
The Lost Update Problem:
[Figure: timeline of two transactions. Transaction 1: Read X, Update X. Transaction 2: Read X, Update X. The interleaving results in a lost update.]
36
The Uncommitted Dependency (Temporary Update) Problem
[Figure: timeline of two transactions. Transaction 2: Update X, ABORT. Transaction 1: Read X, which returns a temporary incorrect value of X, because Transaction 2 is aborted.]
37
The Inconsistent Analysis Problem
[Figure: timeline of two transactions. Transaction 1: sum := 0; Read A; sum := sum + A; Read B; sum := sum + B. Transaction 2: Read A; Read B; Update A; Update B; COMMIT. Transaction 1 reads A before the update by Transaction 2 and reads B after the update by Transaction 2.]
38
Concurrent Transactions
If we have concurrent transactions, we must prevent interference.
c.f. lost update problem:
– Prevent T2’s read (because T1 has seen it and may update it) [Locking]
– Prevent T1’s update (because T2 has seen it) [Locking]
– Prevent T2’s update (because T1 has already updated it and so this is based on obsolete values) [Timestamping]
– Have them work independently and resolve difficulties on commit. [Optimistic concurrency control]
39
Serializability
What we need is some notion of correctness.
Serializability is the notion usually used with respect to transactions.
40
Serial Transactions
Two transactions execute serially if all operations of one precede all operations of the other. e.g:
S1: Ri(x) Wi(x) Ri(y) Rj(x) Wj(y) Rk(y) Wk(x), or
S1: TiTjTk, S2: TkTjTi, ………..
S1 = Schedule 1, S2 = Schedule 2
All serial schedules are correct, but restrictive of concurrency.
41
Transaction Conflict
Two operations are in conflict if:
– At least one is a write
– They both act on the same data
– They are issued by different transactions
Which of the following are in conflict?
Ri(x) Rj(x) Wi(y) Rk(y) Wj(x)
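The three conditions translate directly into a small check. The schedule notation (`Ri(x)` for a read of x by Ti) is the slide's own; the helper names `parse` and `conflicts` are made up for this sketch.

```python
from itertools import combinations


def parse(schedule):
    """Turn 'Ri(x) Wj(y)' into [(token, kind, txn, item), ...]."""
    ops = []
    for tok in schedule.split():
        open_paren = tok.index("(")
        ops.append((tok, tok[0], tok[1:open_paren], tok[open_paren + 1:-1]))
    return ops


def conflicts(schedule):
    """Return the conflicting pairs, in schedule order."""
    pairs = []
    for a, b in combinations(parse(schedule), 2):
        same_item = a[3] == b[3]
        different_txn = a[2] != b[2]
        one_is_write = "W" in (a[1], b[1])
        if same_item and different_txn and one_is_write:
            pairs.append((a[0], b[0]))
    return pairs


print(conflicts("Ri(x) Rj(x) Wi(y) Rk(y) Wj(x)"))
# [('Ri(x)', 'Wj(x)'), ('Wi(y)', 'Rk(y)')]
```

Note that Rj(x) and Wj(x) are not reported: they touch the same data and one is a write, but both belong to Tj.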
42
Computationally Equivalent
Two schedules (S1 & S2) are computationally equivalent if:
– The same operations are involved (possibly reordered)
– For every pair of operations in conflict (Oi & Oj), if Oi precedes Oj in S1, then Oi also precedes Oj in S2.
43
Serializable Schedule
A schedule is serializable if it is computationally equivalent to a serial schedule. e.g:
Ri(x) Rj(x) Wj(y) Wi(x) (which is not a serial schedule)
is computationally equivalent to:
Rj(x) Wj(y) Ri(x) Wi(x) (which is a serial schedule: TjTi)
The following is NOT a serial schedule. But is it serializable?
Ri(x) Rj(x) Wi(y) Rk(y) Wj(x)
The above schedule is computationally equivalent to serial schedules: TiTjTk, TiTkTj.
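One way to check such claims mechanically is to collect the order that each conflicting pair imposes on its two transactions, then list every serial order that respects all of them. This is a brute-force sketch meant for slide-sized examples; the function names are hypothetical and trying all permutations is an illustrative choice, not an efficient algorithm.

```python
from itertools import combinations, permutations


def parse(schedule):
    """Turn 'Ri(x) Wj(y)' into [(kind, txn, item), ...]."""
    ops = []
    for tok in schedule.split():
        open_paren = tok.index("(")
        ops.append((tok[0], tok[1:open_paren], tok[open_paren + 1:-1]))
    return ops


def precedence(schedule):
    """Edges (Ta, Tb): a conflicting operation of Ta precedes one of Tb."""
    edges = set()
    for a, b in combinations(parse(schedule), 2):
        if a[2] == b[2] and a[1] != b[1] and "W" in (a[0], b[0]):
            edges.add((a[1], b[1]))
    return edges


def equivalent_serial_orders(schedule):
    """All serial schedules that keep every conflicting pair in order."""
    txns = sorted({txn for _, txn, _ in parse(schedule)})
    edges = precedence(schedule)
    return [
        "".join("T" + t for t in order)
        for order in permutations(txns)
        if all(order.index(u) < order.index(v) for u, v in edges)
    ]


print(equivalent_serial_orders("Ri(x) Rj(x) Wj(y) Wi(x)"))        # ['TjTi']
print(equivalent_serial_orders("Ri(x) Rj(x) Wi(y) Rk(y) Wj(x)"))  # ['TiTjTk', 'TiTkTj']
print(equivalent_serial_orders("Ri(x) Wj(x) Wi(x)"))              # []
```

An empty result means the schedule is not serializable: the conflicts force Ti before Tj and Tj before Ti at the same time.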
44
Serializability in Distributed Systems (1)
A local concurrency control mechanism isn’t sufficient. e.g:
– Site 1: Ri(x) Wi(x) Rj(y) Wj(x) i.e: Ti < Tj
– Site 2: Rj(y) Wj(y) Ri(y) Wi(y) i.e: Tj < Ti
45
Serializability in Distributed Systems (2)
Let T1…Tn be a set of transactions and E be an execution of these modeled by schedules S1…Sm on machines 1…m.
Each local schedule (S1…Sm) is serializable.
Then E is serializable (in distributed systems) if, for all i and j, all conflicting operations from Ti and Tj in each of the schedules have the same order, i.e. there is a global total ordering for all sites.
46
Locking (1)
How to implement serializability? Use locking.
Shared/eXclusive (Read/Write) locks:
1. A transaction T must have SLockx or XLockx before any Read X.
2. A transaction T must have XLockx before any Write X.
3. A transaction T must issue unLockx after Read x or Write x is completed.
47
Locking (2)
4. A transaction T can upgrade the lock, i.e. issue an XLockx after having SLockx, as long as T is the only transaction holding SLockx. Otherwise T must wait.
5. A transaction T can downgrade the lock, i.e. issue an SLockx after having XLockx.
48
Locking (3)
E.g.
T1: X = X + Y
T2: Y = X + Y
If initially X=20, Y=30 then either:
– S1: T1 < T2: X=50, Y=80
– S2: T2 < T1: X=70, Y=50
Both are serial schedules, thus both are correct.
49
Locking (4)
However using Shared/eXclusive (Read/Write) locks does NOT guarantee serializability.
If any transaction releases a lock and then acquires another, it may produce incorrect results.
50
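Locking (5)
T1:
SLock y
temp1 = y (30)
unLock y (in italics)
XLock x
temp4 = x (20)
x = temp4 + temp1 (50)
unLock x
COMMIT
T2:
SLock x
temp2 = x (20)
unLock x (in italics)
XLock y
temp3 = y (30)
y = temp2 + temp3 (50)
unLock y
COMMIT
The two transactions are interleaved in time so that each reads its input before the other writes it. The result is X=50, Y=50, which matches neither serial schedule.
The schedule is NOT serializable!!! So it is NOT correct.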
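The effect of the interleaving can be reproduced with a few lines of Python. The step functions below are a hypothetical decomposition of T1 and T2 into a read step and a read-and-write step; locks are left out, since the point is only the order in which values are read and written.

```python
def run(steps, x=20, y=30):
    """Execute the steps in order on a fresh database and return it."""
    db = {"x": x, "y": y}
    temp = {}
    for step in steps:
        step(db, temp)
    return db


# T1: X = X + Y
def t1_read_y(db, t):
    t["temp1"] = db["y"]

def t1_write_x(db, t):
    t["temp4"] = db["x"]
    db["x"] = t["temp4"] + t["temp1"]

# T2: Y = X + Y
def t2_read_x(db, t):
    t["temp2"] = db["x"]

def t2_write_y(db, t):
    t["temp3"] = db["y"]
    db["y"] = t["temp2"] + t["temp3"]


serial_12 = run([t1_read_y, t1_write_x, t2_read_x, t2_write_y])
serial_21 = run([t2_read_x, t2_write_y, t1_read_y, t1_write_x])
interleaved = run([t1_read_y, t2_read_x, t2_write_y, t1_write_x])

print(serial_12)    # {'x': 50, 'y': 80}
print(serial_21)    # {'x': 70, 'y': 50}
print(interleaved)  # {'x': 50, 'y': 50}
```

The interleaved result equals neither serial outcome, which is why the schedule is not serializable.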
51
Locking (6)
What is the problem?
– Y was unlocked too early in T1, and X was unlocked too early in T2. See the unLock Y and unLock X in italics.
What is the solution?
– 2 Phase Locking (2PL).
52
2PL - 1
Two phase locking (2PL)
– Before operating on any object the transaction must obtain a lock for it.
– After releasing a lock the transaction never acquires more locks.
– 2 phases:
1. Expanding (growing) phase: acquiring new locks, but NEVER releasing any locks.
2. Shrinking phase: releasing existing locks, but NEVER acquiring new locks.
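Whether a single transaction's lock sequence obeys the two phases can be checked in a few lines. The action strings reuse the SLock/XLock/unLock names from the locking slides; the function `obeys_2pl` is an assumption made for this sketch and it looks only at the growing/shrinking rule, not at whether each operation is covered by a lock.

```python
def obeys_2pl(actions):
    """True if no lock is acquired after the first unlock."""
    shrinking = False
    for action in actions:
        kind = action.split()[0]
        if kind == "unLock":
            shrinking = True
        elif kind in ("SLock", "XLock") and shrinking:
            return False   # acquiring a lock in the shrinking phase
    return True


early_unlock = ["SLock y", "unLock y", "XLock x", "unLock x"]
two_phase = ["SLock y", "XLock x", "unLock y", "unLock x"]

print(obeys_2pl(early_unlock))  # False
print(obeys_2pl(two_phase))     # True
```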
53
2PL - 2
Exercise: modify the schedule on slide 50 by following 2PL.
2PL may cause deadlocks. See [ELM00].
If a schedule obeys 2PL it is serializable.
What about the converse? Do all serializable schedules follow 2PL?
54
2PL - 3
Account x at site1 & account y at site2.
Ti : Ri(x) Wi(x) Ri(y) Wi(y)
Tj : Rj(x) Wj(x) Rj(y) Wj(y)
[Figure: two schedules over time, each with a Site1 and a Site2 column. Left, “Serializable but not 2PL”: Site1 runs Ri(x) Wi(x) Rj(x) Wj(x) and Site2 runs Ri(y) Wi(y) Rj(y) Wj(y), with Wj(x) at Site1 shown alongside Ri(y) at Site2. Right, “Equivalent 2PL”: the same eight operations.]
New problem: 2PL may limit the amount of concurrency. See the schedule on the right.
55
Optimistic Concurrency Control
Locking is pessimistic. Assume instead that contention is rare.
– All updates made to a private copy
– On commit see if there are conflicts with other transactions started afterwards.
– If not, install changes atomically
– else ABORT
Deadlock free & maximum parallelism, but may get livelock.
– What is livelock?
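A minimal sketch of the idea, assuming one particular validation rule: at commit, a transaction aborts if any item it read has been changed by another committed transaction since it was read. Per-item version counters are used to detect this. The `Store` and `Txn` classes and the version scheme are assumptions for illustration; the slide does not prescribe a specific validation test.

```python
class Store:
    def __init__(self, data):
        self.data = dict(data)
        self.version = {key: 0 for key in data}   # bumped on every installed write


class Txn:
    def __init__(self, store):
        self.store = store
        self.read_versions = {}   # item -> version seen at first read
        self.writes = {}          # private copy of updates

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        self.read_versions.setdefault(key, self.store.version[key])
        return self.store.data[key]

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Validation: has anything we read changed in the meantime?
        for key, seen in self.read_versions.items():
            if self.store.version[key] != seen:
                return False      # ABORT, private copy is discarded
        # No conflict: install changes
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.version[key] += 1
        return True


store = Store({"X": 10})
t1, t2 = Txn(store), Txn(store)
t1.write("X", t1.read("X") + 1)
t2.write("X", t2.read("X") + 5)
print(t1.commit())       # True
print(t2.commit())       # False
print(store.data["X"])   # 11
```

The second transaction would simply be restarted; if it keeps losing the validation to others, it never finishes even though the system as a whole keeps making progress.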
56
Timestamping (1)
Again, no deadlock.
Rules:
– Each transaction receives a globally unique timestamp, TSi, when started.
– Updates are not physically installed until commit.
– Every object in the database carries the timestamp of the last transaction to read it (RTM(x)) and the last to write it (WTM(x)).
57
Timestamping (2)
– If a transaction, Ti, requests an operation that conflicts with a younger transaction Tj, then Ti is restarted with a new timestamp.
– An operation from Ti is in conflict with an operation from Tj if:
- It is a read and the object has already been updated by Tj; i.e. TSi < WTM(x). The read operation is rejected & Ti is restarted with a new timestamp. If the read is OK, set RTM(x) = max(TSi, RTM(x)).
- It is an update and the object has already been read or updated by Tj; i.e. TSi < RTM(x) or TSi < WTM(x). The update operation is rejected & Ti is restarted with a new timestamp. If the update is OK, set WTM(x) = TSi.
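The two rules can be written down almost literally. In this sketch the `Item` class, the `Restart` exception and the integer timestamps are assumptions; deferring the physical update until commit is left out so that only the RTM/WTM checks are shown.

```python
class Restart(Exception):
    """The operation is rejected; the transaction restarts with a new timestamp."""


class Item:
    def __init__(self):
        self.rtm = 0   # timestamp of the last transaction to read the item
        self.wtm = 0   # timestamp of the last transaction to write the item


def read(item, ts):
    if ts < item.wtm:
        raise Restart("already updated by a younger transaction")
    item.rtm = max(ts, item.rtm)


def update(item, ts):
    if ts < item.rtm or ts < item.wtm:
        raise Restart("already read or updated by a younger transaction")
    item.wtm = ts


x = Item()
read(x, 5)            # OK: RTM(x) becomes 5
try:
    update(x, 3)      # rejected: TS 3 < RTM(x) = 5
except Restart as exc:
    print("restart:", exc)
update(x, 7)          # OK: WTM(x) becomes 7
print(x.rtm, x.wtm)   # 5 7
```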
58
References
[CER84] Ceri, S., G. Pelagatti. Distributed Databases: Principles and Systems. New York: McGraw-Hill, 1984.
[ELM00] Elmasri, R., S.B. Navathe. Fundamentals of Database Systems, 3rd ed. Reading: Addison-Wesley, 2000.