Chapter 16: Concurrency Control

Topics: Lock-Based Protocols, Timestamp-Based Protocols, Validation-Based Protocols, Multiple Granularity, Multiversion Schemes, Insert and Delete Operations, Concurrency in Index Structures
Goal of Concurrency Control
Transactions should be executed so that it is as though they executed in some serial order.
This property is also called Isolation or Serializability.
Transactional Concurrency Control
There are several ways to ensure a serial-equivalent order on conflicts:
Option 1: execute transactions serially ("single-shot" transactions).
Option 2: pessimistic concurrency control: block T until transactions with conflicting operations are done. This is done using locks.
Option 3: optimistic concurrency control: proceed as if no conflicts will occur, and recover if constraints are violated. Repair the damage by rolling back (aborting) one of the conflicting transactions.
Option 4: hybrid timestamp ordering using versions.
Locking
Locking is the most frequently used technique for controlling concurrent execution of database transactions.
Operating systems provide a binary locking system that is too restrictive for database transactions; that is why a DBMS contains its own lock manager.
A lock_value(X) is a variable associated with (each) database data item X.
The lock_value(X) describes the status of the data item X by telling which operations can be applied to X.
Concurrency Control Using Locking
The locking technique operates by preventing a transaction from improperly accessing data that is being used by another transaction.
Before a transaction can perform a read or write operation, it must claim a read (shared) or write (exclusive) lock on the relevant data item.
Once a read lock has been granted on a particular data item, other transactions may read the data, but not update it.
A write lock prevents all access to the data by other transactions.
Kinds of Locks
Generally, the lock manager of a DBMS offers two kinds of locks: a shared (read) lock, and an exclusive (write) lock.
If a transaction T issues a read_lock(X) command, it will be added to the list of transactions that share a lock on item X, unless there is a transaction already holding a write lock on X.
If a transaction T issues a write_lock(X) command, it will be granted an exclusive lock on X, unless another transaction is already holding a lock on X.
Accordingly, lock_value(X) ∈ {read_lock, write_lock, unlocked}.
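These granting rules can be sketched in code. The following is an illustrative Python model (the class and method names are mine, not part of any DBMS API); it also lets a transaction upgrade its own read lock, anticipating the lock-conversion rules discussed later.

```python
# Illustrative sketch of lock_value(X) and the read_lock/write_lock rules.

class ItemLock:
    def __init__(self):
        self.value = "unlocked"   # lock_value(X)
        self.holders = set()      # transactions currently holding the lock

    def read_lock(self, txn):
        # Granted unless another transaction holds a write lock on X.
        if self.value == "write_lock" and self.holders != {txn}:
            return False          # txn must wait
        if self.value != "write_lock":
            self.value = "read_lock"
        self.holders.add(txn)
        return True

    def write_lock(self, txn):
        # Granted only if no other transaction holds any lock on X.
        if self.holders - {txn}:
            return False          # txn must wait
        self.value = "write_lock"
        self.holders = {txn}
        return True

    def unlock(self, txn):
        self.holders.discard(txn)
        if not self.holders:
            self.value = "unlocked"

x = ItemLock()
assert x.read_lock("T1") and x.read_lock("T2")   # shared locks coexist
assert not x.write_lock("T3")                    # exclusive request must wait
x.unlock("T1"); x.unlock("T2")
assert x.write_lock("T3")                        # granted once X is unlocked
```

The lock_value moves between the three states named above; a real lock manager would additionally queue the waiting transactions rather than just refusing the request.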
Lock Semantics
Since read operations cannot conflict, it is acceptable for more than one transaction to hold read locks simultaneously on the same item.
On the other hand, a write lock gives a transaction exclusive access to the data item.
Locks are used in the following way:
A transaction needing access to a data item must first lock the item, requesting a read lock for read-only access or a write lock for read-write access.
If the item is not already locked by another transaction, the lock request will be granted.
If the item is currently locked, the DBMS determines whether the request is compatible with the current lock. If a read lock is requested on an item that is already read locked, the request is granted; otherwise the transaction must wait until the existing write lock is released.
A transaction holds a lock until it explicitly releases it, commits, or aborts.
Basic Locking Rules
The basic locking rules are:
T must issue a read_lock(X) or write_lock(X) command before any read_item(X) operation.
T must issue a write_lock(X) command before any write_item(X) operation.
T must issue an unlock(X) command when all read_item(X) and write_item(X) operations are completed.
Some DBMS lock managers perform automatic locking by granting an appropriate lock on a database item to a transaction when it attempts to read or write the item.
So, an item lock request can be either explicit or implicit.
Locking Rules
Lock manager: the part of the DBMS that keeps track of the locks issued to transactions is called the lock manager. It maintains a lock table.
Data items can be locked in two modes:
1. Exclusive (X) mode: the data item can be both read and written. An X-lock is requested using the lock-X instruction.
2. Shared (S) mode: the data item can only be read. An S-lock is requested using the lock-S instruction.
Lock requests are made to the concurrency-control manager. A transaction can proceed only after its request is granted.
Lock-Based Protocols
Lock-compatibility matrix:

        S     X
S       yes   no
X       no    no

A transaction may be granted a lock on an item if the requested lock is compatible with the locks already held on the item by other transactions.
Any number of transactions can hold shared locks on an item, but if any transaction holds an exclusive lock on the item, no other transaction may hold any lock on the item.
If a lock cannot be granted, the requesting transaction has to wait until all incompatible locks held by other transactions have been released. The lock is then granted.
Lost Update Problem and Locking

time   T1                    T2
  1    read_lock(X)
  2    read_item(X)
  3    unlock(X)
  4                          write_lock(X)
  5                          read_item(X)
  6                          X = X + M
  7                          write_item(X)
  8                          unlock(X)
  9    write_lock(X)
 10    X = X - N
 11    write_item(X)
 12    unlock(X)

T2's update to X is lost because T1 wrote over X, and it happened despite the fact that both transactions issue lock and unlock commands. The problem is that T1 releases its lock on X too early, allowing T2 to start updating X. We need a protocol that will guarantee serializability.
A locking protocol is a set of rules followed by all transactions while requesting and releasing locks. Locking protocols restrict the set of possible schedules.
The Basic Two-Phase Locking Protocol
All lock operations must precede the first unlock operation. A transaction can then be viewed as having two phases:
Growing (or expanding) phase, when all locks are being acquired or upgraded, and
Shrinking phase, when locks are being downgraded and released, but none can be acquired or upgraded.
Theorem: if all transactions in a schedule obey the basic locking rules and the two-phase locking protocol, the schedule is conflict serializable.
Consequently, a schedule that obeys the two-phase locking protocol does not have to be tested for conflict serializability.
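The two-phase property itself is easy to check mechanically. A small illustrative sketch (the string encoding of operations is my own):

```python
def is_two_phase(ops):
    """Check that all lock acquisitions precede the first unlock.

    ops: a transaction's operations in order, as strings like
    "read_lock(X)", "write_lock(Y)", "unlock(X)".
    """
    shrinking = False
    for op in ops:
        if op.startswith("unlock"):
            shrinking = True          # the shrinking phase has begun
        elif op.startswith(("read_lock", "write_lock")):
            if shrinking:
                return False          # a lock acquired after a release: not 2PL
    return True

assert is_two_phase(["read_lock(X)", "write_lock(Y)", "unlock(X)", "unlock(Y)"])
assert not is_two_phase(["read_lock(X)", "unlock(X)", "write_lock(X)"])
```

The second example is exactly the pattern behind the lost-update schedule above: unlocking X and then re-locking it violates the two-phase rule.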
Lost Update and Two-Phase Locking

time   T1                              T2
  1    read_lock(X)
  2    read_item(X)
  3    X = X - N
  4                                    write_lock(X) // has to wait
  5    write_lock(X)  // upgrade
  6    write_item(X)
  7    unlock(X)
  8                                    write_lock(X)
  9                                    read_item(X)
 10                                    X = X + M
 11                                    write_item(X)
 12                                    unlock(X)

T2 cannot obtain a write lock on X since T1 holds a read lock on X, so it has to wait.
When T1 releases its lock on X, T2 acquires a lock on X and finishes successfully.
Two-phase locking provides a safe, conflict-serializable schedule.
The Two-Phase Locking Protocol
Many database systems employ a two-phase locking protocol to control the manner in which locks are acquired and released. This is a protocol which ensures conflict-serializable schedules.
Phase 1 (growing phase): the transaction may obtain locks but may not release locks. Once all locks have been acquired, the transaction is at its lock point.
Phase 2 (shrinking phase): the transaction may release locks but may not obtain locks.
The rules of the protocol are as follows: a transaction must acquire a lock on an item before operating on the item. The lock may be read or write, depending on the type of access needed. Once the transaction releases a lock, it can never acquire any new locks.
The protocol assures serializability. It can be proved that the transactions can be serialized in the order of their lock points (i.e., the point where a transaction acquires its final lock, at the end of the growing phase).
The Two-Phase Locking Protocol
Two-phase locking is governed by the following rules:
Two transactions cannot hold conflicting locks at the same time.
No unlock operation can precede a lock operation in the same transaction.
The point in the schedule where the final lock is obtained is called the lock point.
[Diagram: the number of locks held by a transaction grows during the growing phase up to the lock point, then falls during the shrinking phase.]
A Question for You

time   T1                    T2
  1    read_item(X)
  2    X = X - N
  3    write_item(X)
  4                          read_item(X)
  5                          X = X + M
  6                          write_item(X)
  7    read_item(Y)
  8    T1 fails

This slide describes the dirty read problem.
The question: does two-phase locking solve the dirty read problem?
Answers:
a) Yes
b) No, because the dirty read problem is not a consequence of conflicting operations alone. The strict two-phase locking protocol solves the dirty read problem.
Two-Phase Locking: Dirty Read

time   T1                         T2
  1    write_lock(X)
  2    read_item(X)
  3    X = X - N
  4    write_item(X)
  5    write_lock(Y)
  6    unlock(X)
  7                               write_lock(X)
  8                               read_item(X)
  9                               X = X + M
 10                               write_item(X)
 11                               unlock(X)
 12    read_item(Y)
 13    Y = Y + Q
 14    write_item(Y)
 15    unlock(Y)
 16    // T1 fails before it commits

If T1 gets the exclusive lock on X first, T2 has to wait until T1 unlocks X. Note that interleaving is still possible, only not within transactions that access the same data items.
Two-phase locking alone does not solve the dirty read problem, because T2 is allowed to read the uncommitted database item X.
Strict Two-Phase Locking
A variant of the two-phase locking protocol. The protocol: a transaction T does not release any of its exclusive locks until after it commits or aborts.
Hence, no other transaction can read or write an item X that is written by T unless T has committed.
The strict two-phase locking protocol is therefore safe against dirty reads.
Rigorous two-phase locking is even stricter: here all locks are held till commit/abort. In this protocol, transactions can be serialized in the order in which they commit.
Most DBMSs implement either strict or rigorous two-phase locking.
Schedule for Strict 2PL with Serial Execution

T1: X(A); R(A); W(A); X(B); R(B); W(B); Commit
T2: X(A); R(A); W(A); X(B); R(B); W(B); Commit
Disadvantages of Locking
Pessimistic concurrency control has a number of key disadvantages, particularly in distributed systems:
Overhead: locks cost, and you pay even if no conflict occurs. Even read-only actions must acquire locks. High overhead forces careful choices about lock granularity.
Low concurrency: if locks are too coarse, they reduce concurrency unnecessarily. The need for strict 2PL to avoid cascading aborts makes it even worse.
Low availability: a client cannot make progress if the server or lock holder is temporarily unreachable.
Deadlock.
Two-phase locking can introduce some undesirable effects. These are: waits, deadlocks, and starvation.
Problems with Locking
The use of locks and the 2PL protocol prevents many of the problems arising from concurrent access to the database. However, it does not solve all problems, and it can even introduce new ones.
Firstly, there is the issue of cascading rollbacks:
2PL allows locks to be released before the final commit or rollback of a transaction.
During this time, another transaction may acquire the locks released by the first transaction, and operate on the results of the first transaction.
If the first transaction subsequently aborts, the second transaction must abort, since it has used data now being rolled back by the first transaction.
This problem can be avoided by preventing the release of locks until the final commit or abort action.
Cascading rollback is thus possible under basic two-phase locking; a cascadeless schedule is not guaranteed.
Deadlock
Deadlock is also called deadly embrace. Deadlock occurs when two or more transactions reach an impasse because they are waiting to acquire locks held by each other. A typical sequence of operations is given in the following diagram.

time   T1                              T2
  1    write_lock(X)
  2                                    write_lock(Y)
  3    write_lock(Y) // has to wait
  4                                    write_lock(X) // has to wait

T1 acquired an exclusive lock on X.
T2 acquired an exclusive lock on Y.
Neither can finish, because both are in the waiting state.
To handle a deadlock, one of T1 or T2 must be rolled back and its locks released.
Deadlock (continued)
Deadlock examples:
a) T1 has locked X and waits to lock Y; T2 has locked Y and waits to lock Z; T3 has locked Z and waits to lock X.
b) Both T1 and T2 have acquired shared locks on X, and both wait to lock X exclusively.
Each of these results in a cycle in the wait-for graph; in case b):
T1 waits for T2, and T2 waits for T1.
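Deadlock detection amounts to finding a cycle in the wait-for graph. A minimal sketch (the dictionary encoding of the graph is my own):

```python
# Wait-for graph: each transaction maps to the set of transactions it waits for.
# A cycle in this graph means deadlock.

def has_deadlock(wait_for):
    visited, on_stack = set(), set()

    def dfs(t):
        visited.add(t)
        on_stack.add(t)
        for u in wait_for.get(t, ()):
            # Revisiting a node on the current DFS path closes a cycle.
            if u in on_stack or (u not in visited and dfs(u)):
                return True
        on_stack.discard(t)
        return False

    return any(t not in visited and dfs(t) for t in wait_for)

# The mutual wait above: T1 waits for T2 and T2 waits for T1.
assert has_deadlock({"T1": {"T2"}, "T2": {"T1"}})
assert not has_deadlock({"T1": {"T2"}, "T2": set()})
```

A DBMS that detects such a cycle would pick one transaction on it as the victim and roll it back.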
Starvation
Starvation is also possible if the concurrency-control manager is badly designed. For example, a transaction may be waiting for an X-lock on an item while a sequence of other transactions request and are granted S-locks on the same item.
If T2 has an S-lock on a data item and T1 requests an X-lock on the same data item, T1 has to wait for T2 to release the S-lock. Meanwhile, T3 requests an S-lock on the same data item; this request is compatible with the lock granted to T2, so T3 may be granted its S-lock. Now T2 releases its lock, but T1 is still not granted its lock until T3 finishes, and so on.
The concurrency-control manager can be designed to prevent starvation.

time   T1          T2          T3          T4          T5
  1    lock-S(A)
  2                lock-X(A)
  3                wait
  4                            lock-S(A)
  5                                        lock-S(A)
  6                                                    lock-S(A)
Starvation
Starvation is a problem that can appear when using locks or deadlock-detection protocols.
Starvation occurs when a transaction cannot make any progress for an indefinite period of time, while other transactions proceed.
Starvation can occur when:
The waiting protocol for locked items is unfair (e.g., a stack is used instead of a queue), or
The same transaction is selected as the "victim" repeatedly.
Lock Conversion
A transaction T that already holds a lock on item X can convert it to another mode.
The lock conversion rules are:
T can upgrade a read_lock(X) to a write_lock(X) if it is the only transaction holding a lock on item X (otherwise, T has to wait).
T can always downgrade a write_lock(X) to a read_lock(X).
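The conversion rules reduce to a one-line test each. An illustrative sketch (assuming the set of transactions currently holding a lock on X is known; the function names are mine):

```python
def can_upgrade(holders, txn):
    # read_lock(X) -> write_lock(X): allowed only if txn is the sole holder.
    return holders == {txn}

def can_downgrade():
    # write_lock(X) -> read_lock(X): always allowed, since the writer
    # already had exclusive access to X.
    return True

assert can_upgrade({"T1"}, "T1")            # T1 alone holds the lock: upgrade OK
assert not can_upgrade({"T1", "T2"}, "T1")  # shared with T2: T1 must wait
assert can_downgrade()
```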
Lock Conversions
Two-phase locking with lock conversions: to get more concurrency, we use 2PL with lock conversion.
First phase: can acquire a lock-S on an item; can acquire a lock-X on an item; can convert a lock-S to a lock-X (upgrade). Upgrading is possible only in the growing phase.
Second phase: can release a lock-S; can release a lock-X; can convert a lock-X to a lock-S (downgrade). Downgrading is possible only in the shrinking phase.
Lock Conversions

time   T1                    T2
  1    read_item(a1)
  2    read_item(a2)
  3    read_item(a3)
  4                          read_item(a1)
  5                          read_item(a2)
  6                          P = a1 + a2
  7    write_item(a1)

1. Normal 2PL would make T2 wait until write_item(a1) is executed.
2. Lock conversion allows higher concurrency: the shared lock on a1 can also be acquired by T2.
3. T1 can upgrade its shared lock on a1 to exclusive just before the write instruction.
Lock Conversions
Transactions attempting to upgrade may need to wait.
Lock conversions generate serializable schedules; transactions are serialized by their lock points.
If exclusive locks are held till the end of the transactions, then the schedules are cascadeless.
Automatic Acquisition of Locks
A transaction Ti issues the standard read/write instructions, without explicit locking calls.
The operation read(D) is processed as:

  if Ti has a lock on D
    then read(D)
  else begin
    if necessary, wait until no other transaction has a lock-X on D;
    grant Ti a lock-S on D;
    read(D)
  end

Automatic Acquisition of Locks (Cont.)
The operation write(D) is processed as:

  if Ti has a lock-X on D
    then write(D)
  else begin
    if necessary, wait until no other transaction has any lock on D;
    if Ti has a lock-S on D
      then upgrade the lock on D to lock-X
      else grant Ti a lock-X on D;
    write(D)
  end

All locks are released after commit or abort.
Implementation of Locking
A lock manager can be implemented as a separate process to which transactions send lock and unlock requests.
The lock manager replies to a lock request by sending a lock grant message (or a message asking the transaction to roll back, in case of a deadlock).
The requesting transaction waits until its request is answered.
The lock manager maintains a data-structure called a lock table to record granted locks and pending requests
The lock table is usually implemented as an in-memory hash table indexed on the name of the data item being locked
Lock Management
A lock table entry contains:
The number of transactions currently holding a lock
The type of lock held (shared or exclusive)
A pointer to the queue of lock requests
Locking and unlocking have to be atomic operations.
Lock upgrade: a transaction that holds a shared lock can request an upgrade to an exclusive lock.
Lock Table
In the figure, black rectangles indicate granted locks and white ones indicate waiting requests.
The lock table also records the type of lock granted or requested.
A new request is added to the end of the queue of requests for the data item, and granted if it is compatible with all earlier locks.
Unlock requests result in the request being deleted, and later requests are checked to see if they can now be granted.
If a transaction aborts, all waiting or granted requests of the transaction are deleted; the lock manager may keep a list of locks held by each transaction to implement this efficiently.
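The queue-per-item lock table described above can be sketched as follows. This is a simplified illustrative model with modes "S" and "X" only (names are mine); a real lock manager would also handle latching, deadlock detection, and per-transaction lock lists.

```python
from collections import defaultdict, deque

class LockTable:
    def __init__(self):
        # item -> deque of [txn, mode, granted?] entries, in arrival order
        self.table = defaultdict(deque)

    def request(self, txn, item, mode):
        queue = self.table[item]
        # Grant only if compatible with ALL earlier entries (granted or
        # waiting), so a late request cannot jump the queue.
        granted = all(mode == "S" and m == "S" for _, m, _ in queue)
        queue.append([txn, mode, granted])
        return granted

    def release(self, txn, item):
        # Delete this transaction's entry, then re-check the queue in FIFO
        # order and grant whatever has become compatible.
        self.table[item] = deque(e for e in self.table[item] if e[0] != txn)
        granted_modes = []
        for entry in self.table[item]:
            if not entry[2] and all(entry[1] == "S" and g == "S"
                                    for g in granted_modes):
                entry[2] = True
            if entry[2]:
                granted_modes.append(entry[1])

lt = LockTable()
assert lt.request("T1", "A", "S")
assert lt.request("T2", "A", "S")      # shared with T1
assert not lt.request("T3", "A", "X")  # must wait behind the readers
lt.release("T1", "A"); lt.release("T2", "A")
assert lt.table["A"][0][2]             # T3's request is now granted
```

Note how an X request over an empty prefix is granted (the `all` over an empty sequence is true), while any earlier entry, granted or waiting, blocks it, which is exactly the FIFO fairness rule stated above.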
Other Protocols than 2PL: Graph-Based
Graph-based protocols are an alternative to two-phase locking.
Assumption: we have prior knowledge about the order in which data items will be accessed.
There is a (hierarchical) ordering on the data items, like, e.g., the pages of a B-tree.
[Diagram: a tree with root A and children B and C.]
Graph-Based Protocols
Impose a partial ordering → on the set D = {d1, d2, ..., dh} of all data items. If di → dj, then any transaction accessing both di and dj must access di before accessing dj.
This implies that the set D may now be viewed as a directed acyclic graph, called a database graph.
The tree protocol is a simple kind of graph protocol. Transactions access items starting from the root of this partial order.
Tree Protocol
1. Only exclusive locks are allowed.
2. The first lock by Ti may be on any data item. Subsequently, a data item Q can be locked by Ti only if the parent of Q is currently locked by Ti.
3. Data items may be unlocked at any time.
4. A data item that has been locked and unlocked by Ti cannot subsequently be relocked by Ti.
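Rules 2 and 4 can be checked mechanically for a single transaction. An illustrative sketch (the encoding of operations and of the tree is my own):

```python
# ops: a transaction's operations as ("lock"|"unlock", item) pairs.
# parent: maps each non-root node to its parent in the database tree.

def follows_tree_protocol(ops, parent):
    locked, ever_locked = set(), set()
    first = True
    for action, item in ops:
        if action == "lock":
            if item in ever_locked:
                return False                      # rule 4: no relocking
            if not first and parent.get(item) not in locked:
                return False                      # rule 2: parent must be held
            locked.add(item)
            ever_locked.add(item)
            first = False
        else:
            locked.discard(item)                  # rule 3: unlock any time
    return True

# Tree: A -> B, B -> D
parent = {"B": "A", "D": "B"}
assert follows_tree_protocol(
    [("lock", "B"), ("lock", "D"), ("unlock", "B"), ("unlock", "D")], parent)
assert not follows_tree_protocol(
    [("lock", "B"), ("unlock", "B"), ("lock", "B")], parent)   # relocks B
```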
Tree Protocol: Example

[Diagram: a database tree with root A; children B and C; nodes D, E, F below B; and leaves G, H, I. An interleaved schedule of T1 and T2 issues lock (L) and unlock (U) operations on these nodes: L(B); L(D); L(H); U(D); L(E); U(E); L(D); U(B); U(H); L(G); U(D); U(G).]

Questions: Is this schedule 2PL? Does it follow the tree protocol? Is it "correct" (serializable)?
Graph-Based Protocols (Cont.)
The tree protocol ensures conflict serializability as well as freedom from deadlock.
Advantages of the tree protocol over the two-phase locking protocol:
Shorter waiting times: unlocking may occur earlier, which increases concurrency.
The protocol is deadlock-free, so no rollbacks are required.
Drawbacks:
The protocol does not guarantee recoverability or cascade freedom. Commit dependencies must be introduced to ensure recoverability: if Ti performs a read of an uncommitted data item, we record a commit dependency of Ti on the transaction that performed the last write on that data item.
Transactions may have to lock data items that they do not access, which means increased locking overhead and additional waiting time. Without prior knowledge of which data items will be locked, transactions may lock the root of the tree, reducing concurrency.
Schedules not possible under two-phase locking are possible under the tree protocol, and vice versa.
The main application is locking in B+-trees, to allow high-concurrency update access; otherwise, a lock on the root page is a bottleneck.
Timestamp-Based Protocols
Basic timestamp ordering: a timestamp is a unique identifier created by the DBMS to identify a transaction. Timestamp values are assigned in the order in which transactions are submitted to the system.
Each transaction is issued a timestamp when it enters the system. If an old transaction Ti has timestamp TS(Ti), a new transaction Tj is assigned timestamp TS(Tj) such that TS(Ti) < TS(Tj).
The timestamp could use the value of the system clock or a logical counter.
The protocol manages concurrent execution such that the timestamps determine the serializability order.
In order to assure such behavior, the protocol maintains two timestamp values for each data item Q:
W-timestamp(Q) is the largest timestamp of any transaction that executed write(Q) successfully.
R-timestamp(Q) is the largest timestamp of any transaction that executed read(Q) successfully.
Timestamp-Ordering Protocol
The timestamp-ordering protocol ensures that any conflicting read and write operations are executed in timestamp order.
Suppose a transaction Ti issues a read(Q):
1. If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten. Hence, the read operation is rejected, and Ti is rolled back.
2. If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is set to max(R-timestamp(Q), TS(Ti)).
Timestamp-Ordering Protocol (Cont.)
Suppose that transaction Ti issues write(Q):
1. If TS(Ti) < R-timestamp(Q), then abort and roll back Ti and reject the operation. This must be done because some younger transaction with timestamp greater than TS(Ti), i.e., after Ti in the timestamp ordering, has already read the value of item Q before Ti had a chance to write Q, which would violate the timestamp ordering.
2. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, this write operation is rejected, and Ti is rolled back.
3. Otherwise (TS(Ti) ≥ R-timestamp(Q) and TS(Ti) ≥ W-timestamp(Q)), the write operation is executed, and W-timestamp(Q) is set to TS(Ti).
Example of Timestamp-Ordering Protocol

time   T14                   T15
  1    read(B)
  2                          read(B)
  3                          B := B - 50
  4                          write(B)
  5    read(A)
  6                          read(A)
  7    display(A + B)
  8                          A := A + 50
  9                          write(A)
 10                          display(A + B)
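The read and write tests of the basic timestamp-ordering protocol can be sketched as follows (an illustrative model of my own; timestamps are plain integers and a False return stands for "reject the operation and roll the transaction back"):

```python
class TSOItem:
    """A data item Q under basic timestamp ordering."""
    def __init__(self):
        self.r_ts = 0   # R-timestamp(Q): largest TS that read Q
        self.w_ts = 0   # W-timestamp(Q): largest TS that wrote Q

    def read(self, ts):
        if ts < self.w_ts:
            return False              # Q already overwritten by a younger txn
        self.r_ts = max(self.r_ts, ts)
        return True

    def write(self, ts):
        if ts < self.r_ts or ts < self.w_ts:
            return False              # a younger transaction got there first
        self.w_ts = ts
        return True

q = TSOItem()
assert q.read(2)            # R-timestamp(Q) becomes 2
assert not q.write(1)       # TS=1 < R-timestamp(Q)=2: rejected, roll back
assert q.write(3)           # W-timestamp(Q) becomes 3
assert not q.read(2)        # TS=2 < W-timestamp(Q)=3: rejected
```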
Timestamp-Ordering Protocol (Cont.)
The timestamp-ordering protocol ensures conflict serializability as well as freedom from deadlock. However, there is a possibility of starvation if a sequence of conflicting transactions causes repeated restarts.
The protocol can also generate schedules that are not recoverable. Consider the two transactions T1 and T2 shown below, with TS(T1) < TS(T2):

time   T1                    T2
  1    Write(p)
  2                          Read(p)
  3                          Read(q)
  4    Write(q)   // rejected: TS(T1) < R-timestamp(q)

The write(q) of T1 fails, and this rolls back T1 and, in effect, T2 as well, since T2 is dependent on T1 (it read the value of p written by T1).
Recoverability and cascadelessness can be ensured by performing all writes at the end of the transaction; recoverability alone can be ensured by tracking uncommitted dependencies.
Strict Timestamp Ordering
A variation of basic TO called strict TO ensures that schedules are both strict (recoverable) and (conflict) serializable. If a transaction T issues read_item(X) or write_item(X) such that TS(T) > write_TS(X), its read or write is delayed until the transaction T′ with TS(T′) = write_TS(X) has committed or aborted.
Thomas' Write Rule
A modified version of the timestamp-ordering protocol in which obsolete or outdated write operations may be ignored under certain circumstances.
When Ti attempts to write data item Q: if TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete or outdated value of Q. Rather than rolling back Ti as the timestamp-ordering protocol would have done, this write operation can simply be ignored.
Otherwise this protocol is the same as the timestamp-ordering protocol.
Thomas' write rule allows greater potential concurrency: it allows some view-serializable schedules that are not conflict-serializable.
time   T1                                   T2
  1    R(A)
  2                                         W(A)
  3                                         Commit
  4    W(A)  // ignored under Thomas' write rule
  5    Commit

This is a view-serializable schedule, equivalent to the serial schedule <T1, T2>.
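The only change Thomas' write rule makes to basic timestamp ordering is in the write test. An illustrative sketch (the dictionary encoding and return values are my own; "ignored" means the write is skipped, not rejected):

```python
def thomas_write(item, ts):
    """Apply Thomas' write rule to an item {"r_ts": ..., "w_ts": ...}."""
    if ts < item["r_ts"]:
        return "rollback"        # a younger transaction already read the item
    if ts < item["w_ts"]:
        return "ignored"         # obsolete write: skip it instead of aborting
    item["w_ts"] = ts
    return "written"

q = {"r_ts": 0, "w_ts": 5}
assert thomas_write(q, 3) == "ignored"   # basic TO would have rolled back here
assert thomas_write(q, 7) == "written"
```

In the schedule above, T1's late W(A) falls into the "ignored" branch, which is why the schedule can complete without any rollback.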
Validation-Based Protocol (Optimistic Method for Concurrency Control)
Optimistic concurrency control techniques are also known as validation or certification methods.
These techniques are called optimistic because they assume that conflicts of database operations are rare and hence that there is no need to do checking during transaction execution.
Checking represents overhead during transaction execution, with the effect of slowing down the transaction, so checking is done only before commit.
Updates in a transaction are not applied directly to the database items until the transaction reaches its end; they are carried out in a temporary database file.
Execution of a transaction is done in three phases. The phases of concurrently executing transactions can be interleaved, but each transaction must go through the three phases in order:
Read and execution phase: transaction Ti reads the values of committed items from the database. Updates are applied only to temporary local copies of the data items (a temporary update file).
Validation phase (certification phase): transaction Ti performs a "validation test" to determine whether its local copies can be written without violating serializability. If the validation test is positive, the transaction moves to the write phase; otherwise it is discarded.
Write phase: if Ti is validated successfully, the updates are applied to the database; otherwise, Ti is rolled back and then restarted. This phase is performed only for read-write transactions, not for read-only transactions.
Validation-Based Protocol (Optimistic Method for Concurrency Control)
In optimistic concurrency control, each transaction T is given three timestamps:
Start(T): when the transaction starts
Validation(T): when the transaction enters the validation phase
Finish(T): when the transaction finishes
Goal: to ensure that the transactions follow a serial schedule based on Validation(T).
Optimistic Concurrency Control
Given two transactions T1 and T2 with Validation(T1) < Validation(T2):
Case 1: Finish(T1) < Start(T2).
[Timeline: T1's read, validation, and write phases all complete before Start(T2); T2's phases follow entirely after Finish(T1).]
Here there is no problem of serializability.
Optimistic Concurrency Control
Case 2: Finish(T1) < Validation(T2).
[Timeline: T1's write phase overlaps T2's read phase, but T1 finishes before T2 enters validation.]
If T2 does not read anything that T1 writes, then there is no problem; otherwise there is a potential conflict.
Optimistic Concurrency Control
Case 3: Validation(T2) < Finish(T1).
[Timeline: T1's write phase overlaps both T2's read and write phases.]
If T2 does not read or write anything that T1 writes, then there is no problem; otherwise there is a potential conflict.
Optimistic Concurrency Control
For any transaction T, check every transaction T′ such that Validation(T′) < Validation(T):
1. If Finish(T′) > Start(T), then if T reads any element that T′ writes, abort T.
2. If Finish(T′) > Validation(T), then if T reads or writes any element that T′ writes, abort T.
3. Otherwise, commit T.
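The three-case validation test above can be sketched as follows. This is an illustrative model of my own: each transaction is a dictionary with the three timestamps plus its read and write sets, and `earlier` holds the transactions with a smaller validation timestamp.

```python
def validate(T, earlier):
    """Return True if T passes validation against every earlier T'."""
    for Tp in earlier:
        if Tp["finish"] <= T["start"]:
            continue                              # case 1: no overlap at all
        if Tp["finish"] <= T["validation"]:
            # case 2: T' wrote while T was reading.
            if T["reads"] & Tp["writes"]:
                return False
        elif (T["reads"] | T["writes"]) & Tp["writes"]:
            return False                          # case 3: phases fully overlap
    return True

T1 = {"start": 0, "validation": 5, "finish": 6, "reads": {"A"}, "writes": {"A"}}
T2 = {"start": 4, "validation": 7, "finish": 9, "reads": {"B"}, "writes": {"B"}}
assert validate(T2, [T1])          # disjoint items: T2 validates
T3 = {"start": 4, "validation": 7, "finish": 9, "reads": {"A"}, "writes": set()}
assert not validate(T3, [T1])      # T3 read A, which T1 wrote while T3 ran
```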
Schedule Produced by Validation
Example of a schedule produced using validation:

time   T14                   T15
  1    read(B)
  2                          read(B)
  3                          B := B - 50
  4                          read(A)
  5                          A := A + 50
  6    read(A)
  7    (validate)
  8    display(A + B)
  9                          (validate)
 10                          write(B)
 11                          write(A)
Optimistic Concurrency Control
Advantages:
No blocking, and no overhead during execution (there is overhead only for validation).
It is very efficient when conflicts are rare; occasional conflicts result in transaction rollback.
Rollback involves only the local copy of the data; the database is not involved, so there is no cascading rollback.
Disadvantages:
Potential starvation of long transactions.
A large number of aborts if the rate of conflicts is high.
Applications of Optimistic Methods
Suitable for environments where there are few conflicts and no long transactions.
Acceptable for mostly-read or query-oriented database systems that require very few update transactions.
Granularity of Items
Until now, we used the term "data item" without specifying its exact meaning.
In the context of concurrency control, a data item can be: a field of a database record, a database record, a disk block, a whole file, or a whole database.
Fine granularity refers to small item sizes; coarse granularity refers to larger item sizes.
The coarser the data item granularity, the lower the degree of concurrency.
Granularity of Items (continued)
Several trade-offs must be considered when choosing the size of the data item.
The finer the data granularity, the higher the locking overhead of the DBMS lock manager (due to many locks and unlocks).
If we lock large objects (e.g., relations): few locks are needed, but we get low concurrency.
If we lock small objects (e.g., tuples, fields): more locks are needed (higher overhead), but we get more concurrency.
The best item size depends on the type of transaction:
If a transaction accesses a small number of records, then data item = record.
If a transaction accesses a large number of records in the same file, then data item = file.
Some DBMSs automatically change the granularity level with regard to the number of records a transaction is accessing (attempting to lock).
Multiple Granularity
A granule is a unit of data individually controlled by the concurrency-control system. Granularity is the lockable unit in a lock-based concurrency control scheme.
Example: if Ti needs to access the entire database and a locking protocol is used, then Ti must lock each data item in the database. Doing this individually is time-consuming; it would be better if Ti could issue a single lock request to lock the entire database. Conversely, if Ti needs to access only a few data items, it should not be required to lock the entire database, but only those data items.
Hence, a mechanism is required that allows multiple levels of granularity.
Allow data items to be of various sizes and define a hierarchy of data granularities, where the smaller granularities are nested within larger ones.
This can be represented graphically as a tree (but don't confuse it with the tree-locking protocol).
Example of Granularity Hierarchy
If transaction Ti gets an explicit lock on file Fc in exclusive mode, then it has an implicit lock in exclusive mode on all records belonging to that file; it does not need to lock the individual records of Fc explicitly.
Database level: the entire database is locked.
Table level: an entire table is locked.
Page level: an entire disk block is locked. A page has a fixed size, e.g., 4K, 8K, or 16K, and contains several rows of one or more tables. This level is the most suitable for a multi-user DBMS.
Row level: a lock exists for each row in each table of the database. It improves the availability of data, but its management requires a high overhead cost.
Attribute level: allows concurrent transactions to access the same row but different attributes. This is the most flexible level for multi-user data access.
If a transaction wants to access the entire database, it has to lock the entire database, i.e., the root of the tree. The question is: how does the system determine whether the root node can be locked? One solution is to search the entire tree, which is time-consuming and defeats the purpose of the multiple granularity locking scheme.
Multiple Granularity Locking (MGL) Protocol
To make multiple-granularity locking practical, additional locks called intention locks are needed.
Three types of intention locks:
Intention Shared (IS): indicates that shared locks will be requested on a descendant node.
Intention Exclusive (IX): indicates that exclusive locks will be requested on a descendant node.
Shared Intention Exclusive (SIX): indicates that the current node is locked in shared mode, but exclusive locks will be requested on descendant nodes.
Before locking an item, a transaction must set intention locks on all its ancestors.
Locking is done top-down; unlocking is done bottom-up.
MGL Compatibility Matrix (rows: mode held; columns: mode requested; the matrix is symmetric)

        IS    IX    S     SIX   X
IS      yes   yes   yes   yes   no
IX      yes   yes   no    no    no
S       yes   no    yes   no    no
SIX     yes   no    no    no    no
X       no    no    no    no    no
Multiple Granularity Locking (MGL) Protocol
The lock compatibility matrix must be adhered to.
The root of the tree must be locked first, and may be locked in any mode.
A node N can be locked by a transaction T in S or IS mode only if the parent of N is already locked by T in either IS or IX mode.
A node N can be locked by T in X, IX, or SIX mode only if the parent of N is already locked by T in either IX or SIX mode.
T can lock a node only if it has not yet unlocked any node (this enforces 2PL).
T can unlock a node N only if none of the children of N are currently locked by T.
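The parent-lock rules (S/IS need IS or IX on the parent; X/IX/SIX need IX or SIX) reduce to a table lookup. An illustrative sketch (names and structure are mine; the compatibility check against other transactions' locks is omitted here):

```python
# For each requested mode, the parent modes that permit it under MGL.
PARENT_OK = {
    "IS":  {"IS", "IX"},
    "S":   {"IS", "IX"},
    "IX":  {"IX", "SIX"},
    "X":   {"IX", "SIX"},
    "SIX": {"IX", "SIX"},
}

def may_lock(node, mode, parent_of, held):
    """held: dict node -> mode already held by this transaction."""
    parent = parent_of.get(node)
    if parent is None:
        return True                       # the root may be locked in any mode
    return held.get(parent) in PARENT_OK[mode]

# Hierarchy: database DB -> area A1 -> file Fa -> record ra2.
parent_of = {"A1": "DB", "Fa": "A1", "ra2": "Fa"}
held = {"DB": "IS", "A1": "IS", "Fa": "IS"}
assert may_lock("ra2", "S", parent_of, held)     # read of record ra2 is fine
assert not may_lock("ra2", "X", parent_of, held) # X needs IX/SIX on the parent
```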
Examples
T1 reads record ra2 in file Fa: it needs to lock the database, area A1, and Fa in IS mode, and to lock ra2 in S mode.
T2 modifies record ra9 in file Fa: it needs to lock the database, area A1, and file Fa in IX mode, and to lock ra9 in X mode.
T3 reads all records in file Fa: it needs to lock the database and area A1 in IS mode, and to lock Fa in S mode.
T4 reads the entire database: it needs to lock the database in S mode.
T1, T3, and T4 can access the database concurrently. T1 and T2 can run concurrently. T2 cannot run concurrently with T3 or T4.
Multiversion Schemes
Multiversion schemes keep old versions of each data item to increase concurrency:
Multiversion timestamp ordering
Multiversion two-phase locking
Each successful write results in the creation of a new version of the data item written.
Timestamps are used to label versions. When a read(Q) operation is issued, an appropriate version of Q is selected based on the timestamp of the transaction, and the value of the selected version is returned.
Reads never have to wait, since an appropriate version is returned immediately.
Multiversion Timestamp Ordering
Each data item Q has a sequence of versions <Q1, Q2, ..., Qm>. Each version Qk contains three data fields:
Content -- the value of version Qk
W-timestamp(Qk) -- timestamp of the transaction that created (wrote) version Qk
R-timestamp(Qk) -- largest timestamp of any transaction that successfully read version Qk
When a transaction Ti creates a new version Qk of Q, Qk's W-timestamp and R-timestamp are initialized to TS(Ti).
The R-timestamp of Qk is updated whenever a transaction Tj reads Qk and TS(Tj) > R-timestamp(Qk).
Multiversion Timestamp Ordering (Cont.)
Suppose that transaction Ti issues a read(Q) or write(Q) operation. Let Qk denote the version of Q whose write timestamp is the largest write timestamp less than or equal to TS(Ti).
1. If transaction Ti issues a read(Q), then the value returned is the content of version Qk.
2. If transaction Ti issues a write(Q):
   1. If TS(Ti) < R-timestamp(Qk), then transaction Ti is rolled back.
   2. If TS(Ti) = W-timestamp(Qk), the contents of Qk are overwritten.
   3. Otherwise, a new version of Q is created.
Observe that:
Reads always succeed.
A transaction is rejected (rolled back) if it is too late in doing a write; conflicts are resolved through rollbacks.
Every read also involves a write (the R-timestamp must be updated).
The protocol guarantees serializability, but does not ensure recoverability or cascadelessness.
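The read and write rules above can be sketched as follows. This is a minimal single-item illustration, assuming versions are stored as mutable (content, W-timestamp, R-timestamp) triples; class and exception names are invented for the sketch:

```python
class RollbackError(Exception):
    """Raised when the MVTO rules force the writer to roll back."""

class MVItem:
    def __init__(self, initial, ts0=0):
        # Each version is a mutable [content, w_ts, r_ts] triple.
        self.versions = [[initial, ts0, ts0]]

    def _select(self, ts):
        # Qk: the version with the largest write timestamp <= TS(Ti).
        return max((v for v in self.versions if v[1] <= ts),
                   key=lambda v: v[1])

    def read(self, ts):
        v = self._select(ts)
        v[2] = max(v[2], ts)    # reads also update the R-timestamp
        return v[0]             # reads never wait and never abort

    def write(self, ts, value):
        v = self._select(ts)
        if ts < v[2]:           # a later transaction already read Qk
            raise RollbackError("write arrived too late")
        if ts == v[1]:          # same timestamp: overwrite Qk in place
            v[0] = value
        else:                   # otherwise create a new version
            self.versions.append([value, ts, ts])
```

Note how a write at timestamp 7 creates a new version that readers at timestamps 6 and below never see, while a write at timestamp 5 is rolled back if some transaction with timestamp 6 already read the older version.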
Multiversion Two-Phase Locking
There are three locking modes for an item: read, write and certify.
In the standard locking scheme, once a transaction obtains a write lock on an item, no other transaction can access that item.
The idea behind multiversion 2PL is to allow another transaction T' to read an item X while a single transaction T holds a write lock on X.
This is accomplished by keeping two versions of each item X: one version must always have been written by some committed transaction; a second version X' is created when a transaction T acquires a write lock on the item. Other transactions can continue to read the committed version of X while T holds the write lock.
When T is ready to commit, it must obtain certify locks on all items on which it currently holds write locks.
A certify lock is not compatible with read locks, so T may have to wait until the read locks on those items are released by the reading transactions.
Once the certify locks are acquired, the committed version X of each data item is set to the value of version X', version X' is discarded, and the certify locks are released.
This ensures schedules are recoverable and cascadeless.
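The read/write/certify interaction described above can be summarized as a small compatibility table. A sketch under the stated semantics (readers coexist with one writer; certify conflicts with both reads and writes); names are illustrative:

```python
# True means the requested lock can be granted while `held` is held.
MV2PL_COMPAT = {
    ("read",    "read"):    True,
    ("read",    "write"):   True,   # the writer works on the new version X'
    ("read",    "certify"): False,  # commit must wait for readers of X
    ("write",   "read"):    True,   # readers see the committed version X
    ("write",   "write"):   False,  # only one uncommitted version X' at a time
    ("write",   "certify"): False,
    ("certify", "read"):    False,
    ("certify", "write"):   False,
    ("certify", "certify"): False,
}

def mv2pl_compatible(held, requested):
    return MV2PL_COMPAT[(held, requested)]
```

The key design point is the `("read", "write")` and `("write", "read")` entries: they are what lets reads proceed during an uncommitted write, with the conflict deferred to certify time.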
Deadlock Handling
Consider the following two transactions:
T1: write(X)        T2: write(Y)
    write(Y)            write(X)
Schedule with deadlock:

T1                          T2
lock-X on X
write(X)
                            lock-X on Y
                            write(Y)
                            wait for lock-X on X
wait for lock-X on Y
Deadlock Handling
Deadlock prevention protocols ensure that the system will never enter a deadlock state. Some prevention strategies:
Require that each transaction locks all its data items before it begins execution (pre-declaration). If any of the items cannot be obtained, none of the items are locked; in other words, a transaction requesting a new lock is aborted if a deadlock could occur.
Use preemption and transaction rollback: when T2 requests a lock that T1 holds, the lock granted to T1 may be preempted by rolling T1 back and granting the lock to T2.
Alternatively, allow the system to enter a deadlock state and then apply deadlock detection and deadlock recovery schemes.
Both deadlock prevention and recovery involve rollbacks. Prevention protocols may be used when the probability of the system entering a deadlock state is high; otherwise a detection and recovery scheme is more efficient.
Deadlock Prevention Techniques
There are a number of deadlock prevention techniques. These are:
Conservative two-phase locking protocol
Timestamp techniques:
  Wait-Die protocol
  Wound-Wait protocol
No-Wait (timeout) protocol
Conservative Two-Phase Locking Protocol
Conservative two-phase locking:
A transaction has to lock all items it will access before it begins to execute (in addition to following ordinary two-phase locking).
If it cannot acquire any one of its locks, it releases all items, aborts, and tries again.
Deadlock cannot occur because there is no hold-and-wait: once it starts, a transaction is always in its shrinking phase.
Conservative Two-Phase Locking Protocol
Problems:
What if a transaction cannot predetermine all the items it is going to use? (e.g. a sequence of interactive SQL statements comprising one database transaction)
What if a database item that is already locked by another transaction will be released very soon? (i.e. the transaction is aborted in vain)
Low data-item utilization: a data item may be locked and unused for a long duration.
Another variant of this approach uses a total order over the data items: once a particular data item is locked, the transaction cannot request locks on items that precede it in the order. This scheme is easy to implement as long as the set of data items accessed is known.
Using Timestamps (Wait-Die)
Timestamps are used together with two-phase locking. The DBMS assigns a timestamp TS to each transaction T entering the system; if Ti starts before Tj, then TS(Ti) < TS(Tj) (Ti is older than Tj).
Wait-die scheme -- non-preemptive:
An older transaction may wait for a younger one to release a data item. Younger transactions never wait for older ones; they are rolled back instead.
A transaction may die several times before acquiring a needed data item.
If Ti has higher priority (is older), it is allowed to wait; otherwise it is aborted.
E.g. T22, T23 and T24 have timestamps 5, 10 and 15. If T22 requests a data item held by T23, then T22 will wait. If T24 requests the data item, then T24 will be rolled back.
Using Timestamps (Wound-Wait)
Wound-wait scheme -- preemptive:
If an older transaction tries to lock an item held by a younger one, the older transaction wounds the younger one (causes it to abort and restart with the same timestamp); but a younger transaction is allowed to wait for an older one.
So the oldest transaction will always be allowed to finish; there may be fewer rollbacks than in the wait-die scheme.
If Ti has higher priority (is older), abort Tj; otherwise Ti waits.
E.g. T22, T23 and T24 have timestamps 5, 10 and 15. If T22 requests a data item held by T23, then the data item will be preempted from T23 and T23 will be rolled back. If T24 requests a data item that T23 is holding, then T24 will wait.
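The two decision rules above fit in a few lines. A minimal sketch, assuming smaller timestamp = older = higher priority; each function reports what happens to the requesting transaction, and the names are invented:

```python
def wait_die(ts_requester, ts_holder):
    """Non-preemptive: older requesters wait, younger ones die."""
    return "wait" if ts_requester < ts_holder else "die"

def wound_wait(ts_requester, ts_holder):
    """Preemptive: older requesters wound (abort) the holder,
    younger requesters wait for the older holder."""
    return "wound holder" if ts_requester < ts_holder else "wait"
```

With the slide's example (T22 = 5, T23 = 10, T24 = 15): under wait-die, T22 requesting from T23 waits while T24 dies; under wound-wait, T22 wounds T23 while T24 waits.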
Deadlock Prevention Using Timestamps
Both schemes avoid starvation, as the transaction with the smallest timestamp is never asked to roll back. Transactions that are rolled back are not assigned new timestamps, so at some point in time a transaction will hold the smallest timestamp.
In wait-die, the older transaction waits for the younger transaction to complete, so the older a transaction gets, the more it tends to wait.
In wait-die, if Ti is rolled back, it may re-issue the same requests and die multiple times before succeeding. Such rollbacks tend to be fewer in wound-wait.
Deadlock Prevention (No-Wait)
In both the wait-die and wound-wait schemes, a rolled-back transaction is restarted with its original timestamp. Older transactions thus have precedence over newer ones, and starvation is hence avoided.
Timeout-based schemes: a transaction waits for a lock only for a specified amount of time. After that, the wait times out and the transaction is rolled back.
Thus deadlocks are not possible. This is simple to implement, but starvation is possible, and it is difficult to determine a good value for the timeout interval.
Deadlock Detection Schemes
Deadlock prevention is justified if transactions are long and use many items, or if the transaction load is very heavy.
In many practical situations it is advantageous to do no deadlock prevention, but to detect deadlocks and then abort at least one of the transactions involved.
An algorithm that examines the state of the system is invoked periodically to detect deadlocks. If deadlocks are found, the system must attempt to recover from them.
To do this the system requires:
Knowledge of the data items currently held by transactions and of the outstanding data-item requests
An algorithm to determine whether a deadlock exists
A recovery process to recover from the deadlock
Deadlock Detection
Deadlock detection using the wait-for-graph protocol:
Construct a wait-for graph in which each transaction has its own node. If Ti waits on Tj, construct a directed edge from Ti to Tj.
If a cycle is detected, select a "victim" and abort it. The victim-selection algorithm should select and abort transactions that have made the fewest updates.
TIMEOUT protocol: if a transaction waits longer than a specified amount of time, it gets aborted. Here deadlock is only suspected, not proved.
Deadlock Detection (Wait-For Graph)
Deadlocks can be described by a wait-for graph, which consists of a pair G = (V, E):
V is a set of vertices (all the transactions in the system)
E is a set of edges; each element is an ordered pair Ti → Tj
If Ti → Tj is in E, then there is a directed edge from Ti to Tj, implying that Ti is waiting for Tj to release a data item.
When Ti requests a data item currently held by Tj, the edge Ti → Tj is inserted into the wait-for graph. This edge is removed only when Tj is no longer holding a data item needed by Ti.
The system is in a deadlock state if and only if the wait-for graph has a cycle. A deadlock-detection algorithm must be invoked periodically to look for cycles.
Deadlock Detection (Cont.)
[Figures: a wait-for graph without a cycle; a wait-for graph with a cycle]
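Cycle detection in the wait-for graph is a standard depth-first search. A minimal sketch (the graph representation and function names are assumptions; a real DBMS would maintain the graph incrementally):

```python
def has_cycle(wait_for):
    """wait_for maps each transaction to the set of transactions it is
    waiting for. Returns True iff the wait-for graph has a cycle,
    i.e. iff the system is deadlocked."""
    WHITE, GRAY, BLACK = 0, 1, 2       # unvisited / on stack / done
    color = {t: WHITE for t in wait_for}

    def dfs(t):
        color[t] = GRAY
        for u in wait_for.get(t, ()):
            if color.get(u, WHITE) == GRAY:   # back edge => cycle
                return True
            if color.get(u, WHITE) == WHITE and dfs(u):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and dfs(t) for t in wait_for)
```

For example, `{"T17": {"T18"}, "T18": {"T17"}}` is deadlocked, while a chain `{"T17": {"T18"}}` is not.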
Deadlock Detection (Wait-For Graph)
When should we invoke the deadlock-detection algorithm? That depends on how often deadlocks occur and on how many transactions will be affected by a deadlock.
If deadlocks occur frequently, invoke the algorithm more frequently: data items allocated to deadlocked transactions are unavailable to other transactions until the deadlock is broken.
In the worst case, invoke it on every request that needs to wait.
Deadlock Recovery
When a deadlock is detected:
Select a victim: some transaction will have to be rolled back (made a victim) to break the deadlock. Select as victim the transaction that will incur the minimum cost. Factors that determine the cost of rollback include:
How long the transaction has computed, and how much longer it will compute before completing its designated task
How many data items it has used, and how many more it requires
How many transactions will be involved in the rollback
Deadlock Recovery
When a deadlock is detected (contd.):
Rollback -- determine how far to roll the transaction back:
Total rollback: abort the transaction and then restart it.
Partial rollback: it is more effective to roll the transaction back only as far as necessary to break the deadlock. This requires maintaining additional information about the state of all running transactions: the sequence of lock requests granted and updates performed.
Starvation happens if the same transaction is always chosen as the victim. Include the number of rollbacks in the cost factor to avoid starvation.
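A victim-selection policy along the lines above can be sketched as follows. This is purely illustrative (the cost model, field names and `pick_victim` are assumptions): it prefers the transaction with the fewest updates but counts past rollbacks into the cost, so the same transaction is not picked every time:

```python
def pick_victim(deadlocked):
    """deadlocked: list of dicts, each with 'name', 'updates'
    (work that would be lost) and 'rollbacks' (times already
    victimized). Returns the name of the cheapest victim."""
    def cost(t):
        # Crude cost model: lost updates, plus a starvation penalty
        # that grows with each prior rollback of this transaction.
        return t["updates"] + t["rollbacks"]
    return min(deadlocked, key=cost)["name"]
```

A transaction that has been rolled back many times accumulates penalty and eventually stops being the cheapest choice, which is exactly the anti-starvation idea stated above.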
Other Concurrency Control Issues
Until now we have restricted our attention to reads and writes, which limits transactions to data items that already exist. But transactions can also create and delete data items, and this affects concurrency control.
We will now look at concurrency issues related to:
Insert
Delete
Phantom records
Other Concurrency Control Issues -- Delete Operation
Deletion of a data item conflicts with read and write operations: if we read or write a deleted item, the result is an error.
Hence in 2PL an exclusive lock is required on a data item before it can be deleted.
In the timestamp-ordering protocol:
If TS(Ti) < R-timestamp(Q), then Ti would be deleting a value already read by some transaction Tj with TS(Ti) < TS(Tj). Hence the delete is rejected and Ti is rolled back.
If TS(Ti) < W-timestamp(Q), then Ti would be deleting a value already written by some transaction Tj with TS(Ti) < TS(Tj). Hence the delete is rejected and Ti is rolled back.
Insert Operation
Insertion of a data item conflicts with read and write operations: no read or write can be performed before the item is inserted.
In 2PL, an insert operation may be performed only if the transaction inserting the tuple holds an exclusive lock on the tuple being inserted.
In the timestamp-ordering protocol, after the insertion the R-timestamp and W-timestamp values of the data item are set to TS(Ti).
Other Concurrency Control Issues -- Phantom Records
A transaction locks database items that satisfy a certain selection condition and updates them.
During that update, another transaction inserts a new item that satisfies the same selection condition.
After the update, but inside the same transaction, we suddenly discover the existence of a database item that has not been updated although it should have been (since it satisfies the selection condition). This database item, called a "phantom record", appeared because it did not exist when the locking was done.
Other Concurrency Control Issues -- Phantom Records
E.g.:
T1: select sum(salary) from emp where d_no = 5
T2: insert into emp (emp_id, d_no, emp_name, salary) values (25, 5, 'xyz', 4500)
Let S be a schedule involving T1 and T2. Although the two transactions do not access any tuples in common, they conflict on a phantom tuple.
If concurrency control is done at the tuple level, this conflict can go undetected and the system could fail to prevent a non-serializable schedule. This is called the "phantom phenomenon".
To prevent the phantom phenomenon, T1 must prevent other transactions from creating new tuples in the emp table with d_no = 5.
Hence it is not enough to lock the tuples that are accessed; we also need to lock the information used to find the tuples that are accessed.
To manage this, the index-locking technique is used: any transaction that inserts a tuple into a relation must insert information into each of the relation's indices, and locking is done on the indices as well. This eliminates the phantom phenomenon.
[Figure: B+-tree for the account file with n = 3]
[Figure: insertion of "Clearview" into the B+-tree of Figure 16.21]
Index Locking Protocol
Index locking protocol:
Every relation must have at least one index.
A transaction can access tuples only after finding them through one or more indices on the relation.
A transaction Ti that performs a lookup must lock all the index leaf nodes that it accesses in S-mode, even if a leaf node contains no tuple satisfying the lookup (e.g. for a range query, no tuple in the leaf is in the range).
A transaction Ti that inserts, updates or deletes a tuple ti in a relation r must update all indices on r, and must obtain exclusive locks on all index leaf nodes affected by the insert/update/delete.
The rules of the two-phase locking protocol must be observed.
This guarantees that the phantom phenomenon cannot occur.
Concurrency Control in Indexes
2PL could be applied to index nodes, but that would mean holding the locks until the shrinking phase, which would be expensive.
Crabbing protocol:
Takes advantage of the B+-tree structure of the index; an index search always traverses from the root to a leaf.
When searching, acquire a shared lock on each node; after acquiring a lock on a child node, release the lock on the parent, as the parent node will not be required any further.
When inserting or deleting, follow the same mechanism as a search, obtaining and releasing shared locks, then acquire an exclusive lock on the leaf node to insert or delete key values. If the leaf node is not full, no further change is required; if it is full (a split is needed), upgrade the lock on the parent to exclusive mode.
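The hand-over-hand traversal at the heart of crabbing can be sketched as below. This is an illustration only: the `Node` class is a stand-in for a B+-tree node, and a plain `RLock` stands in for a shared/exclusive lock (so it shows the lock-child-then-release-parent pattern, not real S/X modes):

```python
import threading

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []   # empty list => leaf node
        self.lock = threading.RLock()    # stand-in for an S/X lock

def crab_search(root, key):
    node = root
    node.lock.acquire()                  # lock the root first
    while node.children:
        # choose the child subtree that may contain the key
        i = sum(1 for k in node.keys if key >= k)
        child = node.children[i]
        child.lock.acquire()             # crab: lock the child...
        node.lock.release()              # ...then release the parent
        node = child
    found = key in node.keys
    node.lock.release()
    return found
```

At any moment the searcher holds at most two locks (parent and child), so concurrent searches in different subtrees do not block each other.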
B-Link Protocol
Sibling nodes at each level are linked.
For a lookup/search, shared locks are requested, and the lock on the parent is released before accessing the child node.
For an insert or delete operation, proceed as in a lookup, then upgrade the shared lock on the leaf node to exclusive. If the leaf splits, relock the parent in exclusive mode.
If a split occurs concurrently with a search, the search can still locate the key through the right-sibling link.
Transaction Support in SQL-92
Higher levels of consistency allow programmers to ignore concurrency issues, whereas weaker levels of consistency place an additional burden on programmers to maintain consistency.

Isolation Level       Dirty Read   Unrepeatable Read   Phantom Problem
Serializable          No           No                  No
Repeatable Reads      No           No                  Maybe
Read Committed        No           Maybe               Maybe
Read Uncommitted      Maybe        Maybe               Maybe
Weak Levels of Consistency in SQL
SQL allows non-serializable executions.
Serializable: the default in the SQL standard.
Repeatable read: allows only committed records to be read, and repeating a read must return the same value (so read locks must be retained). However, the phantom phenomenon need not be prevented: T1 may see some records inserted by T2 but not others.
Read committed: same as degree-two consistency; most systems implement it as cursor stability.
Read uncommitted: allows even uncommitted data to be read.
In many database systems read committed is the default; the consistency level has to be explicitly changed to serializable when required:
set isolation level serializable
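The SQL-92 isolation-level table above can be encoded as data, which makes the "which anomaly can occur at which level" question mechanical. A small sketch mirroring the table; the names `ANOMALIES`, `ALLOWED` and `may_occur` are invented for the illustration:

```python
ANOMALIES = ("dirty read", "unrepeatable read", "phantom")

# True means the anomaly may occur ("Maybe" in the table);
# False means the level rules it out ("No").
ALLOWED = {
    "read uncommitted": (True,  True,  True),
    "read committed":   (False, True,  True),
    "repeatable read":  (False, False, True),
    "serializable":     (False, False, False),
}

def may_occur(level, anomaly):
    return ALLOWED[level][ANOMALIES.index(anomaly)]
```

Note how each step down the list rules out exactly one more anomaly, with serializable ruling out all three.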