cs 405g: introduction to database systems 20. concurrent control, storage

26
CS 405G: Introduction to Database Systems 20. Concurrent control, Storage

Upload: katherine-harrington

Post on 03-Jan-2016

218 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CS 405G: Introduction to Database Systems 20. Concurrent control, Storage

CS 405G: Introduction to Database Systems

20. Concurrent control, Storage

Page 2: CS 405G: Introduction to Database Systems 20. Concurrent control, Storage

04/20/23 Chen Qian @ University of Kentucky 2

Conflicting operations

Two operations on the same data item conflict if at least one of the operations is a write r(X) and w(X) conflict w(X) and r(X) conflict w(X) and w(X) conflict r(X) and r(X) do not r/w(X) and r/w(Y) do not

Order of conflicting operations matters E.g., if T1.r(A) precedes T2.w(A), then conceptually, T1

should precede T2

Page 3: CS 405G: Introduction to Database Systems 20. Concurrent control, Storage

04/20/23 Chen Qian @ University of Kentucky 3

Locking

Rules If a transaction wants to read an object, it must first request a

shared lock (S mode) on that object If a transaction wants to modify an object, it must first request

an exclusive lock (X mode) on that object Allow one exclusive lock, or multiple shared locks

Mode of lock(s)currently held

by other transactions

Mode of the lock requested

Grant the lock?

Compatibility matrix

S X

S Y N

X N N

Page 4: CS 405G: Introduction to Database Systems 20. Concurrent control, Storage

04/20/23 Chen Qian @ University of Kentucky 4

Basic locking is not enough

T1 T2

r(A)w(A)

r(A)w(A)

r(B)w(B)

r(B)w(B)

lock-X(A)

lock-X(B)

unlock(B)

unlock(A)lock-X(A)

unlock(A)

unlock(B)lock-X(B)

Possible scheduleunder locking

But still notconflict-

serializable!

T1

T2

Read 100

Write 100+1

Read 101

Write 101*2

Read 100

Write 100*2

Read 200

Write 200+1

Add 1 to both A and B(preserve A=B)

Multiply both A and B by 2(preserves A=B)

A B!

Page 5: CS 405G: Introduction to Database Systems 20. Concurrent control, Storage

04/20/23 Chen Qian @ University of Kentucky 5

Two-phase locking (2PL)

All lock requests precede all unlock requests Phase 1: obtain locks, phase 2: release locks

T1 T2

r(A)w(A)

r(A)w(A)

r(B) w(B) r(B)w(B)

lock-X(A)

lock-X(B)

unlock(B)

unlock(A)lock-X(A)

lock-X(B)

Cannot obtain the lock on Buntil T1 unlocks

T1 T2

r(A)w(A)

r(A)w(A)

r(B)w(B) r(B) w(B)

2PL guarantees aconflict-serializable

schedule

Page 6: CS 405G: Introduction to Database Systems 20. Concurrent control, Storage

04/20/23 Chen Qian @ University of Kentucky 6

Problem of 2PL

T2 has read uncommitted data written by T1

If T1 aborts, then T2 must abort as well

Cascading aborts possible if other transactions have read data written by T2

Even worse, what if T2 commits before T1? Schedule is not recoverable if the system crashes right

after T2 commits

T1 T2

r(A)w(A)

r(A)w(A)

r(B)w(B) r(B) w(B)

Abort!

Page 7: CS 405G: Introduction to Database Systems 20. Concurrent control, Storage

04/20/23 Chen Qian @ University of Kentucky 7

Strict 2PL

Only release locks at commit/abort time A writer will block all other readers until the writer

commits or aborts

Used in most commercial DBMS

Page 8: CS 405G: Introduction to Database Systems 20. Concurrent control, Storage

Deadlock Detection

Create and maintain a “waits-for” graph Periodically check for cycles in graph

04/20/23 Chen Qian @ University of Kentucky 8

Page 9: CS 405G: Introduction to Database Systems 20. Concurrent control, Storage

Deadlock Detection (Continued)

Example:

T1: S(A), S(D), S(B)T2: X(B) X(C)T3: S(D), S(C), X(A)T4: X(B)

T1 T2

T4 T3

Deadlock!

04/20/23 Chen Qian @ University of Kentucky 9

Page 10: CS 405G: Introduction to Database Systems 20. Concurrent control, Storage

Deadlock Prevention

Assign priorities based on timestamps. Say Ti wants a lock that Tj holds

Two policies are possible:Wait-Die: If Ti has higher priority, Ti waits for Tj; otherwise Ti aborts

Wound-wait: If Ti has higher priority, Tj aborts; otherwise Ti waits

04/20/23 Chen Qian @ University of Kentucky 10

Page 11: CS 405G: Introduction to Database Systems 20. Concurrent control, Storage

04/20/23 Chen Qian @ University of Kentucky 11

Storage

It’s all about disks! That’s why we always draw databases as And why the single most important metric in database

processing is the number of disk I/Os performed

Page 12: CS 405G: Introduction to Database Systems 20. Concurrent control, Storage

The Storage Hierarchy

Source: Operating Systems Concepts 5th Edition

Main memory (RAM) for currently used dataDisk for the main database (secondary storage).Tapes for archiving older versions of the data (tertiary storage).

Smaller, Faster

Bigger, Slower

04/20/23 12Chen Qian @ University of Kentucky

Page 13: CS 405G: Introduction to Database Systems 20. Concurrent control, Storage

04/20/23 Chen Qian @ University of Kentucky 13

Jim Gray’s Storage Latency Analogy: How Far Away is the Data?

RegistersOn Chip CacheOn Board Cache

Memory

Disk

12

10

100

Tape /Optical Robot

10 9

10 6

Lexington

This Lecture Hall

This RoomMy Head

10 min

1.5 hr

2 Years

1 min

Pluto

2,000 Years

Andromeda

Page 14: CS 405G: Introduction to Database Systems 20. Concurrent control, Storage

04/20/23 Chen Qian @ University of Kentucky 14

A typical disk

Spindle rotation

Sp

ind

le

Platter

Tracks

Arm movement

Disk arm

Disk headCylinders

“Moving parts” are slow

Page 15: CS 405G: Introduction to Database Systems 20. Concurrent control, Storage

04/20/23 Chen Qian @ University of Kentucky 15

Disk access time

Sum of: Seek time: time for disk heads to move to the correct

cylinder Rotational delay: time for the desired block to rotate

under the disk head Transfer time: time to read/write data in the block (=

time for disk to rotate over the block)

Page 16: CS 405G: Introduction to Database Systems 20. Concurrent control, Storage

04/20/23 Chen Qian @ University of Kentucky 16

Random disk access

Seek time + rotational delay + transfer time Average seek time

Time to skip one half of the cylinders? Not quite; should be time to skip a third of them (why?)

“Typical” value: 5 ms Average rotational delay

Time for a half rotation (a function of RPM) “Typical” value: 4.2 ms (7200 RPM)

Typical transfer time .08msec per 8K block

Page 17: CS 405G: Introduction to Database Systems 20. Concurrent control, Storage

04/20/23 Chen Qian @ University of Kentucky 17

Sequential Disk Access Improves Performance

Seek time + rotational delay + transfer time Seek time

0 (assuming data is on the same track) Rotational delay

0 (assuming data is in the next block on the track) Easily an order of magnitude faster than random disk

access!

Page 18: CS 405G: Introduction to Database Systems 20. Concurrent control, Storage

04/20/23 Chen Qian @ University of Kentucky 18

Performance tricks

Disk layout strategy Keep related things (what are they?) close together: same

sector/block ! same track ! same cylinder ! adjacent cylinder

Double buffering While processing the current block in memory, prefetch

the next block from disk (overlap I/O with processing) Disk scheduling algorithm Track buffer

Read/write one entire track at a time Parallel I/O

More disk heads working at the same time

Page 19: CS 405G: Introduction to Database Systems 20. Concurrent control, Storage

04/20/23 Chen Qian @ University of Kentucky 19

Records and Files

A row in a table, when located on disks, is called A record

Two types of record: Fixed-length Variable-length

Page 20: CS 405G: Introduction to Database Systems 20. Concurrent control, Storage

04/20/23 Chen Qian @ University of Kentucky 20

In an abstract sense, a file is A set of “records” on a disk

In reality, a file is A set of disk pages

Each record lives on A page

Physical Record ID (RID) A tuple of <page#, slot#>

Page 21: CS 405G: Introduction to Database Systems 20. Concurrent control, Storage

Files

Higher levels of DBMS operate on records, and files of records.

FILE: A collection of pages, containing a collection of records. Record = row in a table

Must support: insert/delete/modify record fetch a particular record (specified using record id) scan all records (possibly with some conditions on the records

to be retrieved)

04/20/23 21Chen Qian @ University of Kentucky

Page 22: CS 405G: Introduction to Database Systems 20. Concurrent control, Storage

Unordered (Heap) Files

Simplest file structure contains records in no particular order.

As file grows and shrinks, disk pages are allocated and de-allocated.

To support record level operations, we must: keep track of the pages in a file keep track of free space on pages keep track of the records on a page

There are many alternatives for keeping track of this. We’ll consider 2

04/20/23 22Chen Qian @ University of Kentucky

Page 23: CS 405G: Introduction to Database Systems 20. Concurrent control, Storage

Heap File Implemented as a List

The header page id and Heap file name must be stored someplace. Database “catalog”

Each page contains 2 `pointers’ plus data.

HeaderPage

DataPage

DataPage

DataPage

DataPage

DataPage

DataPage Pages with

Free Space

Full Pages

04/20/23 23Chen Qian @ University of Kentucky

Page 24: CS 405G: Introduction to Database Systems 20. Concurrent control, Storage

Heap File Using a Page Directory

DBMS must remember where the first directory page is located. The entry for a page can include the number of free bytes on the

page. The directory itself is a collection of pages and shown as a

linked list. Much smaller than linked list of all HF pages!

DataPage 1

DataPage 2

DataPage N

HeaderPage

DIRECTORY

04/20/23 24Chen Qian @ University of Kentucky

Page 25: CS 405G: Introduction to Database Systems 20. Concurrent control, Storage

Trade-off

Compared to linked list, the directory based approaches have:

Pros: Fast access to a particular record Cons: Extra space

04/20/23 Chen Qian @ University of Kentucky 25

Page 26: CS 405G: Introduction to Database Systems 20. Concurrent control, Storage

Summary

Disks provide cheap, non-volatile storage. Random access, but cost depends on the location of page

on disk; important to arrange data sequentially to minimize seek and rotation delays.

04/20/23 26Chen Qian @ University of Kentucky