c20.0046: database management systems lecture #26
C20.0046: Database Management Systems, Lecture #26. Matthew P. Johnson, Stern School of Business, NYU, Spring 2004. Agenda. Previously: indices. Next: finish indices, advanced indices, failure/recovery, data warehousing & mining, websearch. Hw3 due today, no extensions! 1-minute responses.
M.P. Johnson, DBMS, Stern/NYU, Sp2004
C20.0046: Database Management Systems, Lecture #26
Matthew P. Johnson
Stern School of Business, NYU
Spring, 2004
Agenda
• Previously: Indices
• Next:
  • Finish indices, advanced indices
  • Failure/recovery
  • Data warehousing & mining
  • Websearch
• Hw3 due today, no extensions!
• 1-minute responses
• Review: clustered, dense, primary, #/tbl, syntax
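Let's get physical
[Figure: DBMS architecture. The user/application sends queries and updates to the query compiler/optimizer, which hands a query execution plan to the execution engine. The execution engine sends record and index requests to the index/record manager, which sends page commands to the buffer manager, which reads and writes pages through the storage manager to storage. The transaction manager (concurrency control, logging/recovery) receives transaction commands.]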
BSTs
• Very simple data structure in CS: BSTs
  • Binary Search Trees
  • Keep balanced
  • Each node ~ one item
• Each node has two children:
  • Left subtree: <
  • Right subtree: >=
• Can search, insert, delete in log time
  • log2(1M = 2^20) = 20
Search for DBMS
• Big improvement: log2(1M) = 20
  • Each op divides remaining range in half!
• But recall: all that matters is # of disk accesses
• 20 is better than 2^20, but:
• Can we do better?
BSTs → B-trees
• Like BSTs except each node ~ one block
• Branching factor is >> 2
  • Each access divides remaining range by, say, 300
• B-trees = BSTs + blocks
• B+ trees are a variant of B-trees
  • Data stored only in leaves
  • Leaves form a (sorted) linked list
  • Better supports range queries
• Consequences:
  • Much shorter depth → many fewer disk reads
  • Must find element within node
  • Trades CPU/RAM time for disk time
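To make the depth argument concrete, here is a minimal sketch that counts how many levels a tree needs to reach about 1M keys. The helper name `levels_needed` is made up for illustration, and the branching factor 300 is just the slide's "say, 300".

```python
def levels_needed(num_keys: int, branching: int) -> int:
    """Smallest depth d such that branching**d >= num_keys."""
    depth, reach = 0, 1
    while reach < num_keys:
        reach *= branching
        depth += 1
    return depth


bst_levels = levels_needed(2**20, 2)      # binary tree: one item per node
btree_levels = levels_needed(2**20, 300)  # B-tree: one block per node
print(bst_levels, btree_levels)           # 20 levels vs. 3 levels
```

If each level costs one disk access, that is roughly 20 reads against 3.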
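B+ Trees
• Parameter n
  • Branching factor is n+1
  • Largest number s.t. one block can contain n search-key values and n+1 pointers
• Each node (except root) has at least n/2 keys
[Figure: an internal node with keys 30, 120, 240 and four child pointers, for keys k < 30, 30 <= k < 120, 120 <= k < 240, and 240 <= k; a leaf with keys 40, 50, 60, each pointing to its record, plus a pointer to the next leaf.]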
Searching a B+ Tree
• Exact key values:
  • Start at the root
  • If we're in a leaf, walk through its key values
  • If not, look at keys K1..Kn
  • If Ki <= K < Ki+1, follow the pointer between Ki and Ki+1
  Select name From people Where age = 25
• Range queries:
  • Find the lower bound as above
  • Then walk along the leaf linked list until the test fails
  Select name From people Where 20 <= age and age <= 30
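The two search procedures can be sketched as follows. This is an illustrative in-memory model, not how a DBMS lays out blocks: the `Leaf` and `Internal` classes, the tiny example tree, and the age values are all assumptions. Child i of an internal node is taken to hold keys in [Ki, Ki+1), matching the "30 <= k < 120" convention of the B+ tree slide.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Leaf:
    keys: List[int]
    next: Optional["Leaf"] = None   # sorted linked list of leaves


@dataclass
class Internal:
    keys: List[int]                 # separators K1..Kn
    children: list                  # n+1 children


def find_leaf(node, key):
    """Descend from the root to the leaf that could contain key."""
    while isinstance(node, Internal):
        # bisect_right counts separators <= key, which is the child index.
        node = node.children[bisect_right(node.keys, key)]
    return node


def search(root, key):
    """Exact-match lookup."""
    return key in find_leaf(root, key).keys


def range_search(root, lo, hi):
    """Find the leaf for lo, then walk the leaf list until a key exceeds hi."""
    leaf, found = find_leaf(root, lo), []
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return found
            if k >= lo:
                found.append(k)
        leaf = leaf.next
    return found


# Hypothetical index on age with three leaves.
l3 = Leaf([30, 33, 41])
l2 = Leaf([25, 27], next=l3)
l1 = Leaf([18, 20, 22], next=l2)
root = Internal(keys=[25, 30], children=[l1, l2, l3])

print(search(root, 25))            # age = 25
print(range_search(root, 20, 30))  # 20 <= age and age <= 30
```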
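B+ Tree Example
n = 4; find the key 40
[Figure: B+ tree with root key 80; internal nodes (20, 60) and (100, 120, 140); leaf level holding keys 10, 15, 18, 20, 30, 40, 50, 60, 65, 80, 85, 90, each leaf key pointing to its record.]
Search path: 40 <= 80 at the root; 20 < 40 <= 60 at the internal node; 30 < 40 <= 40 in the leaf.
NB: Leaf keys are sorted; the data they point to is sorted only if the index is clustered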
Clustered & unclustered B-trees
[Figure: two index diagrams side by side, labeled CLUSTERED and UNCLUSTERED. In each, data entries in the index file point to data records in the data file.]
B+ trees, and, or
• Assume index on a,b,c
• Intuition: phone book
• WHERE a = 'x' and b = 'y'
• WHERE b = 'y' and c = 'z'
• WHERE a = 'a' and c = 'z'
• WHERE a = 'x' or b = 'y' or c = 'z'
B+ trees and LIKE
• Supports only hard-coded prefix LIKE checks
• Intuition: phone book
• Select * from T where a like 'xyz%'
• Select * from T where a like '%xyz'
• Select * from T where a like 'xyz%zyx%'
B-tree search efficiency
• With params:
  • block = 4 KB
  • integer = 4 bytes, pointer = 8 bytes
• The largest n satisfying 4n + 8(n+1) <= 4096 is n = 340
  • Each node has 170..340 keys
  • Assume on avg a node has (170+340)/2 = 255
• Then:
  • 255 rows → depth = 1
  • 255^2 ≈ 64k rows → depth = 2
  • 255^3 ≈ 16M rows → depth = 3
  • 255^4 ≈ 4G rows → depth = 4
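The arithmetic above can be checked with a few lines. This is only a sketch of the slide's calculation; the constant names are invented here.

```python
BLOCK, KEY, PTR = 4096, 4, 8   # bytes: block, integer key, pointer

# Largest n with KEY*n + PTR*(n+1) <= BLOCK.
n = (BLOCK - PTR) // (KEY + PTR)

# Nodes hold between n/2 and n keys; take the midpoint as the average.
avg = (n // 2 + n) // 2

print(n, avg)
for depth in range(1, 5):
    print(f"depth {depth}: about {avg ** depth:,} rows")
```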
B-trees in practice
• Most DBMSs use B-trees for most indices
  • Default in MySQL
  • Default in Oracle
• Speeds up
  • where clauses
  • Some like checks
  • Min or max functions
  • joins
• Limitation: fields used must
  • Be a prefix of indexed fields
  • Be ANDed together
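The prefix limitation can be illustrated with a toy check. This is a simplification for intuition only: `usable_prefix` is a hypothetical helper, it considers only ANDed equality predicates, and a real optimizer weighs much more than this.

```python
def usable_prefix(index_cols, equality_cols):
    """Leading index columns that ANDed equality predicates can drive."""
    wanted = set(equality_cols)
    prefix = []
    for col in index_cols:
        if col not in wanted:
            break           # phone-book order is broken from here on
        prefix.append(col)
    return prefix


index = ["a", "b", "c"]
print(usable_prefix(index, ["a", "b"]))  # both fields narrow the search
print(usable_prefix(index, ["b", "c"]))  # no leading field: index does not help
print(usable_prefix(index, ["a", "c"]))  # only a narrows the search
```

An OR of conditions does not fit this model at all, since no single contiguous range of the index covers it.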
Next topic: Advanced types of indices
• Spatial indices based on R-trees (R = region)
• Support multi-dimensional searches on "geometry" fields
  • 2-d, not 1-d, ranges
• Oracle:
  CREATE INDEX geology_rtree_idx ON geology_tab(geometry) INDEXTYPE IS MDSYS.SPATIAL_INDEX;
• MySQL:
  CREATE TABLE geom (g GEOMETRY NOT NULL, SPATIAL INDEX(g));
Advanced types of indices
• Inverted indices for web doc search
• First, think of each webpage as a tuple
  • One column for every possible word
  • True means the word appears on the page
  • Index on all columns
• Now can search: you're fired
  select * from T where youre=T and fired=T
Advanced types of indices
• Can simplify somewhat:
  1. For each field index, delete False entries
  2. True entries for each index become a bucket
• Create "inverted index":
  • One entry for each search word
  • Search word entry points to corresponding bucket
  • Bucket points to pages with its word
• Amazon
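A minimal sketch of the word → bucket → pages structure, assuming whitespace tokenization and a made-up three-page corpus. The function names and page contents are illustrative only.

```python
from collections import defaultdict


def build_inverted_index(pages):
    """Map each word to the set (bucket) of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def search_all(index, words):
    """Pages containing every search word: intersect the buckets."""
    buckets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*buckets) if buckets else set()


# Hypothetical pages; "youre" follows the slide's column name.
pages = {1: "youre fired", 2: "youre hired", 3: "fired up"}
index = build_inverted_index(pages)
print(search_all(index, ["youre", "fired"]))
```

Only words that actually occur get an entry, which is the "delete False entries" simplification.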
Advanced types of indices
• Function-based indices
  • Speeds up WHERE upper(name)='BUSH', etc.
  • Now supported in Oracle 8, not MySQL
  create index on T(my_soundex(name));
  create index on T(substr(DOB,4,5));
• Bitmap indices
  • Speeds up arbitrary combination of reqs
  • Not limited to prefixes or conjunctions
  • Now supported in Oracle 9, not MySQL
Bitmap indices
• Assume table has n records
• Assume F is a field with m different values
• Bitmap index on F: m length-n bitstrings
  • One bitstring for each value of F
  • Each one says which rows have that value for F
• Example: n = 6, mF = 3, mG = 3
• Q: find rows where F=50 or (F=30 and G='Baz')

Row  F   G
1    30  Foo
2    30  Bar
3    40  Baz
4    50  Foo
5    40  Bar
6    30  Baz
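A small sketch of how the bitmaps for this six-row table could be built and combined to answer the question. `build_bitmaps` is a hypothetical helper and Python integers stand in for bitstrings.

```python
def build_bitmaps(column):
    """One length-n bitstring per distinct value; bit i is 1 if row i+1 has it."""
    return {
        value: "".join("1" if x == value else "0" for x in column)
        for value in sorted(set(column))
    }


F = [30, 30, 40, 50, 40, 30]
G = ["Foo", "Bar", "Baz", "Foo", "Bar", "Baz"]
bf, bg = build_bitmaps(F), build_bitmaps(G)


def as_int(bits):
    return int(bits, 2)


# F=50 or (F=30 and G='Baz')
answer = as_int(bf[50]) | (as_int(bf[30]) & as_int(bg["Baz"]))
answer_bits = format(answer, "06b")
matching_rows = [i + 1 for i, b in enumerate(answer_bits) if b == "1"]
print(bf, bg)
print(answer_bits, matching_rows)
```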
Bitmap index search
• Larger example: (age, salary) of jewelry buyers
• Bitmaps for age:
  25:100000001000, 30:000000010000, 45:010000000100, 50:001110000010, 60:000000000001, 70:000001000000, 85:000000100000
• Bitmaps for salary:
  60:110000000000, 75:001000000000, 100:000100000000, 110:000001000000, 120:000010000000, 140:000000100000, 260:000000010001, 275:000000000010, 350:000000000100, 400:000000001000

Row  Age  Sal.
1    25   60
2    45   60
3    50   75
4    50   100
5    50   120
6    70   110
7    85   140
8    30   260
9    25   400
10   45   350
11   50   275
12   60   260
Bitmap index search
• Query: find buyers of age 45-55 with salary 100-200
• Age range: 010000000100 (45) | 001110000010 (50) = 011110000110
• Bitwise or of salary range: 000111100000
• AND together: 011110000110 & 000111100000 = 000110000000
• What does this mean?
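The query can be replayed with integers standing in for the bitmaps. This is a sketch under the assumption that every bitmap is 12 bits long, one bit per buyer; `or_range` is an illustrative helper name.

```python
age = {
    25: "100000001000", 30: "000000010000", 45: "010000000100",
    50: "001110000010", 60: "000000000001", 70: "000001000000",
    85: "000000100000",
}
salary = {
    60: "110000000000", 75: "001000000000", 100: "000100000000",
    110: "000001000000", 120: "000010000000", 140: "000000100000",
    260: "000000010001", 275: "000000000010", 350: "000000000100",
    400: "000000001000",
}


def or_range(bitmaps, lo, hi):
    """Bitwise OR of the bitmaps whose value lies in [lo, hi]."""
    acc = 0
    for value, bits in bitmaps.items():
        if lo <= value <= hi:
            acc |= int(bits, 2)
    return acc


hits = or_range(age, 45, 55) & or_range(salary, 100, 200)
hit_bits = format(hits, "012b")
rows = [i + 1 for i, b in enumerate(hit_bits) if b == "1"]
print(hit_bits, rows)
```

The 1 bits mark the row numbers of the buyers that satisfy both ranges.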
Bitmap index search
• Once we have row numbers, then what?
  • Get rows with those numbers (How?)
• Bitmap indices in Oracle:
  CREATE BITMAP INDEX ON T(F,G);
• Best for low-cardinality fields
  • Boolean, enum, gender
• Lots of 0s in our bitmaps
  • Compress: 000000100001 → 6141
  • "run-length encoding"
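A minimal sketch of run-length encoding that reproduces the 000000100001 → 6141 example, reading the output as alternating run lengths that start with a run of 0s. This mirrors the slide's toy notation only; it is not a description of Oracle's actual compression format.

```python
from itertools import groupby


def run_lengths(bits):
    """Lengths of alternating runs, starting with a (possibly empty) run of 0s."""
    runs = [len(list(group)) for _, group in groupby(bits)]
    if bits.startswith("1"):
        runs.insert(0, 0)   # keep the convention that the first run counts 0s
    return runs


def expand(runs):
    """Inverse of run_lengths."""
    return "".join(("0" if i % 2 == 0 else "1") * n for i, n in enumerate(runs))


encoded = run_lengths("000000100001")
print(encoded)            # six 0s, one 1, four 0s, one 1
print(expand(encoded))
```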
New topic: Recovery

Type of crash                     Prevention
Wrong data entry                  Constraints and data cleaning
Disk crashes                      Redundancy: e.g. RAID, archive
Fire, theft, bankruptcy…          Buy insurance, change jobs…
System failures: e.g. blackout    DATABASE RECOVERY
System Failures
• Each transaction has internal state
• When system crashes, internal state is lost
  • Don't know which parts executed and which didn't
• Remedy: use a log
  • A file that records each action of each xact
  • Trail of breadcrumbs
Media Failures
• Rule of thumb: Pr(hard drive has head crash within 10 years) = 50%
• Simpler rule of thumb: Pr(hard drive has head crash within 1 year) = 10%
• Serious problem
• Soln: different RAID strategies
• RAID: Redundant Arrays of Independent Disks
RAID levels
• RAID level 1: each disk gets a mirror
• RAID level 4: one disk is xor of all others
  • Each bit is sum mod 2 of corresponding bits
• E.g.:
  • Disk 1: 11110000
  • Disk 2: 10101010
  • Disk 3: 00111000
  • Disk 4 (parity): 01100010
• How to recover?
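A short sketch of the RAID level 4 idea on the example bytes: the parity disk is the XOR of the data disks, and any single lost disk is the XOR of the survivors. The function name `parity` and the choice of Disk 2 as the failed disk are illustrative.

```python
from functools import reduce


def parity(disks):
    """Bitwise XOR (sum mod 2 per bit) of all given disk contents."""
    return reduce(lambda a, b: a ^ b, disks)


d1, d2, d3 = 0b11110000, 0b10101010, 0b00111000
d4 = parity([d1, d2, d3])            # contents of the parity disk
print(format(d4, "08b"))

# Suppose Disk 2 fails: XOR the surviving disks, parity included.
recovered = parity([d1, d3, d4])
print(format(recovered, "08b"))
```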
Transactions
• Transaction: unit of code to be executed atomically
• In ad-hoc SQL
  • one command = one transaction
• In embedded SQL
  • Transaction starts = first SQL command issued
  • Transaction ends =
    • COMMIT
    • ROLLBACK (=abort)
• Can turn off/on autocommit
Primitive operations of transactions
• Each xact reads/writes rows or blocks: elements ("elms")
• INPUT(X): read element X to memory buffer
• READ(X,t): copy element X to transaction local variable t
• WRITE(X,t): copy transaction local variable t to element X
• OUTPUT(X): write element X to disk
• LOG RECORD
Transaction example
• Xact: Transfer $100 from savings to checking
  • A = A+100; B = B-100;
READ(A,t);
t := t+100;
WRITE(A,t);
READ(B,t);
t := t-100;
WRITE(B,t)
Transaction example
READ(A,t); t := t+100; WRITE(A,t); READ(B,t); t := t-100; WRITE(B,t)

Action       t      Mem A   Mem B   Disk A   Disk B
INPUT(A)     -      1000    -       1000     1000
READ(A,t)    1000   1000    -       1000     1000
t:=t+100     1100   1000    -       1000     1000
WRITE(A,t)   1100   1100    -       1000     1000
INPUT(B)     1100   1100    1000    1000     1000
READ(B,t)    1000   1100    1000    1000     1000
t:=t-100     900    1100    1000    1000     1000
WRITE(B,t)   900    1100    900     1000     1000
OUTPUT(A)    900    1100    900     1100     1000
OUTPUT(B)    900    1100    900     1100     900
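The trace can be reproduced with a toy model in which two dictionaries stand in for the memory buffers and the disk. This is only a sketch: the function bodies and the dictionary representation are assumptions made for illustration, not how a buffer manager is implemented.

```python
disk = {"A": 1000, "B": 1000}   # starting balances from the table
mem = {}                        # memory buffers: element -> value


def INPUT(x):
    mem[x] = disk[x]            # read element x from disk into a buffer


def READ(x):
    return mem[x]               # copy the buffered element to a local variable


def WRITE(x, t):
    mem[x] = t                  # copy the local variable into the buffer


def OUTPUT(x):
    disk[x] = mem[x]            # write the buffered element back to disk


def transfer():
    INPUT("A")
    t = READ("A")
    t = t + 100
    WRITE("A", t)
    INPUT("B")
    t = READ("B")
    t = t - 100
    WRITE("B", t)
    snapshot = dict(disk)       # disk is still unchanged at this point
    OUTPUT("A")
    OUTPUT("B")
    return snapshot


before_output = transfer()
print(before_output, disk)
```

A crash between the two OUTPUT calls would leave the disk with A updated and B not, which is the problem the log addresses.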
The log
• An append-only file containing log records
• Note: multiple transactions run concurrently, so log records are interleaved
• After a system crash, use log to:
  • Redo some transactions that did commit
  • Undo other transactions that didn't commit
• Three kinds of logs: undo, redo, undo/redo
  • We'll discuss only Undo
Undo Logging
Log records:
• <START T>: transaction T has begun
• <COMMIT T>: T has committed
• <ABORT T>: T has aborted
• <T,X,v>: T has updated element X, and its old value was v
Undo-Logging Rules
• U1: Changes logged (<T,X,v>) before being written to disk
• U2: Commit logged (<COMMIT T>) only after all of T's changes are written to disk
• Results:
  • May forget that we completed a whole xact (and so undo it unnecessarily)
  • Will never forget that we did part of a xact (and so never leave partial changes in place)
• Order: log-change, change, log-change, change, commit, log-commit
Undo-Logging e.g. (inputs omitted)

Action       t      Mem A   Mem B   Disk A   Disk B   Log
                                                      <START T>
READ(A,t)    1000   1000    -       1000     1000
t:=t+100     1100   1000    -       1000     1000
WRITE(A,t)   1100   1100    -       1000     1000     <T,A,1000>
READ(B,t)    1000   1100    1000    1000     1000
t:=t-100     900    1100    1000    1000     1000
WRITE(B,t)   900    1100    900     1000     1000     <T,B,1000>
OUTPUT(A)    900    1100    900     1100     1000
OUTPUT(B)    900    1100    900     1100     900
COMMIT
                                                      <COMMIT T>
Recovery with Undo Log
• After system's crash, run recovery manager
  1. Decide for each xact T whether it was completed
     <START T>….<COMMIT T>: yes
     <START T>….<ABORT T>: yes
     <START T>……………………: no
  2. Undo all modifications from incomplete xacts, in reverse order (why?), and abort each
Recovery with Undo Log
• Read log from the end; cases:
  • <COMMIT T>: mark T as completed
  • <ABORT T>: mark T as completed
  • <T,X,v>: if T is not completed then write X=v to disk, else ignore
  • <START T>: ignore
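The backward scan can be sketched as below. The tuple encoding of log records (`("START", T)`, `("UPDATE", T, X, v)`, and so on), the function name `undo_recover`, and the example values are all assumptions for illustration. The sketch also appends an abort record for each incomplete transaction, following the "abort each" step of the recovery procedure.

```python
def undo_recover(log, disk):
    """Scan the log from the end, restoring old values of incomplete xacts."""
    completed, incomplete = set(), set()
    for record in reversed(log):
        kind, txn = record[0], record[1]
        if kind in ("COMMIT", "ABORT"):
            completed.add(txn)
        elif kind == "UPDATE" and txn not in completed:
            _, _, element, old_value = record
            disk[element] = old_value      # write X = v back to disk
            incomplete.add(txn)
        elif kind == "START" and txn not in completed:
            incomplete.add(txn)
    # Record that each incomplete transaction has now been aborted.
    log.extend(("ABORT", txn) for txn in sorted(incomplete))
    return disk


# Hypothetical crash: T5 committed, T4 did not.
log = [
    ("START", "T5"), ("START", "T4"),
    ("UPDATE", "T5", "X5", 5), ("UPDATE", "T4", "X4", 4),
    ("COMMIT", "T5"),
]
disk = {"X5": 50, "X4": 40}
print(undo_recover(log, disk))
print(log[-1])
```

Running the function a second time changes nothing further on disk, which matches the observation that undo operations are idempotent.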
Recovery with Undo Log
Log, oldest record first:
…
<T2,X2,v2>
…
<START T5>
<START T4>
<T1,X1,v1>
<T5,X5,v5>
<T4,X4,v4>
<COMMIT T5>
<T3,X3,v3>
<T2,X2,v2>
Crash! Start recovery here, reading backward.
Q: Which updates are undone?
Recovery with Undo Log
• Note: undo commands are idempotent
  • No harm done if we repeat them
  • Q: What if system crashes during recovery?
• How far back in the log do we go?
  • Don't go all the way back to the start
  • May be very large
• Better idea: use checkpointing
Checkpointing
• Checkpoint the database periodically
  1. Stop accepting new transactions
  2. Wait until all current xacts complete
  3. Flush log to disk
  4. Write a <CKPT> log record, flush log
  5. Resume accepting new xacts
Undo Recovery with Checkpointing
Log, oldest record first:
…
<T1,X1,v1>
…
(all completed: other xacts)
<CKPT>
<START T2>
<START T3>
<START T5>
<START T4>
<T4,X4,v4>
<T5,X5,v5>
<T4,X4,v4>
<COMMIT T5>
<T3,X3,v3>
<T2,X2,v2>
(records after <CKPT> belong to xacts T2, T3, T4, T5)
During recovery, reading backward, can stop at the first <CKPT>
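With quiescent checkpoints, every transaction that started before a <CKPT> record has completed, so recovery only needs the log records after the most recent one. A tiny sketch, reusing a made-up tuple encoding of log records (`("CKPT",)` for the checkpoint); the helper name is hypothetical.

```python
def tail_after_checkpoint(log):
    """Records a backward scan must examine: everything after the last <CKPT>."""
    for i in range(len(log) - 1, -1, -1):
        if log[i] == ("CKPT",):
            return log[i + 1:]
    return log                    # no checkpoint: scan the whole log


log = [
    ("START", "T1"), ("UPDATE", "T1", "X1", 1), ("COMMIT", "T1"),
    ("CKPT",),
    ("START", "T2"), ("UPDATE", "T2", "X2", 2),
]
to_scan = tail_after_checkpoint(log)
print(to_scan)
```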
Non-quiescent Checkpointing
• Problem: database must freeze during checkpoint
• Would like to checkpoint while database is operational
• Idea: non-quiescent checkpointing
• Quiescent: quiet, still, at rest; inactive
Next time
• Next: Data warehousing & mining!
• For next time: reading online
• Proj5 due next Thursday, no extensions!
• Now: one-minute responses
  • Relative weight: warehousing, mining, websearch
  • Data mining techniques: NNs, GAs, kNN, Decision Trees