storage structures indexing techniques in dbs - …€” effectiveness = performance ......
TRANSCRIPT
2
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design: Goal and influence factorsPhysical Design: Goal and influence factors
Physical schema design goals— Effective data organization— Effective access to disc— Effectiveness = performance
Quality Measures— Throughput: # transactions / sec— Turn-around-time:
time for answering an individual query (e.g. average)
Parameters— Application dependent factors: size of database, typical
operations, frequency of operations, isolation level— System related factors: storage of data, access path, index
structures, optimization, …
3
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design: Storage DevicesPhysical Design: Storage Devices
Memory Hierarchy:
Cache
Main memory
Disk
Tertiary storage
Primary storage
Secondary storage
Archive storage
DBMS
4
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design: Storage DevicesPhysical Design: Storage Devices
Cache— Fastest, most costly form of storage— Size very small— Usage managed by OS/Hardware
Main Memory— Data available to be operated on— Too small for entire DB— Data lost after power failure or a system crash
Cache
Main memory
Disk
Tertiary storage
DBMS
5
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design: Storage DevicesPhysical Design: Storage Devices
Disk— Where entire DB is stored— Direct access (not sequential) — Data operations: disk -> main memory -> disk— (Usually) survives power failures/system crashes
Tape storage (tertiary storage)— Cheaper than disk— slower access: tape read sequentially from the beginning
(sequential-access storage).— Backups and archival data— Used for recovery from disk failures— Less complex than disk => more reliable
Cache
Main memory
Disk
Tertiary storage
DBMS
6
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design: Storage DevicesPhysical Design: Storage Devices
Access time vs capacity:
Tertiary storage1312111098765
2 1 0 -1 -2 -3 -4 -5 -6 -7 -8 -9
Secondary storage
Zip disk
Floppy disk
Main memory
Cache
Source: Garcia-Molina, Ullman, Widom “Database systems”, 2002
Capa
city
in
10Y
Byte
s
Access timein 10X sec
7
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design: Secondary storagePhysical Design: Secondary storage
Mechanics
Platter = 2 surfaces
Disk heads Cylinder
trackgap
sector
8
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design: Secondary storagePhysical Design: Secondary storage
“Typical” Characteristics— Rotation: 5400 –10,000 RPM — Capacity: 360 KB – 100+ GB— Price (per GB): $2 - $2.5— Platter diameter: 2.5 to 25 cm — Platters: 1 - 15 — Tracks (per surface): 40 – 20,000— Sector size: 512 B – 50 KB— Sectors (per track): 32-500
Holds:Virtual memory (handled by OS)File system (access via OS)DBMS stuff (bypass OS)
9
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design: Secondary storagePhysical Design: Secondary storage
Block:— Neighboring byte sequence on single track of one platter— Smallest data unit moved between disk and main memory— Databases work with blocks— Fixed size — Size determined by physical properties of disk + OS— Size: from 512 to 4096 bytes
trackgap
sector
blocks
10
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design: I/O costPhysical Design: I/O cost
Data transfer disk - main memory — In blocks— Bytes transferred at constant speed— Information flow (if): between 120 Kb/s and 5 Mb/s
Seek time:Time for repositioning the arm to right cylinderMove disk heads to the right cylinder:Start (constant), Move (variable), Stop (constant)0 if arm there, otherwise long (between 8 to 10 ms)Track-to-track seek time: 0.5ms –2ms
11
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design: I/O costPhysical Design: I/O cost
Rotate time (disk latency):— Time until interesting sector positioned under the head— Access to all data within a cylinder within rotate time — 12 to 6 ms per rotation— Average: 6 to 3 ms rotational latency.— => store related information close by
Transfer time (read time):Depends on # bytes to be transferred
Seek time + Rotational time + T/if
Total time to transfer T bytes:
12
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design: I/O costPhysical Design: I/O cost
Typical access time:
Seek time dominates !
Disk access time = SeekTime 6 ms+ RotateTime 3 ms+ TransferTime 1 ms
Compare: RAM 3-10 nsec
Random Disk / RAM:~10 * 10-3 / 10 * 10-9 = 106
~ Distance Hamilton - moon vs. distance Hamilton – Raglan
Sequential disk read ("scan") much faster
13
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design: I/O costPhysical Design: I/O cost
Disk characteristics:
1.2 days2 hrs52002003
5 hrs20 min50181998
20 min2 min20,0000.251988
ScanRandom
ScanSequential
$/GBCapacityGB
year
Source: The Rebirth of Database Machines; talk by Bitton/Gray at VLDB'98
Random access favorable only if small percentage of the data accessed frequentlyRule of thumb: less than 15 % on a large table
14
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design: I/O costPhysical Design: I/O cost
The Myth: seek time dominates The Reality: 1. Queuing dominates2. Transfer dominates BLOBs3. Disk seeks often short
Implication: many cheap servers better than one fast expensive server— Shorter queues— Parallel transfer— Lower cost/access and cost/byte
Seek
Rotate
Transfer
Wait(Queuing)
Source: “Storage: Alternate Futures”Storage: Alternate Futures”; talk by J.Gray at NetStore99, Seattle
15
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design: I/O costPhysical Design: I/O cost
Goal: Accelerate access to secondary storage
Strategies1. Place blocks that are accessed together on same cylinder
(avoids seek time)2. Divide data between smaller disks
(independent heads increase # block accesses)3. Mirror a disk: simultaneous access to several blocks4. Disk-scheduling algorithm: selects order of block access 5. Prefetch blocks in main memory
Strategy depends on situation/application# processes, parallel access, shared disks, (un)predictable queries, budget,…
16
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design: RAIDPhysical Design: RAID
RAID Technology— Redundant array of independent disks; originally redundant
array of inexpensive disks — Reduces access time and queue length— Fault tolerance by "Parity disks"— Principle technique: striping
Large disk:Long queue, Long transfer
A B C D EF G H
Striping (interleaving)A E B F C G D H
RAID 0:
17
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design: RAIDPhysical Design: RAID
RAID levels:
RAID 1 Mirroring and Duplexing: mirror without stripping
RAID 0+1 High Data Transfer Performance
A E B F C G D HA E B F C G D H= = = =
A B C D E F G HA B C D E F G H= = = =
18
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design: RAIDPhysical Design: RAID
RAID levels:
A0 B0 A1 B1 A2 B2 A3 B3 EA EB
RAID 2 Byte (Bit) level striping + error correcting disks
(A0..A3) = word A(B0..B3) = word B
RAID 3 Block stripping with parity
A0 B0 A1 B1 A2 B2 A3 B3 PA PB
Byte level
stripe 0 stripe 1 stripe 2 stripe 3
19
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design: RAIDPhysical Design: RAID
RAID levels:
RAID 5 Independent Data disks with distrib. parity blocks
A0 P2 B0 B2 P0 C2A1 P1 C1
A block B block C block
RAID 4 Independent Data disks with shared Parity disk
A0 B0 A1 B1 A2 B2 A3 B3 PA PB
block 0 block 1 block 2 block 3
Block level
20
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design: RAIDPhysical Design: RAID
Levels 0, 1, 2, 3, 4, 5, 6, 7, 10, 1+0, 53
RAID controller provides OS / DBS with standard disk interface
Performance:— gains for read operations— Writes need recomputation of parity: Main cause of parity
disk bottleneck in RAID-4 architecture
Further info: http://www.raid.com
21
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design: Storage of DB dataPhysical Design: Storage of DB data
Database data — Stored in disk files — Files provided by the underlying OS
File: — Logical sequence of records— Records mapped onto disk blocks
Records — Composed of fields (byte sequences for attributes)— Fixed or variable length— BLOBs stored separately in pool of disk blocks— Record points to BLOB
22
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design: Storage of DB dataPhysical Design: Storage of DB data
Goal: Mapping the database to files
Problem:— Blocks with fixed size but varying record sizes
(tuples of distinct relations of different size)
Approaches:1. Use several files and store records of only one fixed
length in any given file2. Structure the files such that they can deal with multiple
lengths for records (more difficult than fixed length).
23
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Oracleblock
Database
Physical Design: Storage of DB data Physical Design: Storage of DB data -- OracleOracle
Oracle storage hierarchy
Segment
Extent
Tablespace
Physical
Data file
O/S block
Logical
24
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design: Storage of DB data Physical Design: Storage of DB data -- OracleOracle
Database block
Control of space usage in blocks:PCTUSED and PCTFREE
Header− contains the data block address,
table directory, row directory, …− grows from the top down.
Data space− inserted into from the bottom up
Free space− middle of the block− initially contiguous− may be fragmented
25
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design: Storage of DB data Physical Design: Storage of DB data -- OracleOracle
PCTUSED: defines minimum percentage of used space the server tries to maintain for each data block
Inserts
1
40%
Example:PCTUSED=40
26
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design: Storage of DB data Physical Design: Storage of DB data -- OracleOracle
PCTFREE: specifies percentage of space in each data block reserved for growth resulting from updates to rows in the block
Inserts
1
PCTFREE=20
40%
Inserts
2
80%
Example:PCTUSED=40
27
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design: Storage of DB data Physical Design: Storage of DB data -- OracleOracle
Example:— 3: block data percentage sinks below 100%-PCTFREE— 4: block data percentage sinks below PCTUSED
PCTUSED=40 PCTFREE=20
InsertsInserts80%
40%
3 4
28
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design: Storage of DB data Physical Design: Storage of DB data -- OracleOracle
Row structure:
Database block
Row headerColumn length
Column value
29
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design: Storage of DB data Physical Design: Storage of DB data -- OracleOracle
Row access:
header, row directory, variable length rows
free space
Disadvantage: uniform page address space -> large page number
(Pagenumber, offset): physical pointers RowId: OOOOOO FFF BBBBBB RRR:
Segment No, data file No (in table space) Block in file, row in block
30
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design: Physical Design: Buffer ManagementBuffer Management
Database Buffer: — Main memory part for storage copies of disk blocks— (older) block copies always available on disk — Managed by the buffer manager.
Buffer manager:— Allocation of space in main memory — Intercepts all system requests for blocks of the DB
Goal:— Minimize number of blocks to access— Keep as many blocks as possible in main memory
31
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design: Physical Design: Buffer ManagementBuffer Management
Strategies:— Replacement strategy:
No room left in the buffer: make some by removing blocks
— Pinned blocks: (Important) blocks not allowed to be written back are pinned
— Forced output of blocks:make sure to write a block back to the disk
Buffer vs disk— Buffer: faster— Disk: safer
32
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design: Physical data modelPhysical Design: Physical data model
Physical data model — Mapping logical design structures onto files— Usually: each relation in a separate file
Large scale DB: — Usually do not rely on OS for file management— One (large) OS file is assigned to the DBS— All relations in this file
Organizing records into blocksBlock should contain related records (rows)Interesting to access several records using 1 block accessRandom assignment: different block for each recordHeap (unsorted) or sequential file (sorted)
33
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design: Physical data modelPhysical Design: Physical data model
Example:
...2.00Lucas 1999SciFiStar Wars I345
...2.00Lucas1997ScFiStar Wars IV290
...2.20Van Sant1998suspensePsycho222
... 1.50 Spielberg1982comedyET112
...1.50Spielberg 1975horrorJaws100
.....................
...2.00Hitchcock1960suspensePsycho095
Sequential file processing following pointers After several inserts/deletes:
Order doesn't match physical order of the recordsSequential processing not efficient
→ Reorganize the file
34
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: BasicsStructures: Basics
Goal: Accelerate access to secondary storage
Index: — Optional data structure for fast access to individual data
items in the DB — Defined for one or more attributes of a table— Locates table-rows in an efficient way
Goal of DB designer: — Define index for schema to increase performance— Use one of the implementations supplied by DBS
Goal of DBS implementer: — Find efficient data structure for index
35
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: BasicsStructures: Basics
Indexing technique depends on application
Criteria:— Access time— Insertion / deletion time (data + structure update)— Space overhead: Space occupied by index structure
Trade off — High #indexes: DB updates to costly — Low #indexes: no efficient data search
Efficiency measure: # blocks to access on disk
36
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: BasicsStructures: Basics
Search key: — Attribute set used to look up records in file— Each index associated with particular search key— Fast random access to records in a file
CBA
Index
CBBAA
C
A
Records
Search key
37
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: BasicsStructures: Basics
Primary (unique) index— Index whose search key specifies sequential order in file— Maps key values to physical locations— Usually corresponds to primary key— ∀ values in attribute A: at most one row r with r.A=value— Such files: index-sequential files
Example:
...2.00Lucas 1999SciFiStar Wars I345
...2.00Lucas1997ScFiStar Wars IV290
...2.20Van Sant1998suspensePsycho222
... 1.50 Spielberg1982comedyET112
...1.50Spielberg 1975horrorJaws100
.....................
...2.00Hitchcock1960suspensePsycho095
112100095
345290222
38
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: BasicsStructures: Basics
Secondary index— Every other not primary index— In most cases not unique — ∀ value in attribute A: one or more rows r with r.A=value
Example:
SciFihorrorcomedy
suspense
112
222095
......
1999SciFiStar Wars I345
1997ScFiStar Wars IV290
1998suspensePsycho222
1982comedyET112
1975horrorJaws100
............
1960suspensePsycho095
39
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: BasicsStructures: Basics
Dense Index— Index record for every search
key value in the file— Faster to locate a record than sparse index
CBBAA
C
A
CBA
CBBAA
C
A
CA
Sparse IndexIndex records for only some of the recordsAccess: Find less or equal index value, follow the pointersLess space and easier to maintain than dense index
Trade-of access time and space overheadCompromise: sparse index with one entry per block
40
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: Index typesStructures: Index types
Index Trees— Tree defines search way to attribute values
Hash Index— Uses hash functions to encode attribute values
Bitmap Index— Bits indicate the attribute values
Cluster Index— Store "logically related data" in physical neighborhood
Important task in physical schema design:Deciding which indexes to create (= which index types on which attributes)
41
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: ISAMStructures: ISAM
ISAM (Index-sequential Access method) — Index and data ordered according search key— "sequential“: rows may be read in key sequence
— Keys Ki, Data tuple Di, Pi pointer to data Dj: Ki-1< Dj.key ≤Ki
— Performance degrades as the file grows
K2K1 … Kn Index pages
Data pages
D2D1 … Di-1 Di Di+2Di+1 … Dn-1 Dn
......
P1 P2 Pn
42
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: B+ Structures: B+ --Trees Trees
B+ -Trees ( ~ B* - Trees)
— Standard for most DBS— Like B-Trees, inner nodes contain only keys and pointers
— maintains its efficiency despite insertion/deletion of data
K..K1 … K..
K..K.. … K.. K..K.. … K.......
K..K.. … K....... K..K.. … K.. .....
data pointer
(Knuth, D. The Art of Computer Programming, vol. 3. Sorting and Searching. Addison-Wesley, Menlo Park, CA., 1973)
Important concept
43
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: B+ Structures: B+ --Trees Trees
Self-organized balanced tree Every path from root to a leaf is of same length
Degree/order n: — All nodes have min n, max 2n keys, — Root nodes have max 2n keys— Every node with m keys has m+1 pointers
Alternative definitions:[Elmasri/Navathe]: nodes have ⎡k/2⎤ … k keys[Garcia-Molina et al]: nodes have ⎡(n+1)/2⎤ … n+1 pointers
B+ = nodes min 1/2 full , B* = nodes min 2/3 full
44
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: B+ Structures: B+ --Trees Trees
Structure of a inner node:— 2n search-key values K1, K2,…, K2n,
— 2n+1 pointers P1, P2,…, P2n+1
Search-key values sorted: if i < j, then Ki < Kj
Pi points to sub-tree with Keys X:— Ki-1 ≤ X < Ki (1< i < 2n+1)— X ≤ Ki (i=1), Ki-1 < X (i = 2n+1)
K2K1 K3 K4
P1 P2 P3 P4 P5n=2
45
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: B+ Structures: B+ --Trees Trees
Node: same size as a disk block (10 ≤ 2n ≤ 100)
Chained leaves provide sequential accessStructure of leaf node: — Pi (1 ≤ i ≤ 2n) points to file record with search-key value Ki
K2K1 K3 K4
P1 P2 P3 P4...... ...
...
...K3......
46
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: B+ Structures: B+ --Trees Trees
Example:— Degree 2 B* Index Tree on Movie
222112 290 345
112
10095
...2.00Lucas 1999SciFiStar Wars I345
...2.00Lucas1997ScFiStar Wars IV290
...2.20Van Sant1998suspensePsycho222
... 1.50 Spielberg1982comedyET112
...1.50Spielberg 1975horrorJaws100
.....................
...2.00Hitchcock1960suspensePsycho095
47
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: B+ Structures: B+ --Trees Trees
Non-unique index (if search key is not primary key)
Different implementations:
— Leaf node pointers Pi (1 ≤ i ≤ 2n, ) point to a bucket of pointers pointing to a file record with search-key value Ki
— Oracle implementation: key values repeated for each row-id with this key
— DB2:key value followed by list of row-ids
48
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: B+ Structures: B+ --Trees Trees –– Search algorithmSearch algorithm
Search in unique tree: one path from root to a leaf
Start: root node already in main memory
Lookup Algorithm:
lookup(page p, item x)if p is a leaf pageif x==p.item[k] return p.pointer[k]else return NULL
else if x<p.item[1] return lookup(p.pointer[1], x)
else if for some j p.item[j] <= x < p.tem[j+1] return lookup(p.pointer[j+1], x)
else return lookup(p.pointer[2n+1], x)
49
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: B+ Structures: B+ --Trees Trees –– Data insertion Data insertion
Insert in unique tree: values only once inserted
1. Node does not become too large:
Insertion Algorithm: — Locate the leaf page where the item should appear— If leaf page is not full: include item in the page
222112 290 345
112
10095 101
50
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: B+ Structures: B+ --Trees Trees –– Data insertion Data insertion
2. Node does become too large:
If the leaf page has 2n+1 items after insertion then — create new page with n(+1) items, leave other with n items— insert the pointer of the new page and the first item in the
parent directory
260112
222112
10095 101
290260 345
51
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: B+ Structures: B+ --Trees Trees –– Data insertion Data insertion
Apply splitting recursively if necessaryEventually split root and create one more level
112101
222112
10095 290260
340302102101 105
260
345302
400345 401
52
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: B+ Structures: B+ --Trees Trees –– DeletionDeletion
1. Case: No Combining Pages
if leaf page B where item x appears has >= n+1 items— if x is not the first in the page, simply delete it— else find the parent of the leaf page where x appears and
update it
260112
222112
10095 101
290260 345
53
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: B+ Structures: B+ --Trees Trees –– DeletionDeletion
2. Case: Transferring Items From Siblings
if the leaf page B where x appears has n itemsif there is a neighbor B’ with >n items— then transfer the first (or the last item) of B’ to B and
update the appropriate ancestors of B
260101
222101
10095 101
290260
54
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: B+ Structures: B+ --Trees Trees –– DeletionDeletion
2. Case: Transferring Items From Siblings
if the leaf page B where x appears has n itemsif there is a neighbor B’ with >n items — then … (see previous page)— else combine B with a neighbor B’, delete from parent of B
the first item of the rightmost of B and B’
101
10095 222101 290
Deletion often not fully implemented
55
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: B+ Structures: B+ --Trees ImplementationsTrees Implementations
Oracle:— Standard index form ( called B-tree index)— Syntax: CREATE [ UNIQUE ]INDEX <index_name>
ON <tablename> (<attribute_list>)[ TABLESPACE <tablespace> ][ PCTFREE integer ]
[...]
CREATE [UNIQUE] INDEX <index_name>ON <tablename> (attribute_list>)
MySQL:Standard index formSyntax:
56
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: Index Trees with data leavesStructures: Index Trees with data leaves
Leaf of B+ index: additional I/O to access row
rowId ........
Header
rowId rowId
Key value
......
Non key values....
Key value
Data leaf: row in leaf node instead of row-id
57
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: Index Trees with data leavesStructures: Index Trees with data leaves
Key value not repeated in row dataNo pointer (row-id) in leaf pages
Advantages:— Space reduction— Very good performance if key is long and row is short to
medium, otherwise frequent splits
Primary key indexes: Leaves containing row dataSecondary indexes: Use primary key as "pointer"
58
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: Index Trees with data leavesStructures: Index Trees with data leaves
Oracle:Called index organized tables (IOT)
CREATE TABLE <tablename><tabledefinition>ORGANIZATION INDEX TABLESPACE <tablespace_name>;
LOBs supported, not LONGLONG and LOB supported
Unique constraints not allowedUnique Constraints allowed
Primary key based accessRow-id based access
Logical row-idPhysical row-id
Primary key is identifierRow-id is identifier
IOTTable
Differences to regular table:
MySQL: not supported
59
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: Hash indexStructures: Hash index
Apply hash function on search key h:K→BB: numbering of n buckets , |K| >> |B|Assign keys to buckets according h(K)
Advantage: Efficient access, if inserts infrequent
Record with key value Ki
1
2
0Buckets(blocks)
hash function hK1
K2K3
K4
60
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: Hash indexStructures: Hash index
Disadvantages
No sequential scanStatic: No dynamic increase of spaceFull buckets → pointers to overflow buckets
1
2
0
K1
K2K3
K4
Point queries efficientRange queries inefficientNon unique index: retrieval has to scan rehash chain
⇒ Most DBS don't use simple hash
61
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: Extensible Hash indexStructures: Extensible Hash index
Dynamic growth of hash tablePrinciple idea:— h(K) is n-bit Bit-string, n large, e.g. 32 bits— Not all bits used for addressing buckets, e.g., 232 Blocks— Use as many bits as necessary to address used blocks— Use Bits right or left ordered (here right-ordered)
***.. .. * 0
***.. .. * 1
1 Bit used
1Global depth Local depth
Directory
1
110
1
1111001
Important concept
Records
62
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: Extensible Hash indexStructures: Extensible Hash index
Indirect addressing of blocks by hash valuesand directory (= pointer array PA)PA may grow, doubles each timeSame block may have more than one pointer
Row access(key K)— Take last i Bits of h(K), i= global depth— PA[bi...b1] is Pointer to block— Access block, search sequentially for row
One I/O if directory in main memory already
63
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: Extensible Hash indexStructures: Extensible Hash index
Example— (Primary) Extensible Hash Index on Movie— h(id)=id mod 7, bucket size = 3
0 0
0 1
2
1 0
1 1
2...2.00Hitchcock1960suspensePsycho095
... 1.50 Spielberg1982comedyET112
...2.00Lucas 1999SciFiStar Wars I345
...1.50Spielberg 1975horrorJaws100
...2.20Van Sant1998suspensePsycho222
...2.00Lucas1997ScFiStar Wars IV290
2
1
64
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: Extensible Hash indexStructures: Extensible Hash index
Hash key inserts:— Split buckets and increase local depth if necessary— Increase global depth if necessary
Hash key deletionsDelete empty buckets Shrink directory if all local depths smaller than global depth
Variant: ‘start buckets’ not deleted or mergedVariant: Merge 2 neighboring buckets if content fits in one, neighbors: same local depth, similar hash values
65
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: Bitmap IndexStructures: Bitmap Index
Useful if few different values in a large table
......... ............
.....................
......... ............
... ... ...............
......... ............
.....................
.....................
......... ............
.....................
......... ............
... ... ...............
......... ............
.....................
.....................
File 3, block 10-12:
<Blue , 10.0.3, 12.8.3, 10001001000100><Green , 10.0.3, 12.8.3, 00010000100010> <Red , 10.0.3, 12.8.3, 01000100010000><Yellow, 10.0.3, 12.8.3, 00100010001001>
KeystartROWID
endROWID bitmap
Bitmap index:
66
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: Bitmap IndexStructures: Bitmap Index
Efficient implementation of set operations
Example:
<Blue, 10.0.3, 12.8.3, 10001001000100 ><Red, 10.0.3, 12.8.3, 00010000100010 >
<female, 10.0.3, 12.8.3, 01010110100011 >
<RESULT 00010000100010 >
OR
) AND
SELECT x,y,z FROM people WHERE (color = 'Blue' OR color = 'Red' )AND sex = ‘f'
(
67
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: Bitmap IndexStructures: Bitmap Index
Advantage— If few values and many rows e.g. sex, marital status,…— Compression of bit lists saves space— Efficient processing of OR / AND queries
Disadvantage— Updates expensive:
bitmaps and data blocks must be locked— B+ -tree : one row is locked during update
Oracle Syntax: CREATE BITMAP INDEX <indexname>
ON <tablename (<attribute>)>[TABLESPACE <tablesspacename>] [PCTFREE integer];
68
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: FunctionStructures: Function--based indexesbased indexes
Indexes functions and expressionsUseful if operation performed frequently
Example:— Indexes on from_date,until_date not sufficient
Oracle Syntax:
SELECT mem_noFROM rental WHERE DAYS_BETWEEN(from_date,until_date)>10;
CREATE INDEX <indexname> ON <tablename> (<function>);
69
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: ClusteringStructures: Clustering
Principle: put related data into a group (a cluster)Goal: efficient access to related ("clustered") data
Typical statistical techniqueNo statistics available during DB design
Important concept
Two approaches:Clustering of heterogeneous objects (table clustering)Index clustering
Be careful with different cluster notions
70
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: Table ClusteringStructures: Table Clustering
Tables frequently accessed together (join)
Example: — Access to a Movie record is often followed by access to a
tape containing this movie:
— Tape- and movie records with the same movie_id should be placed in one block
SELECT t.id, t.format, m.id, m.titleFROM Tape t, Movie mWHERE m.id = t.movie_id;
Cluster: set of blocks with interleaving (related) rows of more than one table
71
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: Table ClusteringStructures: Table Clustering
Example:— Cluster key: movie_id
...2.00Hitchcock1960suspensePsycho095
... 1.50 Spielberg1982comedyET112
...1.50Spielberg 1975horrorJaws100
...2.00Lucas 1999SciFiStar Wars I345
...2.20Van Sant1998suspensePsycho222
...2.00Lucas1997ScFiStar Wars IV290
DVD0001
DVD0002
VHS0003
VHS0009
VHS0005
DVD0004
72
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: Table ClusteringStructures: Table Clustering
Clusters are defined by common cluster keyCluster key often primary / foreign keyNeed to chain the records, otherwise more block accesses required than unclustered
Table cluster definition:— First create a cluster
— Then create the tables in the cluster
CREATE CLUSTER mtc(id NUMBER);
CREATE TABLE Movie(…) CLUSTER mtc(id);CREATE TABLE Tape(…) CLUSTER mtc(movie_id);
73
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: Index ClusteringStructures: Index Clustering
Row-ids in a leaf page normally not orderedSequential index scan means random access to rows
Example: Heap Storage, index without cluster
...
Leaf nodes
Root
Datarecords
... ...
K..K.. … K..
K..K.. … K..K..K.. … K..K..K.. … K.. ...
......
74
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: Index ClusteringStructures: Index Clustering
Clustering index:— Physical order on non-key attribute— Obvious: only one cluster per table— Faster access to rows with same value in cluster attribute
Example:
...... ...
K..K.. … K..
K..K.. … K..K..K.. … K..K..K.. … K.. ...
......
75
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
IndexIndex Structures: Clustering implementationStructures: Clustering implementation
Oracle:— Table cluster supported— Index cluster supported (only hash index)
CREATE INDEX <indexname>ON <clustername> [...];
Postgres:— Table cluster not supported — Index cluster supportedCLUSTER <indexname> ON <table>
MySQL: table and index cluster not supported
76
Dr
A. Hinze, U
niversity of Waikato, N
ew Zealand,2006, CO
MP 329 :D
atabase Administration
Physical Design + Index Structures: SummaryPhysical Design + Index Structures: Summary
Data stored on diskAccess time crucial in querying— I/Os is most important cost measure— Access Time: Seek time + Rotational time + Transfer time
Indexes accelerate access to secondary storage — B+ tree is standard in most DBs— Often extensible hashing supported— Clustering: related data in physical neighborhood
Great differences in physical organization in DBSIndexing not standardized