storage structures indexing techniques in dbs - …€” effectiveness = performance ......

76
1 Physical Schema Design Physical Schema Design Storage structures Indexing techniques in DBS

Upload: hatuong

Post on 09-Jun-2018

215 views

Category:

Documents


0 download

TRANSCRIPT

1

Physical Schema DesignPhysical Schema Design

Storage structuresIndexing techniques in DBS

2

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design: Goal and influence factorsPhysical Design: Goal and influence factors

Physical schema design goals— Effective data organization— Effective access to disc— Effectiveness = performance

Quality Measures— Throughput: # transactions / sec— Turn-around-time:

time for answering an individual query (e.g. average)

Parameters— Application dependent factors: size of database, typical

operations, frequency of operations, isolation level— System related factors: storage of data, access path, index

structures, optimization, …

3

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design: Storage DevicesPhysical Design: Storage Devices

Memory Hierarchy:

Cache

Main memory

Disk

Tertiary storage

Primary storage

Secondary storage

Archive storage

DBMS

4

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design: Storage DevicesPhysical Design: Storage Devices

Cache— Fastest, most costly form of storage— Size very small— Usage managed by OS/Hardware

Main Memory— Data available to be operated on— Too small for entire DB— Data lost after power failure or a system crash

Cache

Main memory

Disk

Tertiary storage

DBMS

5

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design: Storage DevicesPhysical Design: Storage Devices

Disk— Where entire DB is stored— Direct access (not sequential) — Data operations: disk -> main memory -> disk— (Usually) survives power failures/system crashes

Tape storage (tertiary storage)— Cheaper than disk— slower access: tape read sequentially from the beginning

(sequential-access storage).— Backups and archival data— Used for recovery from disk failures— Less complex than disk => more reliable

Cache

Main memory

Disk

Tertiary storage

DBMS

6

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design: Storage DevicesPhysical Design: Storage Devices

Access time vs capacity:

Tertiary storage1312111098765

2 1 0 -1 -2 -3 -4 -5 -6 -7 -8 -9

Secondary storage

Zip disk

Floppy disk

Main memory

Cache

Source: Garcia-Molina, Ullman, Widom “Database systems”, 2002

Capa

city

in

10Y

Byte

s

Access timein 10X sec

7

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design: Secondary storagePhysical Design: Secondary storage

Mechanics

Platter = 2 surfaces

Disk heads Cylinder

trackgap

sector

8

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design: Secondary storagePhysical Design: Secondary storage

“Typical” Characteristics— Rotation: 5400 –10,000 RPM — Capacity: 360 KB – 100+ GB— Price (per GB): $2 - $2.5— Platter diameter: 2.5 to 25 cm — Platters: 1 - 15 — Tracks (per surface): 40 – 20,000— Sector size: 512 B – 50 KB— Sectors (per track): 32-500

Holds:Virtual memory (handled by OS)File system (access via OS)DBMS stuff (bypass OS)

9

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design: Secondary storagePhysical Design: Secondary storage

Block:— Neighboring byte sequence on single track of one platter— Smallest data unit moved between disk and main memory— Databases work with blocks— Fixed size — Size determined by physical properties of disk + OS— Size: from 512 to 4096 bytes

trackgap

sector

blocks

10

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design: I/O costPhysical Design: I/O cost

Data transfer disk - main memory — In blocks— Bytes transferred at constant speed— Information flow (if): between 120 Kb/s and 5 Mb/s

Seek time:Time for repositioning the arm to right cylinderMove disk heads to the right cylinder:Start (constant), Move (variable), Stop (constant)0 if arm there, otherwise long (between 8 to 10 ms)Track-to-track seek time: 0.5ms –2ms

11

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design: I/O costPhysical Design: I/O cost

Rotate time (disk latency):— Time until interesting sector positioned under the head— Access to all data within a cylinder within rotate time — 12 to 6 ms per rotation— Average: 6 to 3 ms rotational latency.— => store related information close by

Transfer time (read time):Depends on # bytes to be transferred

Seek time + Rotational time + T/if

Total time to transfer T bytes:

12

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design: I/O costPhysical Design: I/O cost

Typical access time:

Seek time dominates !

Disk access time = SeekTime 6 ms+ RotateTime 3 ms+ TransferTime 1 ms

Compare: RAM 3-10 nsec

Random Disk / RAM:~10 * 10-3 / 10 * 10-9 = 106

~ Distance Hamilton - moon vs. distance Hamilton – Raglan

Sequential disk read ("scan") much faster

13

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design: I/O costPhysical Design: I/O cost

Disk characteristics:

1.2 days2 hrs52002003

5 hrs20 min50181998

20 min2 min20,0000.251988

ScanRandom

ScanSequential

$/GBCapacityGB

year

Source: The Rebirth of Database Machines; talk by Bitton/Gray at VLDB'98

Random access favorable only if small percentage of the data accessed frequentlyRule of thumb: less than 15 % on a large table

14

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design: I/O costPhysical Design: I/O cost

The Myth: seek time dominates The Reality: 1. Queuing dominates2. Transfer dominates BLOBs3. Disk seeks often short

Implication: many cheap servers better than one fast expensive server— Shorter queues— Parallel transfer— Lower cost/access and cost/byte

Seek

Rotate

Transfer

Wait(Queuing)

Source: “Storage: Alternate Futures”Storage: Alternate Futures”; talk by J.Gray at NetStore99, Seattle

15

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design: I/O costPhysical Design: I/O cost

Goal: Accelerate access to secondary storage

Strategies1. Place blocks that are accessed together on same cylinder

(avoids seek time)2. Divide data between smaller disks

(independent heads increase # block accesses)3. Mirror a disk: simultaneous access to several blocks4. Disk-scheduling algorithm: selects order of block access 5. Prefetch blocks in main memory

Strategy depends on situation/application# processes, parallel access, shared disks, (un)predictable queries, budget,…

16

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design: RAIDPhysical Design: RAID

RAID Technology— Redundant array of independent disks; originally redundant

array of inexpensive disks — Reduces access time and queue length— Fault tolerance by "Parity disks"— Principle technique: striping

Large disk:Long queue, Long transfer

A B C D EF G H

Striping (interleaving)A E B F C G D H

RAID 0:

17

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design: RAIDPhysical Design: RAID

RAID levels:

RAID 1 Mirroring and Duplexing: mirror without stripping

RAID 0+1 High Data Transfer Performance

A E B F C G D HA E B F C G D H= = = =

A B C D E F G HA B C D E F G H= = = =

18

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design: RAIDPhysical Design: RAID

RAID levels:

A0 B0 A1 B1 A2 B2 A3 B3 EA EB

RAID 2 Byte (Bit) level striping + error correcting disks

(A0..A3) = word A(B0..B3) = word B

RAID 3 Block stripping with parity

A0 B0 A1 B1 A2 B2 A3 B3 PA PB

Byte level

stripe 0 stripe 1 stripe 2 stripe 3

19

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design: RAIDPhysical Design: RAID

RAID levels:

RAID 5 Independent Data disks with distrib. parity blocks

A0 P2 B0 B2 P0 C2A1 P1 C1

A block B block C block

RAID 4 Independent Data disks with shared Parity disk

A0 B0 A1 B1 A2 B2 A3 B3 PA PB

block 0 block 1 block 2 block 3

Block level

20

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design: RAIDPhysical Design: RAID

Levels 0, 1, 2, 3, 4, 5, 6, 7, 10, 1+0, 53

RAID controller provides OS / DBS with standard disk interface

Performance:— gains for read operations— Writes need recomputation of parity: Main cause of parity

disk bottleneck in RAID-4 architecture

Further info: http://www.raid.com

21

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design: Storage of DB dataPhysical Design: Storage of DB data

Database data — Stored in disk files — Files provided by the underlying OS

File: — Logical sequence of records— Records mapped onto disk blocks

Records — Composed of fields (byte sequences for attributes)— Fixed or variable length— BLOBs stored separately in pool of disk blocks— Record points to BLOB

22

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design: Storage of DB dataPhysical Design: Storage of DB data

Goal: Mapping the database to files

Problem:— Blocks with fixed size but varying record sizes

(tuples of distinct relations of different size)

Approaches:1. Use several files and store records of only one fixed

length in any given file2. Structure the files such that they can deal with multiple

lengths for records (more difficult than fixed length).

23

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Oracleblock

Database

Physical Design: Storage of DB data Physical Design: Storage of DB data -- OracleOracle

Oracle storage hierarchy

Segment

Extent

Tablespace

Physical

Data file

O/S block

Logical

24

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design: Storage of DB data Physical Design: Storage of DB data -- OracleOracle

Database block

Control of space usage in blocks:PCTUSED and PCTFREE

Header− contains the data block address,

table directory, row directory, …− grows from the top down.

Data space− inserted into from the bottom up

Free space− middle of the block− initially contiguous− may be fragmented

25

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design: Storage of DB data Physical Design: Storage of DB data -- OracleOracle

PCTUSED: defines minimum percentage of used space the server tries to maintain for each data block

Inserts

1

40%

Example:PCTUSED=40

26

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design: Storage of DB data Physical Design: Storage of DB data -- OracleOracle

PCTFREE: specifies percentage of space in each data block reserved for growth resulting from updates to rows in the block

Inserts

1

PCTFREE=20

40%

Inserts

2

80%

Example:PCTUSED=40

27

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design: Storage of DB data Physical Design: Storage of DB data -- OracleOracle

Example:— 3: block data percentage sinks below 100%-PCTFREE— 4: block data percentage sinks below PCTUSED

PCTUSED=40 PCTFREE=20

InsertsInserts80%

40%

3 4

28

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design: Storage of DB data Physical Design: Storage of DB data -- OracleOracle

Row structure:

Database block

Row headerColumn length

Column value

29

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design: Storage of DB data Physical Design: Storage of DB data -- OracleOracle

Row access:

header, row directory, variable length rows

free space

Disadvantage: uniform page address space -> large page number

(Pagenumber, offset): physical pointers RowId: OOOOOO FFF BBBBBB RRR:

Segment No, data file No (in table space) Block in file, row in block

30

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design: Physical Design: Buffer ManagementBuffer Management

Database Buffer: — Main memory part for storage copies of disk blocks— (older) block copies always available on disk — Managed by the buffer manager.

Buffer manager:— Allocation of space in main memory — Intercepts all system requests for blocks of the DB

Goal:— Minimize number of blocks to access— Keep as many blocks as possible in main memory

31

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design: Physical Design: Buffer ManagementBuffer Management

Strategies:— Replacement strategy:

No room left in the buffer: make some by removing blocks

— Pinned blocks: (Important) blocks not allowed to be written back are pinned

— Forced output of blocks:make sure to write a block back to the disk

Buffer vs disk— Buffer: faster— Disk: safer

32

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design: Physical data modelPhysical Design: Physical data model

Physical data model — Mapping logical design structures onto files— Usually: each relation in a separate file

Large scale DB: — Usually do not rely on OS for file management— One (large) OS file is assigned to the DBS— All relations in this file

Organizing records into blocksBlock should contain related records (rows)Interesting to access several records using 1 block accessRandom assignment: different block for each recordHeap (unsorted) or sequential file (sorted)

33

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design: Physical data modelPhysical Design: Physical data model

Example:

...2.00Lucas 1999SciFiStar Wars I345

...2.00Lucas1997ScFiStar Wars IV290

...2.20Van Sant1998suspensePsycho222

... 1.50 Spielberg1982comedyET112

...1.50Spielberg 1975horrorJaws100

.....................

...2.00Hitchcock1960suspensePsycho095

Sequential file processing following pointers After several inserts/deletes:

Order doesn't match physical order of the recordsSequential processing not efficient

→ Reorganize the file

34

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: BasicsStructures: Basics

Goal: Accelerate access to secondary storage

Index: — Optional data structure for fast access to individual data

items in the DB — Defined for one or more attributes of a table— Locates table-rows in an efficient way

Goal of DB designer: — Define index for schema to increase performance— Use one of the implementations supplied by DBS

Goal of DBS implementer: — Find efficient data structure for index

35

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: BasicsStructures: Basics

Indexing technique depends on application

Criteria:— Access time— Insertion / deletion time (data + structure update)— Space overhead: Space occupied by index structure

Trade off — High #indexes: DB updates to costly — Low #indexes: no efficient data search

Efficiency measure: # blocks to access on disk

36

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: BasicsStructures: Basics

Search key: — Attribute set used to look up records in file— Each index associated with particular search key— Fast random access to records in a file

CBA

Index

CBBAA

C

A

Records

Search key

37

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: BasicsStructures: Basics

Primary (unique) index— Index whose search key specifies sequential order in file— Maps key values to physical locations— Usually corresponds to primary key— ∀ values in attribute A: at most one row r with r.A=value— Such files: index-sequential files

Example:

...2.00Lucas 1999SciFiStar Wars I345

...2.00Lucas1997ScFiStar Wars IV290

...2.20Van Sant1998suspensePsycho222

... 1.50 Spielberg1982comedyET112

...1.50Spielberg 1975horrorJaws100

.....................

...2.00Hitchcock1960suspensePsycho095

112100095

345290222

38

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: BasicsStructures: Basics

Secondary index— Every other not primary index— In most cases not unique — ∀ value in attribute A: one or more rows r with r.A=value

Example:

SciFihorrorcomedy

suspense

112

222095

......

1999SciFiStar Wars I345

1997ScFiStar Wars IV290

1998suspensePsycho222

1982comedyET112

1975horrorJaws100

............

1960suspensePsycho095

39

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: BasicsStructures: Basics

Dense Index— Index record for every search

key value in the file— Faster to locate a record than sparse index

CBBAA

C

A

CBA

CBBAA

C

A

CA

Sparse IndexIndex records for only some of the recordsAccess: Find less or equal index value, follow the pointersLess space and easier to maintain than dense index

Trade-of access time and space overheadCompromise: sparse index with one entry per block

40

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: Index typesStructures: Index types

Index Trees— Tree defines search way to attribute values

Hash Index— Uses hash functions to encode attribute values

Bitmap Index— Bits indicate the attribute values

Cluster Index— Store "logically related data" in physical neighborhood

Important task in physical schema design:Deciding which indexes to create (= which index types on which attributes)

41

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: ISAMStructures: ISAM

ISAM (Index-sequential Access method) — Index and data ordered according search key— "sequential“: rows may be read in key sequence

— Keys Ki, Data tuple Di, Pi pointer to data Dj: Ki-1< Dj.key ≤Ki

— Performance degrades as the file grows

K2K1 … Kn Index pages

Data pages

D2D1 … Di-1 Di Di+2Di+1 … Dn-1 Dn

......

P1 P2 Pn

42

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: B+ Structures: B+ --Trees Trees

B+ -Trees ( ~ B* - Trees)

— Standard for most DBS— Like B-Trees, inner nodes contain only keys and pointers

— maintains its efficiency despite insertion/deletion of data

K..K1 … K..

K..K.. … K.. K..K.. … K.......

K..K.. … K....... K..K.. … K.. .....

data pointer

(Knuth, D. The Art of Computer Programming, vol. 3. Sorting and Searching. Addison-Wesley, Menlo Park, CA., 1973)

Important concept

43

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: B+ Structures: B+ --Trees Trees

Self-organized balanced tree Every path from root to a leaf is of same length

Degree/order n: — All nodes have min n, max 2n keys, — Root nodes have max 2n keys— Every node with m keys has m+1 pointers

Alternative definitions:[Elmasri/Navathe]: nodes have ⎡k/2⎤ … k keys[Garcia-Molina et al]: nodes have ⎡(n+1)/2⎤ … n+1 pointers

B+ = nodes min 1/2 full , B* = nodes min 2/3 full

44

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: B+ Structures: B+ --Trees Trees

Structure of a inner node:— 2n search-key values K1, K2,…, K2n,

— 2n+1 pointers P1, P2,…, P2n+1

Search-key values sorted: if i < j, then Ki < Kj

Pi points to sub-tree with Keys X:— Ki-1 ≤ X < Ki (1< i < 2n+1)— X ≤ Ki (i=1), Ki-1 < X (i = 2n+1)

K2K1 K3 K4

P1 P2 P3 P4 P5n=2

45

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: B+ Structures: B+ --Trees Trees

Node: same size as a disk block (10 ≤ 2n ≤ 100)

Chained leaves provide sequential accessStructure of leaf node: — Pi (1 ≤ i ≤ 2n) points to file record with search-key value Ki

K2K1 K3 K4

P1 P2 P3 P4...... ...

...

...K3......

46

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: B+ Structures: B+ --Trees Trees

Example:— Degree 2 B* Index Tree on Movie

222112 290 345

112

10095

...2.00Lucas 1999SciFiStar Wars I345

...2.00Lucas1997ScFiStar Wars IV290

...2.20Van Sant1998suspensePsycho222

... 1.50 Spielberg1982comedyET112

...1.50Spielberg 1975horrorJaws100

.....................

...2.00Hitchcock1960suspensePsycho095

47

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: B+ Structures: B+ --Trees Trees

Non-unique index (if search key is not primary key)

Different implementations:

— Leaf node pointers Pi (1 ≤ i ≤ 2n, ) point to a bucket of pointers pointing to a file record with search-key value Ki

— Oracle implementation: key values repeated for each row-id with this key

— DB2:key value followed by list of row-ids

48

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: B+ Structures: B+ --Trees Trees –– Search algorithmSearch algorithm

Search in unique tree: one path from root to a leaf

Start: root node already in main memory

Lookup Algorithm:

lookup(page p, item x)if p is a leaf pageif x==p.item[k] return p.pointer[k]else return NULL

else if x<p.item[1] return lookup(p.pointer[1], x)

else if for some j p.item[j] <= x < p.tem[j+1] return lookup(p.pointer[j+1], x)

else return lookup(p.pointer[2n+1], x)

49

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: B+ Structures: B+ --Trees Trees –– Data insertion Data insertion

Insert in unique tree: values only once inserted

1. Node does not become too large:

Insertion Algorithm: — Locate the leaf page where the item should appear— If leaf page is not full: include item in the page

222112 290 345

112

10095 101

50

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: B+ Structures: B+ --Trees Trees –– Data insertion Data insertion

2. Node does become too large:

If the leaf page has 2n+1 items after insertion then — create new page with n(+1) items, leave other with n items— insert the pointer of the new page and the first item in the

parent directory

260112

222112

10095 101

290260 345

51

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: B+ Structures: B+ --Trees Trees –– Data insertion Data insertion

Apply splitting recursively if necessaryEventually split root and create one more level

112101

222112

10095 290260

340302102101 105

260

345302

400345 401

52

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: B+ Structures: B+ --Trees Trees –– DeletionDeletion

1. Case: No Combining Pages

if leaf page B where item x appears has >= n+1 items— if x is not the first in the page, simply delete it— else find the parent of the leaf page where x appears and

update it

260112

222112

10095 101

290260 345

53

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: B+ Structures: B+ --Trees Trees –– DeletionDeletion

2. Case: Transferring Items From Siblings

if the leaf page B where x appears has n itemsif there is a neighbor B’ with >n items— then transfer the first (or the last item) of B’ to B and

update the appropriate ancestors of B

260101

222101

10095 101

290260

54

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: B+ Structures: B+ --Trees Trees –– DeletionDeletion

2. Case: Transferring Items From Siblings

if the leaf page B where x appears has n itemsif there is a neighbor B’ with >n items — then … (see previous page)— else combine B with a neighbor B’, delete from parent of B

the first item of the rightmost of B and B’

101

10095 222101 290

Deletion often not fully implemented

55

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: B+ Structures: B+ --Trees ImplementationsTrees Implementations

Oracle:— Standard index form ( called B-tree index)— Syntax: CREATE [ UNIQUE ]INDEX <index_name>

ON <tablename> (<attribute_list>)[ TABLESPACE <tablespace> ][ PCTFREE integer ]

[...]

CREATE [UNIQUE] INDEX <index_name>ON <tablename> (attribute_list>)

MySQL:Standard index formSyntax:

56

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: Index Trees with data leavesStructures: Index Trees with data leaves

Leaf of B+ index: additional I/O to access row

rowId ........

Header

rowId rowId

Key value

......

Non key values....

Key value

Data leaf: row in leaf node instead of row-id

57

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: Index Trees with data leavesStructures: Index Trees with data leaves

Key value not repeated in row dataNo pointer (row-id) in leaf pages

Advantages:— Space reduction— Very good performance if key is long and row is short to

medium, otherwise frequent splits

Primary key indexes: Leaves containing row dataSecondary indexes: Use primary key as "pointer"

58

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: Index Trees with data leavesStructures: Index Trees with data leaves

Oracle:Called index organized tables (IOT)

CREATE TABLE <tablename><tabledefinition>ORGANIZATION INDEX TABLESPACE <tablespace_name>;

LOBs supported, not LONGLONG and LOB supported

Unique constraints not allowedUnique Constraints allowed

Primary key based accessRow-id based access

Logical row-idPhysical row-id

Primary key is identifierRow-id is identifier

IOTTable

Differences to regular table:

MySQL: not supported

59

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: Hash indexStructures: Hash index

Apply hash function on search key h:K→BB: numbering of n buckets , |K| >> |B|Assign keys to buckets according h(K)

Advantage: Efficient access, if inserts infrequent

Record with key value Ki

1

2

0Buckets(blocks)

hash function hK1

K2K3

K4

60

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: Hash indexStructures: Hash index

Disadvantages

No sequential scanStatic: No dynamic increase of spaceFull buckets → pointers to overflow buckets

1

2

0

K1

K2K3

K4

Point queries efficientRange queries inefficientNon unique index: retrieval has to scan rehash chain

⇒ Most DBS don't use simple hash

61

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: Extensible Hash indexStructures: Extensible Hash index

Dynamic growth of hash tablePrinciple idea:— h(K) is n-bit Bit-string, n large, e.g. 32 bits— Not all bits used for addressing buckets, e.g., 232 Blocks— Use as many bits as necessary to address used blocks— Use Bits right or left ordered (here right-ordered)

***.. .. * 0

***.. .. * 1

1 Bit used

1Global depth Local depth

Directory

1

110

1

1111001

Important concept

Records

62

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: Extensible Hash indexStructures: Extensible Hash index

Indirect addressing of blocks by hash valuesand directory (= pointer array PA)PA may grow, doubles each timeSame block may have more than one pointer

Row access(key K)— Take last i Bits of h(K), i= global depth— PA[bi...b1] is Pointer to block— Access block, search sequentially for row

One I/O if directory in main memory already

63

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: Extensible Hash indexStructures: Extensible Hash index

Example— (Primary) Extensible Hash Index on Movie— h(id)=id mod 7, bucket size = 3

0 0

0 1

2

1 0

1 1

2...2.00Hitchcock1960suspensePsycho095

... 1.50 Spielberg1982comedyET112

...2.00Lucas 1999SciFiStar Wars I345

...1.50Spielberg 1975horrorJaws100

...2.20Van Sant1998suspensePsycho222

...2.00Lucas1997ScFiStar Wars IV290

2

1

64

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: Extensible Hash indexStructures: Extensible Hash index

Hash key inserts:— Split buckets and increase local depth if necessary— Increase global depth if necessary

Hash key deletionsDelete empty buckets Shrink directory if all local depths smaller than global depth

Variant: ‘start buckets’ not deleted or mergedVariant: Merge 2 neighboring buckets if content fits in one, neighbors: same local depth, similar hash values

65

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: Bitmap IndexStructures: Bitmap Index

Useful if few different values in a large table

......... ............

.....................

......... ............

... ... ...............

......... ............

.....................

.....................

......... ............

.....................

......... ............

... ... ...............

......... ............

.....................

.....................

File 3, block 10-12:

<Blue , 10.0.3, 12.8.3, 10001001000100><Green , 10.0.3, 12.8.3, 00010000100010> <Red , 10.0.3, 12.8.3, 01000100010000><Yellow, 10.0.3, 12.8.3, 00100010001001>

KeystartROWID

endROWID bitmap

Bitmap index:

66

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: Bitmap IndexStructures: Bitmap Index

Efficient implementation of set operations

Example:

<Blue, 10.0.3, 12.8.3, 10001001000100 ><Red, 10.0.3, 12.8.3, 00010000100010 >

<female, 10.0.3, 12.8.3, 01010110100011 >

<RESULT 00010000100010 >

OR

) AND

SELECT x,y,z FROM people WHERE (color = 'Blue' OR color = 'Red' )AND sex = ‘f'

(

67

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: Bitmap IndexStructures: Bitmap Index

Advantage— If few values and many rows e.g. sex, marital status,…— Compression of bit lists saves space— Efficient processing of OR / AND queries

Disadvantage— Updates expensive:

bitmaps and data blocks must be locked— B+ -tree : one row is locked during update

Oracle Syntax: CREATE BITMAP INDEX <indexname>

ON <tablename (<attribute>)>[TABLESPACE <tablesspacename>] [PCTFREE integer];

68

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: FunctionStructures: Function--based indexesbased indexes

Indexes functions and expressionsUseful if operation performed frequently

Example:— Indexes on from_date,until_date not sufficient

Oracle Syntax:

SELECT mem_noFROM rental WHERE DAYS_BETWEEN(from_date,until_date)>10;

CREATE INDEX <indexname> ON <tablename> (<function>);

69

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: ClusteringStructures: Clustering

Principle: put related data into a group (a cluster)Goal: efficient access to related ("clustered") data

Typical statistical techniqueNo statistics available during DB design

Important concept

Two approaches:Clustering of heterogeneous objects (table clustering)Index clustering

Be careful with different cluster notions

70

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: Table ClusteringStructures: Table Clustering

Tables frequently accessed together (join)

Example: — Access to a Movie record is often followed by access to a

tape containing this movie:

— Tape- and movie records with the same movie_id should be placed in one block

SELECT t.id, t.format, m.id, m.titleFROM Tape t, Movie mWHERE m.id = t.movie_id;

Cluster: set of blocks with interleaving (related) rows of more than one table

71

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: Table ClusteringStructures: Table Clustering

Example:— Cluster key: movie_id

...2.00Hitchcock1960suspensePsycho095

... 1.50 Spielberg1982comedyET112

...1.50Spielberg 1975horrorJaws100

...2.00Lucas 1999SciFiStar Wars I345

...2.20Van Sant1998suspensePsycho222

...2.00Lucas1997ScFiStar Wars IV290

DVD0001

DVD0002

VHS0003

VHS0009

VHS0005

DVD0004

72

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: Table ClusteringStructures: Table Clustering

Clusters are defined by common cluster keyCluster key often primary / foreign keyNeed to chain the records, otherwise more block accesses required than unclustered

Table cluster definition:— First create a cluster

— Then create the tables in the cluster

CREATE CLUSTER mtc(id NUMBER);

CREATE TABLE Movie(…) CLUSTER mtc(id);CREATE TABLE Tape(…) CLUSTER mtc(movie_id);

73

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: Index ClusteringStructures: Index Clustering

Row-ids in a leaf page normally not orderedSequential index scan means random access to rows

Example: Heap Storage, index without cluster

...

Leaf nodes

Root

Datarecords

... ...

K..K.. … K..

K..K.. … K..K..K.. … K..K..K.. … K.. ...

......

74

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: Index ClusteringStructures: Index Clustering

Clustering index:— Physical order on non-key attribute— Obvious: only one cluster per table— Faster access to rows with same value in cluster attribute

Example:

...... ...

K..K.. … K..

K..K.. … K..K..K.. … K..K..K.. … K.. ...

......

75

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

IndexIndex Structures: Clustering implementationStructures: Clustering implementation

Oracle:— Table cluster supported— Index cluster supported (only hash index)

CREATE INDEX <indexname>ON <clustername> [...];

Postgres:— Table cluster not supported — Index cluster supportedCLUSTER <indexname> ON <table>

MySQL: table and index cluster not supported

76

Dr

A. Hinze, U

niversity of Waikato, N

ew Zealand,2006, CO

MP 329 :D

atabase Administration

Physical Design + Index Structures: SummaryPhysical Design + Index Structures: Summary

Data stored on diskAccess time crucial in querying— I/Os is most important cost measure— Access Time: Seek time + Rotational time + Transfer time

Indexes accelerate access to secondary storage — B+ tree is standard in most DBs— Often extensible hashing supported— Clustering: related data in physical neighborhood

Great differences in physical organization in DBSIndexing not standardized