
Post on 22-Dec-2015


CS 4432 1

CS4432: Database Systems II

CS 4432 2

Index definition in SQL

• CREATE INDEX name ON rel (attr)

• DROP INDEX name

(Check online for index definitions in SQL)
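A minimal sketch of the two statements, run against an in-memory SQLite database through Python's standard sqlite3 module. The table rel, its columns, and the index name rel_attr_idx are hypothetical names chosen for illustration; other database systems may differ in syntax details.

```python
import sqlite3

# In-memory database; table and index names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rel (attr INTEGER, other TEXT)")

def index_names(connection):
    """Return the names of all indexes recorded in SQLite's catalog."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index'"
    ).fetchall()
    return [row[0] for row in rows]

# CREATE INDEX name ON rel (attr)
conn.execute("CREATE INDEX rel_attr_idx ON rel (attr)")
after_create = index_names(conn)

# DROP INDEX name
conn.execute("DROP INDEX rel_attr_idx")
after_drop = index_names(conn)

print(after_create, after_drop)
```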

CS 4432 3

Note: an attribute list gives a multi-key index

e.g., CREATE INDEX foo ON R(A,B,C)

CS 4432 4

Multi-key Index

Motivation: Find records where DEPT = “Toy” AND SAL > 50k

CS 4432 5

Strategy I:

• Use one index, say Dept.

• Get all Dept = “Toy” records and check their salary

[Figure: a single index I1]

CS 4432 6

Strategy II:

• Use 2 indexes; manipulate pointers

[Figure: one index searched for Dept = “Toy”, the other for Sal > 50k]

CS 4432 7

Strategy III:

• Multiple key index

One idea:

[Figure: indexes I1, I2, I3]

CS 4432 8

Example

Example record: Name = Joe, DEPT = Sales, SAL = 15k

Dept index: Art, Sales, Toy

Salary index (one per Dept entry):
• 10k, 15k, 17k, 21k
• 12k, 15k, 15k, 19k

CS 4432 9

For which queries is this index good?

• Find RECs Dept = “Sales” AND SAL = 20k
• Find RECs Dept = “Sales” AND SAL > 20k
• Find RECs Dept = “Sales”
• Find RECs SAL = 20k
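The four queries can be tried against a small sketch of a multi-key index: a Dept index whose entries each lead to a sorted salary list. This is one possible in-memory layout, not the structure any particular system uses; the records and all function names are hypothetical. Note that the last query (salary only) has to visit every Dept entry, which is why this index helps it least.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sample data: (name, dept, salary in thousands).
records = [
    ("Joe", "Sales", 15),
    ("Ann", "Sales", 21),
    ("Bob", "Toy", 12),
    ("Kim", "Toy", 15),
    ("Sue", "Toy", 19),
]

def build_index(recs):
    """First level: dict keyed by dept. Second level: salaries sorted per dept."""
    by_dept = {}
    for name, dept, sal in recs:
        by_dept.setdefault(dept, []).append((sal, name))
    index = {}
    for dept, pairs in by_dept.items():
        pairs.sort()
        index[dept] = ([s for s, _ in pairs], [n for _, n in pairs])
    return index

def dept_eq(index, dept):
    """Dept = d: one first-level lookup, then read the whole second level."""
    _, names = index.get(dept, ([], []))
    return list(names)

def dept_eq_sal_eq(index, dept, sal):
    """Dept = d AND SAL = s: first-level lookup plus a binary search."""
    sals, names = index.get(dept, ([], []))
    return names[bisect_left(sals, sal):bisect_right(sals, sal)]

def dept_eq_sal_gt(index, dept, sal):
    """Dept = d AND SAL > s: first-level lookup plus a range scan."""
    sals, names = index.get(dept, ([], []))
    return names[bisect_right(sals, sal):]

def sal_eq(index, sal):
    """SAL = s only: no Dept given, so every second-level index is searched."""
    found = []
    for dept in index:
        found.extend(dept_eq_sal_eq(index, dept, sal))
    return found

index = build_index(records)
print(dept_eq_sal_gt(index, "Sales", 20))
print(sal_eq(index, 15))
```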

CS 4432 10

Many alternate methods for indexing

CS 4432 11

Hashing

key → h(key) → <key>

[Figure: h(key) selects one of the buckets; a bucket is typically 1 disk block]

CS 4432 12

One example hash function

• Key = ‘x1 x2 … xn’, an n-byte character string

• Have b buckets

• Hash function: h = (x1 + x2 + … + xn) mod b
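The byte-sum hash function can be sketched in a few lines. The function name h and the choice of UTF-8 to turn a string into bytes are assumptions made for this illustration.

```python
def h(key: str, b: int) -> int:
    """Add up the bytes of the key and reduce modulo the number of buckets b."""
    return sum(key.encode("utf-8")) % b

# "abc" has byte values 97 + 98 + 99 = 294, and 294 mod 4 = 2.
print(h("abc", 4))

# Reordering the characters does not change the sum, so such keys always collide.
print(h("abc", 4) == h("cba", 4))
```

The collision between reordered keys is one reason this function may not be the best choice in practice.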

CS 4432 13

This may not be the best function … Read Knuth Vol. 3 if you really need to select a good function.

Good hash function: expected number of keys/bucket is the same for all buckets

CS 4432 14

Within a bucket:

• Do we keep keys sorted?

• Yes, if CPU time critical & Inserts/Deletes not too frequent

CS 4432 15

Next: example to illustrate inserts, overflows, deletes

CS 4432 16

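EXAMPLE: 2 records/bucket

INSERT:
• h(a) = 1
• h(b) = 2
• h(c) = 1
• h(d) = 0

Buckets after these inserts:
• 0: d
• 1: a, c
• 2: b
• 3: (empty)

Then h(e) = 1: bucket 1 is full, so e goes into an overflow block chained to bucket 1.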
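The insert example above can be replayed with a small sketch of a hash file in which each bucket is a chain of blocks: the first block is the primary block and any later ones are overflow blocks. The class HashFile and its methods are hypothetical, and the hash values are simply looked up from the slide rather than computed. The delete method compacts the chain after removing a key, which is only one possible policy.

```python
# Hash values taken from the example; a real file would compute h(key).
SLIDE_HASH = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}

class HashFile:
    def __init__(self, n_buckets, capacity, h):
        self.capacity = capacity
        self.h = h
        # Each bucket is a chain of blocks; block 0 is the primary block.
        self.buckets = [[[]] for _ in range(n_buckets)]

    def insert(self, key):
        chain = self.buckets[self.h(key)]
        for block in chain:
            if len(block) < self.capacity:
                block.append(key)
                return
        chain.append([key])  # every block is full: add an overflow block

    def delete(self, key):
        chain = self.buckets[self.h(key)]
        keys = [k for block in chain for k in block]
        if key not in keys:
            return False
        keys.remove(key)
        # Repack the remaining keys so that empty overflow blocks disappear.
        cap = self.capacity
        chain[:] = [keys[i:i + cap] for i in range(0, len(keys), cap)] or [[]]
        return True

f = HashFile(n_buckets=4, capacity=2, h=SLIDE_HASH.__getitem__)
for key in "abcde":
    f.insert(key)
state_after_inserts = [[list(block) for block in chain] for chain in f.buckets]
print(state_after_inserts)

f.delete("c")
print(f.buckets[1])
```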

CS 4432 17

EXAMPLE: deletion

Delete: e, f

[Figure: buckets 0 to 3 holding the keys a, b, c, d, e, f, g, some of them in overflow blocks. After a deletion, maybe move “g” up.]

CS 4432 18

Rule of thumb:

• Try to keep space utilization between 50% and 80%

  Utilization = (# keys used) / (total # keys that fit)

• If < 50%, wasting space
• If > 80%, overflows significant (depends on how good hash function is & on # keys/bucket)
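As a small worked example of the rule of thumb, the sketch below computes utilization and applies the 50% and 80% thresholds. It assumes that "total # keys that fit" counts the primary buckets only (overflow blocks excluded); the function names are hypothetical.

```python
def utilization(keys_used, n_buckets, keys_per_bucket):
    """Utilization = (# keys used) / (total # keys that fit in primary buckets)."""
    return keys_used / (n_buckets * keys_per_bucket)

def rule_of_thumb(u):
    """Classify a utilization value using the 50% and 80% thresholds."""
    if u < 0.5:
        return "wasting space"
    if u > 0.8:
        return "overflows significant"
    return "within the 50% to 80% range"

# 5 keys stored in 4 buckets of 2 keys each: 5 / 8 = 0.625
u = utilization(5, 4, 2)
print(u, rule_of_thumb(u))
```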

CS 4432 19

How do we cope with growth?

• Overflows and reorganizations
• Dynamic hashing
  • Extensible hashing
  • Others …

CS 4432 20

Extensible hashing: idea 1

(a) Use i of b bits output by hash function

[Figure: h(K) produces b bits, e.g., 00110101; the first i bits are used, and i grows over time]

Note: enables future doubling of space!

CS 4432 21

Extensible hashing: idea 2

(b) Hash to directory of pointers to buckets (instead of buckets directly)

[Figure: h(K)[i] selects a directory entry, which points to a bucket]

Note: double space by doubling the directory!

CS 4432 22

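Example: h(k) is 4 bits; 2 keys/bucket

(The number in parentheses is how many bits that bucket uses.)

Before, i = 1:
• 0 → [0001] (1)
• 1 → [1001, 1100] (1)

Insert 1010: the bucket for 1 is full, so it splits and the directory doubles.

New directory, i = 2:
• 00, 01 → [0001] (1)
• 10 → [1001, 1010] (2)
• 11 → [1100] (2)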

CS 4432 23

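Example continued

Before, i = 2:
• 00, 01 → [0001] (1)
• 10 → [1001, 1010] (2)
• 11 → [1100] (2)

Insert: 0111, 0000

0111 fits next to 0001. 0000 then finds that bucket full, so the bucket splits; the directory keeps its size.

After, i = 2:
• 00 → [0000, 0001] (2)
• 01 → [0111] (2)
• 10 → [1001, 1010] (2)
• 11 → [1100] (2)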

CS 4432 24

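Example continued

Before, i = 2:
• 00 → [0000, 0001] (2)
• 01 → [0111] (2)
• 10 → [1001, 1010] (2)
• 11 → [1100] (2)

Insert: 1001

The bucket for 10 is full and already uses 2 bits, so it splits and the directory doubles.

After, i = 3:
• 000, 001 → [0000, 0001] (2)
• 010, 011 → [0111] (2)
• 100 → [1001, 1001] (3)
• 101 → [1010] (3)
• 110, 111 → [1100] (2)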
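The running example (4-bit hash values, 2 keys per bucket) can be replayed with the following sketch of extensible hashing. It is a simplified in-memory illustration, not a disk-based implementation: the class and method names are hypothetical, the inserted values stand for h(K) itself, and the directory is indexed by the leading i bits.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth  # number of leading bits this bucket uses
        self.keys = []

class ExtensibleHash:
    def __init__(self, hash_bits=4, capacity=2):
        self.b = hash_bits          # bits produced by the hash function
        self.capacity = capacity    # keys per bucket
        self.i = 1                  # bits currently used by the directory
        self.directory = [Bucket(1), Bucket(1)]

    def _slot(self, h):
        """Directory entry for hash value h: its leading i bits."""
        return h >> (self.b - self.i)

    def insert(self, h):
        while True:
            bucket = self.directory[self._slot(h)]
            if len(bucket.keys) < self.capacity:
                bucket.keys.append(h)
                return
            self._split(bucket)  # make room, then retry

    def _split(self, bucket):
        if bucket.depth == self.i:
            if self.i == self.b:
                raise OverflowError("all hash bits in use; overflow block needed")
            # Double the directory: each old entry becomes two adjacent entries.
            self.directory = [bk for bk in self.directory for _ in range(2)]
            self.i += 1
        depth = bucket.depth + 1
        zero, one = Bucket(depth), Bucket(depth)
        for k in bucket.keys:
            bit = (k >> (self.b - depth)) & 1   # the newly examined bit
            (one if bit else zero).keys.append(k)
        # Redirect every directory entry that pointed at the old bucket.
        for slot, bk in enumerate(self.directory):
            if bk is bucket:
                bit = (slot >> (self.i - depth)) & 1
                self.directory[slot] = one if bit else zero

table = ExtensibleHash(hash_bits=4, capacity=2)
for value in (0b0001, 0b1001, 0b1100, 0b1010, 0b0111, 0b0000, 0b1001):
    table.insert(value)

for slot, bk in enumerate(table.directory):
    print(format(slot, "03b"), [format(k, "04b") for k in bk.keys], bk.depth)
```

Directory entries that share a bucket (for example 110 and 111) print the same contents, since both point to one bucket that uses only 2 bits.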

CS 4432 25

Extensible hashing: deletion

• Merge blocks and cut directory if possible

(Reverse insert procedure)

CS 4432 26

Summary: Extensible hashing

+ Can handle growing files
  - with less wasted space
  - with no full reorganizations

+ If directory fits into main memory, then access cost is 1 IO, otherwise 2 IOs

- Indirection (not bad if directory in memory)

- Directory doubles in size (now it fits, now it does not)

CS 4432 27

Use what when :

• Indexing : Tree-Structures vs Hashing

CS 4432 28

Indexing vs Hashing

• Hashing good for probes given key

  e.g., SELECT … FROM R WHERE R.A = 5

CS 4432 29

Indexing vs Hashing

• INDEXING (including B-trees) good for range searches

  e.g., SELECT … FROM R WHERE R.A > 5
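The contrast between the two query shapes can be sketched with in-memory stand-ins: a Python dict plays the role of the hash index, and a sorted list searched with bisect plays the role of an ordered (tree-style) index. The rows and function names are hypothetical, and a real B-tree is block-based rather than a flat list.

```python
from bisect import bisect_right

# Hypothetical rows of R as (A, record id).
rows = [(3, "r1"), (5, "r2"), (8, "r3"), (5, "r4"), (9, "r5")]

# Hash-style index: direct probe by key, but keys are in no useful order.
hash_index = {}
for a, rid in rows:
    hash_index.setdefault(a, []).append(rid)

# Ordered index: entries sorted by A, so a range is one contiguous slice.
sorted_index = sorted(rows)
sorted_keys = [a for a, _ in sorted_index]

def probe_eq(a):
    """WHERE R.A = a: a single lookup in the hash index."""
    return hash_index.get(a, [])

def range_gt(a):
    """WHERE R.A > a: binary search for the start, then scan to the end."""
    start = bisect_right(sorted_keys, a)
    return [rid for _, rid in sorted_index[start:]]

print(probe_eq(5))
print(range_gt(5))
```

Answering the range query from the hash index alone would mean examining every key, since hashing scatters neighboring values across buckets.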

CS 4432 30

Reading Chapter 14

• Read – 14.3.1 and 14.3.2

CS 4432 31

The BIG picture….

• Chapters 11 & 12: Storage, records, blocks...

• Chapters 13 & 14: Access Mechanisms
  - Indexes
  - B-trees
  - Hashing
  - Multi-key

• Chapters 15 & 16: Query Processing (NEXT)
