cs4432: database systems ii

Post on 30-Dec-2015


CS 4432 lecture #9 1

CS4432: Database Systems II, Lecture #8

(Basic indexing)

Professor Elke A. Rundensteiner

Indexing (Chapter 14)

Indexing helps retrieve data more quickly for certain queries, for example:

SELECT * FROM Emp WHERE salary = 1000000;

[Figure: the index maps the search value (salary = 1,000,000) to the matching record, so the query does not have to scan the whole file]
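A minimal Python sketch of the idea. It assumes an in-memory list of `Emp` records stands in for the data file. The index here is a plain dict, which is a hash-based stand-in rather than the sorted index files this lecture covers. The names `emp_file`, `salary_index`, `scan`, and `lookup` are hypothetical.

```python
from collections import defaultdict

# Hypothetical stand-in for the Emp data file.
emp_file = [
    {"name": "Ann", "salary": 50_000},
    {"name": "Bob", "salary": 1_000_000},
    {"name": "Cy", "salary": 70_000},
]

def scan(file, value):
    """No index: every record must be examined."""
    return [rec for rec in file if rec["salary"] == value]

# Build the index once: salary value -> positions of the matching records.
salary_index = defaultdict(list)
for pos, rec in enumerate(emp_file):
    salary_index[rec["salary"]].append(pos)

def lookup(file, index, value):
    """With the index: go straight to the matching records."""
    return [file[pos] for pos in index.get(value, [])]

print(lookup(emp_file, salary_index, 1_000_000))
# [{'name': 'Bob', 'salary': 1000000}]
```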

Topics

• Sequential index files
• Secondary indexes

Sequential File

[Figure: a data file sorted on the search key, two records per block: 10 20 | 30 40 | 50 60 | 70 80 | 90 100]

Dense Index

[Figure: the sequential file (10 20 | 30 40 | ... | 90 100) with a dense index whose blocks hold 10 20 30 40 | 50 60 70 80 | 90 100 110 120, one entry per record]

Every record is in the index.
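A dense-index lookup can be sketched in Python as a binary search over the sorted index entries. The layout is an assumption for illustration: data blocks are Python lists, and record pointers are `(block_no, slot)` pairs. Note that a miss is detected from the index alone, without reading the data file.

```python
import bisect

# Hypothetical data file from the figure: two records (shown by key) per block.
data_blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Dense index: one (key, record pointer) entry per record, sorted by key.
dense_index = [
    (key, (b, s)) for b, block in enumerate(data_blocks) for s, key in enumerate(block)
]
index_keys = [key for key, _ in dense_index]

def dense_lookup(key):
    """Return the (block_no, slot) pointer for key, or None if no such record."""
    i = bisect.bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        return dense_index[i][1]
    return None  # decided from the index alone; no data block was read

print(dense_lookup(70))  # (3, 0)
print(dense_lookup(65))  # None
```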

Sparse Index

[Figure: the sequential file with a sparse index whose blocks hold 10 30 50 70 | 90 110 130 150 | 170 190 210 230, one entry per data block]

Only the first record of each block is in the index.
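A sparse-index lookup can be sketched the same way. It finds the last index entry whose key is at most the search key, then reads that one data block. The in-memory layout and the function name are assumptions. Unlike the dense case, a miss is only known after the data block has been read.

```python
import bisect

# Hypothetical data file from the figure: two records (shown by key) per block.
data_blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Sparse index: one (first key of block, block_no) entry per data block.
sparse_index = [(block[0], b) for b, block in enumerate(data_blocks)]
first_keys = [key for key, _ in sparse_index]

def sparse_lookup(key):
    """Return (block_no, slot) for key, or None."""
    # Last index entry whose key is <= the search key.
    i = bisect.bisect_right(first_keys, key) - 1
    if i < 0:
        return None  # smaller than the first key in the file
    block_no = sparse_index[i][1]
    block = data_blocks[block_no]  # the one data block that could hold key
    if key in block:
        return (block_no, block.index(key))
    return None

print(sparse_lookup(40))  # (1, 1)
print(sparse_lookup(45))  # None, but only after reading block 1
```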

Sparse 2nd level

[Figure: the sequential file, its sparse 1st-level index (10 30 50 70 | 90 110 130 150 | 170 190 210 230), and a sparse 2nd-level index over the 1st-level blocks (10 90 | 170 250 | 330 410 490 570)]
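A two-level lookup can be sketched in Python under these assumptions:

- The file has 40 data blocks of two records each, with keys 10, 20, ..., 800.
- Each 1st-level index block holds 4 entries, which makes the 2nd level come out as in the figure.

Each level is a sparse "largest key <= search key" step. After the in-memory 2nd level, the lookup reads one index block and one data block.

```python
import bisect

# Hypothetical file: 40 data blocks of two records, keys 10, 20, ..., 800.
data_blocks = [[10 + 20 * b, 20 + 20 * b] for b in range(40)]

# 1st level: sparse index, one (first key, block_no) entry per data block,
# packed 4 entries per index block as in the figure.
level1 = [(block[0], b) for b, block in enumerate(data_blocks)]
level1_blocks = [level1[i:i + 4] for i in range(0, len(level1), 4)]

# 2nd level: sparse index over the 1st-level index blocks.
level2 = [(entries[0][0], i) for i, entries in enumerate(level1_blocks)]

def floor_entry(entries, key):
    """Last (k, ptr) entry with k <= key, or None."""
    keys = [k for k, _ in entries]
    i = bisect.bisect_right(keys, key) - 1
    return entries[i] if i >= 0 else None

def two_level_lookup(key):
    top = floor_entry(level2, key)  # small enough to keep in memory
    if top is None:
        return None
    _, block_no = floor_entry(level1_blocks[top[1]], key)  # one index-block read
    block = data_blocks[block_no]  # one data-block read
    return (block_no, block.index(key)) if key in block else None

print([k for k, _ in level2][:8])  # [10, 90, 170, 250, 330, 410, 490, 570]
print(two_level_lookup(400))       # (19, 1)
```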

Note: the DATA FILE and the INDEX can both be “ordered files”.

Question: how would we lay them out on disk?

- contiguous layout on disk?
- block-chained layout on disk?

Questions:

• Do we want to build a dense 2nd-level index for a dense index?

• Can we even do this?

[Figure: the sequential file (10 20 | ... | 90 100) with two index levels, labeled “1st level?” and “2nd level?”, holding 10 30 50 70 90 | 110 130 150 170 190 210 230 and 10 90 | 170 250 330 410 490 570]

Notes on pointers:

(1) A block pointer (BP, used in a sparse index) can be smaller than a record pointer (RP, used in a dense index).

[Figure: index keys K1 to K4 and data blocks R1 to R4]

Say there are 1024 B per block. If we want the block for K3, we can get it at offset (3 - 1) * 1024 = 2048 bytes.

Note: if the file is contiguous, we can omit the pointers.
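The offset arithmetic for a contiguous file can be sketched in Python. It assumes 1-based block numbers and the 1024-byte blocks from the example. `block_offset`, `read_block`, and the in-memory `io.BytesIO` "disk" are illustrative assumptions.

```python
import io

BLOCK_SIZE = 1024  # bytes per block, as in the example

def block_offset(i, block_size=BLOCK_SIZE):
    """Byte offset of block i (1-based) in a contiguous file."""
    return (i - 1) * block_size

def read_block(f, i, block_size=BLOCK_SIZE):
    """Fetch block i with a single seek; no stored pointer is needed."""
    f.seek(block_offset(i, block_size))
    return f.read(block_size)

# Demo "disk": blocks R1..R4, each filled with its own digit.
disk = io.BytesIO(b"".join(str(n).encode() * BLOCK_SIZE for n in range(1, 5)))

print(block_offset(3))          # 2048
print(read_block(disk, 3)[:1])  # b'3'
```

K3 is the third index entry, so its block number (3) is implied by its position. That is why the pointer can be dropped.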

Sparse vs. Dense Tradeoff

• Sparse: less index space per record, so more of the index can be kept in memory. (Later: sparse is better for insertions.)

• Dense: can tell whether any record exists without accessing the file. (Later: dense is needed for secondary indexes.)

Terms

• Index sequential file
• Search key (not necessarily the primary key)
• Primary index (on the sequencing field)
• Secondary index
• Dense index (contains all search key values)
• Sparse index
• Multi-level index

Next:

• Duplicate keys

• Deletion/Insertion

• Secondary indexes

Duplicate keys

[Figure: a sequential file with duplicate search keys, two records per block: 10 10 | 10 20 | 20 30 | 30 30 | 40 45]

Duplicate keys

[Figure: the same file with a dense index holding one entry per record: 10 10 10 20 | 20 30 30 30 | ...]

Dense index! Point to each value!

Duplicate keys

[Figure: the same file with a dense index holding one entry per distinct value: 10 20 30 40 ...]

Dense index: point to each distinct value!
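Lookup with a dense index that stores one entry per distinct key can be sketched in Python. It assumes the sorted file from the figure is flattened into record positions 0..9, a simplification of block/slot pointers. Each entry points at the first record with its key. Because the file is sorted, the remaining duplicates follow it directly.

```python
import bisect

# The sorted file from the figure, flattened to record positions 0..9.
data = [10, 10, 10, 20, 20, 30, 30, 30, 40, 45]

# One index entry per distinct key, pointing at its first record.
index = []
for pos, key in enumerate(data):
    if not index or index[-1][0] != key:
        index.append((key, pos))
index_keys = [key for key, _ in index]

def find_all(key):
    """Positions of all records with this key."""
    i = bisect.bisect_left(index_keys, key)
    if i == len(index_keys) or index_keys[i] != key:
        return []
    pos = index[i][1]
    hits = []
    # Sorted file: the duplicates sit right after the first occurrence.
    while pos < len(data) and data[pos] == key:
        hits.append(pos)
        pos += 1
    return hits

print(index_keys)    # [10, 20, 30, 40, 45]
print(find_all(30))  # [5, 6, 7]
```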

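Duplicate keys

[Figure: the same file with a sparse index holding the first key of each block: 10 10 20 30 ...]

Sparse index: point to the start of each block!

Careful if looking for 20 or 30! The index entry for 20 points to the third block, but a 20 also sits at the end of the second block. Likewise, the entry for 30 points to the fourth block, while a 30 also sits at the end of the third.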
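One way to handle this case is sketched below in Python, assuming blocks are in-memory lists. The scan starts at the last block whose first key is strictly smaller than the search key, because earlier copies of the key may sit at the end of that block. Starting at the block that the index entry for 20 points to would miss the 20 in the second block.

```python
import bisect

# The file from the figure: two records (shown by key) per block.
blocks = [[10, 10], [10, 20], [20, 30], [30, 30], [40, 45]]
# Sparse index: first key of each block.
sparse_keys = [block[0] for block in blocks]  # [10, 10, 20, 30, 40]

def find_all_sparse(key):
    """All (block_no, slot) pointers for key."""
    # Last block whose first key is < key (or block 0 if there is none).
    start = max(bisect.bisect_left(sparse_keys, key) - 1, 0)
    hits = []
    for b in range(start, len(blocks)):
        for s, k in enumerate(blocks[b]):
            if k == key:
                hits.append((b, s))
            elif k > key:
                return hits  # sorted file: no further matches
    return hits

print(find_all_sparse(20))  # [(1, 1), (2, 0)]
print(find_all_sparse(30))  # [(2, 1), (3, 0), (3, 1)]
```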

Duplicate keys

Sparse index, another way?
– place the first new key from each block

[Figure: the same file with a sparse index holding 10 20 30 30; the fourth block (30 30) contains no new key, and its entry is marked “should this be 40?”]

Duplicate values, primary index

• Index may point to the first instance of each value only

[Figure (summary): a file holding a a a b ... and an index whose entries point only to the first a and the first b]

Next:

• Duplicate keys

• Deletion/Insertion

• Secondary indexes

Deletion from sparse index

[Figure: data blocks 10 20 | 30 40 | 50 60 | 70 80 with a sparse index 10 30 50 70 | 90 110 130 150]

Deletion from sparse index

– delete record 40

[Figure: 40 is removed from the second block; 30 is still that block's first key, so the index does not change]

Deletion from sparse index

– delete record 30

[Figure: 30 is removed; 40 moves to the front of the second block, and the index entry 30 is replaced by 40]

Deletion from sparse index

– delete records 30 & 40

[Figure: the second block becomes empty, so its index entry is removed and the entries 50, 70 shift up]
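The three deletion cases can be sketched in one Python function. It assumes an in-memory layout: blocks are sorted lists, and the index is a list of `[first_key, block_no]` entries. The slide's index holds further entries (90, 110, ...) for blocks not shown; the sketch keeps only the four visible blocks. An emptied block is simply left in place here. Returning it to free space would be a separate step.

```python
import bisect

def sparse_delete(blocks, index, key):
    """Delete key from a sorted file with a sparse index; return True if found.

    blocks: list of sorted key lists, one per data block (hypothetical layout).
    index:  list of [first_key, block_no] entries, sorted by first_key.
    """
    i = bisect.bisect_right([k for k, _ in index], key) - 1
    if i < 0 or key not in blocks[index[i][1]]:
        return False
    block = blocks[index[i][1]]
    block.remove(key)  # later records in the block move up
    if not block:
        del index[i]  # block emptied: drop its entry; later entries shift up
    elif block[0] != index[i][0]:
        index[i][0] = block[0]  # first key changed: update the entry
    return True

def figure_state():
    """The four data blocks and matching sparse-index entries from the figure."""
    blocks = [[10, 20], [30, 40], [50, 60], [70, 80]]
    index = [[10, 0], [30, 1], [50, 2], [70, 3]]
    return blocks, index

blocks, index = figure_state()
sparse_delete(blocks, index, 30)
print(index)  # [[10, 0], [40, 1], [50, 2], [70, 3]]
```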

Deletion from dense index

[Figure: data blocks 10 20 | 30 40 | 50 60 | 70 80 with a dense index 10 20 30 40 | 50 60 70 80]

Deletion from dense index

– delete record 30

[Figure: 30 is removed from the data file and its index entry is deleted; 40 moves up in both the data block and the index block]
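A matching Python sketch for the dense case. It assumes the same in-memory blocks, with index entries `(key, (block_no, slot))`, one per record. Deleting the entry is easy. However, because 40 moves up within its block, the record pointers of later records in that block must be adjusted too. This is one reason dense-index maintenance tends to cost more.

```python
import bisect

def dense_delete(blocks, index, key):
    """Delete key from a sorted file with a dense index; return True if found.

    blocks: list of sorted key lists, one per data block (hypothetical layout).
    index:  list of (key, (block_no, slot)) entries, one per record, sorted by key.
    """
    i = bisect.bisect_left([k for k, _ in index], key)
    if i == len(index) or index[i][0] != key:
        return False
    _, (b, s) = index.pop(i)  # the entry goes away; later entries shift up
    del blocks[b][s]          # later records in block b move up one slot...
    j = i
    while j < len(index) and index[j][1][0] == b:
        k, (_, slot) = index[j]
        index[j] = (k, (b, slot - 1))  # ...so their pointers must follow
        j += 1
    return True

blocks = [[10, 20], [30, 40], [50, 60], [70, 80]]
index = [(k, (b, s)) for b, blk in enumerate(blocks) for s, k in enumerate(blk)]
dense_delete(blocks, index, 30)
print(blocks[1], index[2])  # [40] (40, (1, 0))
```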

Insertion, sparse index case

[Figure: data blocks 10 20 | 30 (free) | 40 50 | 60 (free) with a sparse index 10 30 40 60]

Insertion, sparse index case

– insert record 34

[Figure: 34 goes into the free slot of the second block, after 30; the index does not change]

• Our lucky day! We have free space where we need it!

Insertion, sparse index case

– insert record 15

[Figure: 15 goes into the first block, which becomes 10 15; 20 is pushed into the second block, which becomes 20 30; the index entry 30 changes to 20]

• Immediate reorganization
• Other variations?
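Immediate reorganization can be sketched in Python as a cascade. A full block pushes its largest record into the next block, and every sparse-index entry is kept equal to its block's first key. The two-records-per-block capacity follows the figures. Appending a new block when the cascade runs off the end of the file is an assumption.

```python
import bisect

CAPACITY = 2  # records per block, as in the figures

def insert_reorganize(blocks, index_keys, key):
    """Insert key into a sorted file with a sparse index, reorganizing immediately.

    blocks:     list of sorted key lists, one per data block.
    index_keys: index_keys[i] is the first key of blocks[i].
    """
    i = max(bisect.bisect_right(index_keys, key) - 1, 0)
    carry = key
    while carry is not None:
        if i == len(blocks):  # ran off the end: add a new block (assumption)
            blocks.append([])
            index_keys.append(carry)
        block = blocks[i]
        bisect.insort(block, carry)
        carry = block.pop() if len(block) > CAPACITY else None
        index_keys[i] = block[0]  # keep the sparse entry in sync
        i += 1

blocks = [[10, 20], [30], [40, 50], [60]]
index_keys = [10, 30, 40, 60]
insert_reorganize(blocks, index_keys, 15)
print(blocks)      # [[10, 15], [20, 30], [40, 50], [60]]
print(index_keys)  # [10, 20, 40, 60]
```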

• Just illustrated: immediate reorganization

• Now a variation:
– insert a new block (chained file)
– otherwise leave the data file as is
– update the index only

Insertion, sparse index case

– insert record 25

[Figure: the first block (10 20) is full, so 25 goes into an overflow block chained to it]

Overflow blocks (reorganize later...)
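The no-reorganization alternative can be sketched in Python with a chained overflow block per data block. The `Block` class and capacity are illustrative assumptions. The overflow block for 25 hangs off the first block because that is where the sparse-index lookup for 25 lands. The slide treats 34 and 25 as separate examples on the same starting file; combining them here is harmless, since they touch different blocks.

```python
import bisect

CAPACITY = 2  # records per block, as in the figures

class Block:
    """A data block with an optional chained overflow block."""
    def __init__(self, keys):
        self.keys = list(keys)
        self.overflow = None

def insert_with_overflow(blocks, index_keys, key):
    """Insert key without moving existing records; the sparse index is untouched
    except when key becomes the smallest key in the file."""
    i = max(bisect.bisect_right(index_keys, key) - 1, 0)
    block = blocks[i]
    # Use free space in the target block if there is any; otherwise walk
    # (and if necessary extend) its overflow chain.
    while len(block.keys) >= CAPACITY:
        if block.overflow is None:
            block.overflow = Block([])
        block = block.overflow
    bisect.insort(block.keys, key)
    if key < index_keys[i]:
        index_keys[i] = key

blocks = [Block([10, 20]), Block([30]), Block([40, 50]), Block([60])]
index_keys = [10, 30, 40, 60]
insert_with_overflow(blocks, index_keys, 34)  # free slot in the second block
insert_with_overflow(blocks, index_keys, 25)  # first block is full: overflow
print(blocks[1].keys, blocks[0].overflow.keys)  # [30, 34] [25]
```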

Insertion, dense index case

• Similar
• Often more expensive: every inserted record also needs its own index entry, so the index must be updated on each insertion

Next:

• Duplicate keys

• Deletion/Insertion

• Secondary indexes

Secondary indexes

[Figure: a data file sorted on its sequence field; the secondary-key values appear in file order as 30 50 | 20 70 | 80 40 | 100 10 | 90 60]

Can I make a secondary index sparse?

Secondary indexes

• Sparse index

[Figure: a “sparse index” built from the first secondary-key value in each block: 30 20 80 100 90 ...]

Sparse index? Does not make sense!

The entries 30 20 80 100 90 are not in key order, and a value such as 10 sits in the block whose entry is 100. The entries therefore cannot tell a lookup which block to read.

Secondary indexes

• Must be a dense index!

[Figure: a dense index on the secondary key, 10 20 30 40 | 50 60 70 ..., with one record pointer per record, and above it a sparse higher level 10 50 90 ...]

Sparse high level allowed?
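A secondary index can be sketched in Python from the figure's file. The lowest level is dense: every record gets a `(key, (block_no, slot))` entry, sorted on the secondary key and packed 4 per index block. Because that level is itself sorted, a sparse level above it works. The packing of 4 entries per block is an assumption that reproduces the figure's 10 50 90.

```python
import bisect

# Data file sorted on its sequence field; only the secondary-key values are shown.
blocks = [[30, 50], [20, 70], [80, 40], [100, 10], [90, 60]]

# Lowest level must be dense: one (key, record pointer) entry per record,
# sorted on the secondary key, packed 4 entries per index block.
dense = sorted((key, (b, s)) for b, block in enumerate(blocks) for s, key in enumerate(block))
dense_blocks = [dense[i:i + 4] for i in range(0, len(dense), 4)]

# A higher level can be sparse, because the dense level itself is sorted.
upper = [entries[0][0] for entries in dense_blocks]

def secondary_lookup(key):
    """Record pointer (block_no, slot) for key, or None."""
    i = bisect.bisect_right(upper, key) - 1
    if i < 0:
        return None
    keys = [k for k, _ in dense_blocks[i]]
    j = bisect.bisect_left(keys, key)
    if j < len(keys) and keys[j] == key:
        return dense_blocks[i][j][1]
    return None

print(upper)                 # [10, 50, 90]
print(secondary_lookup(40))  # (2, 1)
```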

Reminder: with secondary indexes,
• the lowest level is dense;
• other levels are sparse.

Also: pointers are record pointers (not block pointers, nor offsets).

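Duplicate values & secondary indexes

[Figure: a data file sorted on its sequence field; the secondary-key values, with duplicates, appear in file order as 20 10 | 20 40 | 10 40 | 10 40 | 30 40]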

Duplicate values & secondary indexes

[Figure: one index entry per record: 10 10 10 20 | 20 30 40 40 | 40 40 ...]

One option...

Problem: excess overhead!
• disk space
• search time

Duplicate values & secondary indexes

[Figure: one index entry per distinct value (10, 20, 30, 40), each followed by pointers to all records with that value]

Another option...

Problem: variable-size records in the index!
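This option, one index entry per distinct value carrying all of that value's record pointers, can be sketched in Python. The `(block_no, slot)` pointers and the function name are assumptions. Printing the entry sizes makes the variable-size problem visible.

```python
from collections import defaultdict

# The file from the figure, in sequence-field order; secondary-key values shown.
blocks = [[20, 10], [20, 40], [10, 40], [10, 40], [30, 40]]

def build_pointer_list_index(blocks):
    """One entry per distinct key: (key, [record pointers...]), sorted by key."""
    pointers = defaultdict(list)
    for b, block in enumerate(blocks):
        for s, key in enumerate(block):
            pointers[key].append((b, s))
    return sorted(pointers.items())

index = build_pointer_list_index(blocks)
for key, ptrs in index:
    print(key, len(ptrs))  # entry sizes vary: 10 3, 20 2, 30 1, 40 4
```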

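Duplicate values & secondary indexes

[Figure: an index with one entry per distinct value, 10 20 30 40 | 50 60 ..., each pointing to the first record with that value; records with the same key are chained together]

Another idea: chain records with the same key!

Problems:
• Need to add fields to the data records for each index
• Need to follow the chain to find all the records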
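The chaining idea can be sketched in Python. A dict stands in for the sorted index file, and each record is a small dict with a hypothetical extra `next` field. The sketch makes both listed problems concrete: every record carries a field just for this index, and finding all records for a key means following the chain one record at a time.

```python
# The file from the figure, in sequence-field order; secondary-key values shown.
blocks = [[20, 10], [20, 40], [10, 40], [10, 40], [30, 40]]

# Every record gets an extra "next" field for this index: the first listed problem.
records = {
    (b, s): {"key": key, "next": None}
    for b, block in enumerate(blocks)
    for s, key in enumerate(block)
}

index = {}  # key -> pointer to the first record of its chain
last = {}   # key -> pointer to the chain's current tail (used while building)
for ptr in sorted(records):  # visit records in file order
    key = records[ptr]["key"]
    if key in last:
        records[last[key]]["next"] = ptr
    else:
        index[key] = ptr
    last[key] = ptr

def find_all(key):
    """Follow the chain: one record access per hop (the second listed problem)."""
    hits, ptr = [], index.get(key)
    while ptr is not None:
        hits.append(ptr)
        ptr = records[ptr]["next"]
    return hits

print(find_all(40))  # [(1, 1), (2, 1), (3, 1), (4, 1)]
```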

Summary: Indexing Basics

– Basic ideas: sparse, dense, multi-level…
– Duplicate keys
– Deletion/Insertion
– Secondary indexes
