CS4432: Database Systems II
CS 4432 lecture #9 1
CS4432: Database Systems II, Lecture #8
(Basic indexing)
Professor Elke A. Rundensteiner
Indexing (Chapter 14)
Indexing helps retrieve data more quickly for certain queries, for example:
SELECT * FROM Emp WHERE salary = 1000000;
Figure: a lookup on value = 1,000,000 that leads to the matching record.
Topics
• Sequential Index Files
• Secondary Indexes
Sequential File
Data blocks (two records each): [10, 20] [30, 40] [50, 60] [70, 80] [90, 100]
Sequential File with Dense Index
Data blocks (two records each): [10, 20] [30, 40] [50, 60] [70, 80] [90, 100]
Dense index blocks: [10, 20, 30, 40] [50, 60, 70, 80] [90, 100, 110, 120]
Every record is in the index.
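A dense-index lookup can be sketched as follows. This is a minimal in-memory illustration, not the course's implementation: the names `DATA_BLOCKS`, `build_dense_index`, and `dense_lookup` are hypothetical, and a record pointer is modeled as a `(block_no, slot)` pair.

```python
from bisect import bisect_left

# Hypothetical stand-in for the slide's file: two records (keys) per block.
DATA_BLOCKS = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

def build_dense_index(blocks):
    """Return one (key, (block_no, slot)) entry per record, in key order."""
    return [(key, (b, s))
            for b, block in enumerate(blocks)
            for s, key in enumerate(block)]

def dense_lookup(index, key):
    """Binary-search the index; return the record pointer, or None if absent."""
    keys = [k for k, _ in index]
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return index[i][1]
    return None  # absence is decided from the index alone

dense_index = build_dense_index(DATA_BLOCKS)
print(dense_lookup(dense_index, 60))  # (2, 1)
print(dense_lookup(dense_index, 65))  # None
```

Because every key appears in the index, the miss on 65 is detected without reading any data block.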
Sequential File with Sparse Index
Data blocks (two records each): [10, 20] [30, 40] [50, 60] [70, 80] [90, 100]
Sparse index blocks: [10, 30, 50, 70] [90, 110, 130, 150] [170, 190, 210, 230]
Only the first record per block is in the index.
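A sparse-index lookup might look like the sketch below. It assumes an in-memory list of blocks; `build_sparse_index` and `sparse_lookup` are illustrative names, not taken from the lecture.

```python
from bisect import bisect_right

# Hypothetical stand-in for the slide's file: two records (keys) per block.
DATA_BLOCKS = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

def build_sparse_index(blocks):
    """One (first_key, block_no) entry per data block."""
    return [(block[0], b) for b, block in enumerate(blocks)]

def sparse_lookup(index, blocks, key):
    """Pick the last entry with first_key <= key, then search that block."""
    first_keys = [k for k, _ in index]
    i = bisect_right(first_keys, key) - 1
    if i < 0:
        return None                 # key is smaller than every indexed key
    block_no = index[i][1]
    block = blocks[block_no]        # models the single data-block read
    if key in block:
        return (block_no, block.index(key))
    return None

sparse_index = build_sparse_index(DATA_BLOCKS)
print(sparse_lookup(sparse_index, DATA_BLOCKS, 40))  # (1, 1)
print(sparse_lookup(sparse_index, DATA_BLOCKS, 45))  # None
```

Unlike the dense case, deciding that 45 is absent requires reading the candidate data block.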
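Sequential File with Sparse 2nd-Level Index
Data blocks (two records each): [10, 20] [30, 40] [50, 60] [70, 80] [90, 100]
First-level sparse index blocks: [10, 30, 50, 70] [90, 110, 130, 150] [170, 190, 210, 230]
Second-level sparse index blocks: [10, 90, 170, 250] [330, 410, 490, 570]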
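The two-level search can be sketched as below, under the assumption that the second level simply holds the first key of each first-level index block. `LEVEL1_BLOCKS`, `build_second_level`, and `locate` are hypothetical names.

```python
from bisect import bisect_right

# First-level sparse index blocks with the key values shown on the slide.
LEVEL1_BLOCKS = [[10, 30, 50, 70], [90, 110, 130, 150], [170, 190, 210, 230]]

def build_second_level(level1_blocks):
    """Sparse index over the index: first key of each first-level block."""
    return [block[0] for block in level1_blocks]

def locate(level2, level1_blocks, key):
    """Return (index_block_no, entry_no) of the first-level entry covering key."""
    i = bisect_right(level2, key) - 1
    if i < 0:
        return None                       # key precedes the whole file
    j = bisect_right(level1_blocks[i], key) - 1
    return (i, j)

level2 = build_second_level(LEVEL1_BLOCKS)
print(level2)                              # [10, 90, 170]
print(locate(level2, LEVEL1_BLOCKS, 135))  # (1, 2)
```

Only one first-level index block is examined per lookup, which is the point of adding the second level.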
Note: Both the data file and the index can be “ordered files”.
Question: How would we lay them out on disk?
- contiguous layout on disk?
- block-chained layout on disk?
Questions:
• Do we want to build a dense 2nd-level index for a dense index?
• Can we even do this?
Figure: the sequential file [10, 20] [30, 40] [50, 60] [70, 80] [90, 100] with two index levels, one holding keys 10, 30, 50, ..., 230 and one holding keys 10, 90, 170, ..., 570, labeled "1st level?" and "2nd level?".
Notes on pointers:
(1) A block pointer (BP, used in a sparse index) can be smaller than a record pointer (RP, used in a dense index).
Note: If the file is contiguous, then we can omit pointers.
Figure: index keys K1, K2, K3, K4 next to data blocks R1, R2, R3, R4.
Say blocks are 1024 bytes each:
• if we want the block for K3, we get it at offset (3-1)*1024 = 2048 bytes
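The offset arithmetic is small enough to write down directly. The sketch assumes 1-based block numbers, as on the slide; `block_offset` is a hypothetical helper name.

```python
BLOCK_SIZE = 1024  # bytes per block, as assumed on the slide

def block_offset(n, block_size=BLOCK_SIZE):
    """Byte offset of the n-th block (1-based) in a contiguous file."""
    if n < 1:
        raise ValueError("block numbers start at 1")
    return (n - 1) * block_size

print(block_offset(3))  # 2048
```

With this rule the index needs to store only keys: the position of a key in the index determines where its block lives.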
Sparse vs. Dense Tradeoff
• Sparse: less index space per record, so more of the index can be kept in memory (later: sparse is better for insertions)
• Dense: can tell whether a record exists without accessing the file (later: dense is needed for secondary indexes)
Terms
• Index sequential file
• Search key (≠ primary key)
• Primary index (on sequencing field)
• Secondary index
• Dense index (contains all search key values)
• Sparse index
• Multi-level index
Next:
• Duplicate keys
• Deletion/Insertion
• Secondary indexes
Duplicate keys
Data blocks (two records each): [10, 10] [10, 20] [20, 30] [30, 30] [40, 45]
Duplicate keys
Data blocks (two records each): [10, 10] [10, 20] [20, 30] [30, 30] [40, 45]
Dense index blocks: [10, 10, 10, 20] [20, 30, 30, 30]
Dense index: point to each value!
Duplicate keys
Data blocks (two records each): [10, 10] [10, 20] [20, 30] [30, 30] [40, 45]
Dense index block: [10, 20, 30, 40]
Dense index: point to each distinct value!
Duplicate keys
Data blocks (two records each): [10, 10] [10, 20] [20, 30] [30, 30] [40, 45]
Sparse index block: [10, 10, 20, 30]
Sparse index: point to the start of each block!
Careful if looking for 20 or 30!
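The reason for the warning is that copies of a key can sit at the end of the block before the one its index entry points to. One way to handle this, sketched here with hypothetical names (`BLOCKS`, `FIRST_KEYS`, `find_all`), is to start the scan one block earlier.

```python
from bisect import bisect_left

# Data blocks from the slide, with duplicate keys.
BLOCKS = [[10, 10], [10, 20], [20, 30], [30, 30], [40, 45]]
# Sparse index: first key of every block (10, 10, 20, 30, 40).
FIRST_KEYS = [block[0] for block in BLOCKS]

def find_all(key):
    """Return (block_no, slot) for every record with this key."""
    # Start one block before the first index entry >= key: that block
    # may end with earlier copies of the key.
    start = max(0, bisect_left(FIRST_KEYS, key) - 1)
    hits = []
    for b in range(start, len(BLOCKS)):
        if BLOCKS[b][0] > key:
            break  # every later block starts beyond the key
        hits.extend((b, s) for s, k in enumerate(BLOCKS[b]) if k == key)
    return hits

print(find_all(20))  # [(1, 1), (2, 0)]
print(find_all(30))  # [(2, 1), (3, 0), (3, 1)]
```

Following only the index entry for 20 would land on block 2 and miss the copy of 20 at the end of block 1.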
Duplicate keys
Data blocks (two records each): [10, 10] [10, 20] [20, 30] [30, 30] [40, 45]
Sparse index block: [10, 20, 30, 30]
Sparse index, another way?
– place first new key from block
Annotation on the index: should this be 40?
Summary: duplicate values, primary index
• Index may point to the first instance of each value only
Figure: a file holding the values a, a, a, b next to its index.
Next:
• Duplicate keys
• Deletion/Insertion
• Secondary indexes
Deletion from sparse index
Data blocks (two records each): [10, 20] [30, 40] [50, 60] [70, 80]
Sparse index blocks: [10, 30, 50, 70] [90, 110, 130, 150]
– delete record 40: the second block becomes [30]; the index is unchanged
– delete record 30: 40 becomes the first record of its block, so the index entry 30 becomes 40
– delete records 30 & 40: the block becomes empty, so its index entry is removed and the entries 50 and 70 move up
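The three deletion cases can be sketched in a few lines. This is an in-memory illustration under simplifying assumptions (a linear scan of the index, blocks as Python lists); `slide_example` and `delete_record` are hypothetical names.

```python
def slide_example():
    """Fresh copy of the slide's data blocks and sparse index."""
    blocks = [[10, 20], [30, 40], [50, 60], [70, 80]]
    index = [[block[0], b] for b, block in enumerate(blocks)]  # [first_key, block_no]
    return blocks, index

def delete_record(blocks, index, key):
    """Delete one record with this key and repair the sparse index."""
    for entry in index:
        first_key, b = entry
        if key in blocks[b]:
            blocks[b].remove(key)
            if not blocks[b]:
                index.remove(entry)        # block is empty: drop its entry
            elif first_key == key:
                entry[0] = blocks[b][0]    # first record changed: refresh key
            return True
    return False

blocks, index = slide_example()
delete_record(blocks, index, 30)
print(index)  # [[10, 0], [40, 1], [50, 2], [70, 3]]
delete_record(blocks, index, 40)
print(index)  # [[10, 0], [50, 2], [70, 3]]
```

Deleting a record that is not first in its block leaves the index untouched, which is the common and cheap case.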
Deletion from dense index
Data blocks (two records each): [10, 20] [30, 40] [50, 60] [70, 80]
Dense index blocks: [10, 20, 30, 40] [50, 60, 70, 80]
– delete record 30: 40 moves up in its data block, and in the index the entry 30 is removed and 40 moves up
Insertion, sparse index case
Data blocks (the second and fourth have a free slot): [10, 20] [30, _] [40, 50] [60, _]
Sparse index block: [10, 30, 40, 60]
– insert record 34: it goes into the free slot after 30
• our lucky day! we have free space where we need it!
– insert record 15: the first block is full, so 15 goes there, 20 moves into the second block as [20, 30], and the index entry 30 becomes 20
• Immediate reorganization
• Other variations?
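Immediate reorganization can be sketched as a ripple: when a block overflows, its largest record slides into the next block. The code below is one possible reading of the slide's example, with a hypothetical `insert_reorganize` function and an assumed capacity of two records per block.

```python
from bisect import bisect_right, insort

CAPACITY = 2  # records per block, as in the slide's figure

def insert_reorganize(blocks, key):
    """Insert key; on overflow, slide the largest record into the next block.

    Returns the rebuilt sparse index (first key of every block).
    """
    first_keys = [b[0] for b in blocks]
    i = max(0, bisect_right(first_keys, key) - 1)
    carry = key
    while carry is not None:
        if i == len(blocks):
            blocks.append([])              # ran off the end: add a new block
        insort(blocks[i], carry)
        carry = blocks[i].pop() if len(blocks[i]) > CAPACITY else None
        i += 1
    return [b[0] for b in blocks]

blocks = [[10, 20], [30], [40, 50], [60]]
print(insert_reorganize(blocks, 15))  # [10, 20, 40, 60]
print(blocks)                         # [[10, 15], [20, 30], [40, 50], [60]]
```

Inserting 34 into the original file instead would fill the free slot in the second block and leave the index as it was.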
• Just illustrated:
– Immediate reorganization
• Now a variation:
– insert new block (chained file)
– otherwise leave data file
– update index only
Insertion, sparse index case
Data blocks (the second and fourth have a free slot): [10, 20] [30, _] [40, 50] [60, _]
Sparse index block: [10, 30, 40, 60]
– insert record 25: it goes into an overflow block (reorganize later...)
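The overflow-block variant might be sketched as follows. It assumes that an overflow chain hangs off the primary block where the key belongs, modeled here as a dict from block number to a list of keys; `insert_with_overflow` and `lookup` are hypothetical names.

```python
from bisect import bisect_right, insort

CAPACITY = 2  # records per primary block, as in the slide's figure

def insert_with_overflow(blocks, overflow, index, key):
    """Insert key without moving existing records; return the block used.

    index holds the first key of each primary block; overflow maps a
    block number to the keys in its overflow chain.
    """
    i = max(0, bisect_right(index, key) - 1)
    if len(blocks[i]) < CAPACITY:
        insort(blocks[i], key)                  # free slot in the primary block
    else:
        overflow.setdefault(i, []).append(key)  # chain an overflow record
    index[i] = min(index[i], key)               # only changes for a new smallest key
    return i

def lookup(blocks, overflow, index, key):
    """A lookup has to check the primary block and its overflow chain."""
    i = bisect_right(index, key) - 1
    if i < 0:
        return False
    return key in blocks[i] or key in overflow.get(i, [])

blocks = [[10, 20], [30], [40, 50], [60]]
index = [10, 30, 40, 60]
overflow = {}
insert_with_overflow(blocks, overflow, index, 25)
print(overflow)                              # {0: [25]}
print(lookup(blocks, overflow, index, 25))   # True
```

The insert is cheap, but lookups get slower as chains grow, which is why the slide notes that the file is reorganized later.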
Insertion, dense index case
• Similar
• Often more expensive . . .
Next:
• Duplicate keys
• Deletion/Insertion
• Secondary indexes
Secondary indexes
Data blocks (values of the indexed field; the file is ordered on the sequence field): [30, 50] [20, 70] [80, 40] [100, 10] [90, 60]
Can I make a secondary index sparse?
• Sparse index blocks: [30, 20, 80, 100] [90, ...]
This does not make sense!
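Secondary indexes
Data blocks (values of the indexed field; the file is ordered on the sequence field): [30, 50] [20, 70] [80, 40] [100, 10] [90, 60]
• Must be a dense index!
Dense index blocks: [10, 20, 30, 40] [50, 60, 70, ...]
Sparse high level (allowed?): [10, 50, 90, ...]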
Reminder: With secondary indexes:
• Lowest level is dense
• Other levels are sparse
Also: Pointers are record pointers (not block pointers, nor offsets)
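A secondary index with a dense lowest level and a sparse level above it can be sketched like this. The block contents follow the slide's figure; the function names and the fan-out of four keys per index block are assumptions made for the illustration.

```python
from bisect import bisect_right

# Values of the indexed field; the file is NOT sorted on this field.
DATA_BLOCKS = [[30, 50], [20, 70], [80, 40], [100, 10], [90, 60]]
KEYS_PER_INDEX_BLOCK = 4  # assumed fan-out

def build_secondary_index(blocks, fanout=KEYS_PER_INDEX_BLOCK):
    """Return (dense, sparse): sorted dense index blocks plus a sparse top level."""
    entries = sorted((key, (b, s))
                     for b, block in enumerate(blocks)
                     for s, key in enumerate(block))
    dense = [entries[i:i + fanout] for i in range(0, len(entries), fanout)]
    sparse = [index_block[0][0] for index_block in dense]
    return dense, sparse

def lookup(dense, sparse, key):
    """Use the sparse level to pick one dense index block, then scan it."""
    i = bisect_right(sparse, key) - 1
    if i < 0:
        return None
    for k, record_ptr in dense[i]:
        if k == key:
            return record_ptr  # a record pointer: (block_no, slot)
    return None

dense, sparse = build_secondary_index(DATA_BLOCKS)
print(sparse)                     # [10, 50, 90]
print(lookup(dense, sparse, 40))  # (2, 1)
```

The upper level can be sparse because the dense level is itself sorted, even though the data file is not sorted on this field.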
Duplicate values & secondary indexes
Data blocks (two records each): [20, 10] [20, 40] [10, 40] [10, 40] [30, 40]
One option: a dense index with one entry per record
Index blocks: [10, 10, 10, 20] [20, 30, 40, 40] [40, 40, ...]
Problem: excess overhead!
• disk space
• search time
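Duplicate values & secondary indexes
Data blocks (two records each): [20, 10] [20, 40] [10, 40] [10, 40] [30, 40]
Another option: one index record per distinct key (10, 20, 30, 40), each holding pointers to all records with that key
Problem: variable-size records in the index!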
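The second option amounts to a map from each distinct key to a list of record pointers. A small sketch, with the hypothetical name `build_pointer_lists`, makes the variable record size visible.

```python
from collections import defaultdict

# Data blocks from the slide; a record pointer is (block_no, slot).
DATA_BLOCKS = [[20, 10], [20, 40], [10, 40], [10, 40], [30, 40]]

def build_pointer_lists(blocks):
    """Map each distinct key to the pointers of all records holding it."""
    lists = defaultdict(list)
    for b, block in enumerate(blocks):
        for s, key in enumerate(block):
            lists[key].append((b, s))
    return dict(sorted(lists.items()))

index = build_pointer_lists(DATA_BLOCKS)
for key, pointers in index.items():
    print(key, len(pointers))
```

Key 40 carries four pointers while key 30 carries one, so index records no longer have a fixed size.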
Duplicate values & secondary indexes
Data blocks (two records each): [20, 10] [20, 40] [10, 40] [10, 40] [30, 40]
Index blocks: [10, 20, 30, 40] [50, 60, ...]
Another idea: Chain records with the same key!
Problems:
• Need to add fields to data records for each index
• Need to follow the chain to know the records
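Chaining can be sketched by giving every record a "next record with the same key" pointer, so the index stores just one pointer per distinct key. The names `build_chains` and `records_with` are hypothetical, and the extra per-record field is modeled as a separate dict.

```python
# Data blocks from the slide; a record pointer is (block_no, slot).
DATA_BLOCKS = [[20, 10], [20, 40], [10, 40], [10, 40], [30, 40]]

def build_chains(blocks):
    """Return (index, next_ptr).

    index maps each key to its first record; next_ptr maps a record to the
    next record with the same key (None at the end of the chain).
    """
    index, next_ptr, last = {}, {}, {}
    for b, block in enumerate(blocks):
        for s, key in enumerate(block):
            ptr = (b, s)
            next_ptr[ptr] = None
            if key in last:
                next_ptr[last[key]] = ptr   # extend the existing chain
            else:
                index[key] = ptr            # first record with this key
            last[key] = ptr
    return index, next_ptr

def records_with(index, next_ptr, key):
    """Follow the chain; each step models one more record access."""
    found = []
    ptr = index.get(key)
    while ptr is not None:
        found.append(ptr)
        ptr = next_ptr[ptr]
    return found

index, next_ptr = build_chains(DATA_BLOCKS)
print(records_with(index, next_ptr, 10))  # [(0, 1), (2, 0), (3, 0)]
```

Index records are fixed-size again, but finding all records for a key means walking the chain through the data file.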
Summary: Indexing Basics
– Basic ideas: sparse, dense, multi-level…
– Duplicate keys
– Deletion/Insertion
– Secondary indexes